The most powerful advance in genomics of the last five years has taken place in DNA sequencing technology. Capillary and gel-based Sanger sequencing has been superseded by new massively parallel approaches that offer several orders of magnitude greater sequencing capacity, at a >500-fold reduction in cost per base. Commercial next-generation sequencing (NGS) providers are heading towards the goal of a $1,000 human genome at a remarkable pace.
For the molecular biologist, this genomics technology has been a great boon. Transcriptome sequencing provides a snapshot of cellular nucleic acids with unparalleled depth and precision, DNA-encoded libraries of chemical targets enhance the power of selection experiments, and resequencing of target loci provides high-resolution maps of genomic variation within large cohorts of patients. The question for researchers has really become, "what else can be done with this powerful new tool?"
Fewer false positives, more reliable quantification and lower cost
The principal advantage offered by NGS technologies is single-base resolution of nucleic acid targets. If a nucleic acid sequence is returned, then the presence of this molecule within the sample is almost beyond doubt.
By comparison, identification of PCR products has previously been primitive, limited to the sizing of amplicons using DNA-binding dyes, and/or assumptions as to the specificity of PCR primer binding being sufficient to identify the correct product (only the 3' 6-8 nucleotides of a primer are absolutely required for polymerase recruitment and initiation).
In the case of TaqMan real-time PCR, the recruitment and displacement of a fluorescently conjugated oligonucleotide is employed as a surrogate measure of correct target amplification, but the existence of pseudogenes (for instance, there are an estimated 52 GAPDH and 15 ACTB processed pseudogenes in the human genome¹) greatly complicates specific primer design. Ultimately, the only guarantee of amplicon identity is to sequence the PCR product.
False positives are target-dependent and variable between experimental conditions and reagents, in some scenarios as high as 1 percent, even for U.S. Food and Drug Administration-approved diagnostic tests. Most researchers do not know the false positive rates of primers used for everyday assays such as mouse strain genotyping, yet mistyping errors can cost many months of research time. Use of PCR end-points for clinical trials places the outcome of a significant expenditure upon PCR amplicons that may have been sequenced once in validation, but never again. The benefits of sequence-level resolution for absolute certainty are clear.
Specific quantification of nucleic acid species and detection of variants such as differentially spliced isoforms is most readily achieved by sequencing, with comparison to a calibration curve of known spike-in controls. A particular advantage of sequence resolution is the ability to generate highly specific control reagents that mimic the behavior of the endogenous target, yet can be distinguished for quantification. R² values of sequencing experiment replicates are usually superior to those of real-time PCR quantification², and the parallel survey of many amplicons generated from a target sample allows normalization of variance due to the individual primer characteristics relied upon by real-time PCR.
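The calibration step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spike-in values are invented, the function names `fit_calibration` and `estimate_copies` are hypothetical, and a simple least-squares line on log-transformed values stands in for whatever curve model a given assay would actually use.

```python
import math

def fit_calibration(spike_ins):
    """Least-squares fit of log10(reads) against log10(known copies)."""
    xs = [math.log10(copies) for copies, _ in spike_ins]
    ys = [math.log10(reads) for _, reads in spike_ins]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared measures how well the spike-in series follows the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared

def estimate_copies(reads, slope, intercept):
    """Invert the fitted line to estimate input copies from a read count."""
    return 10 ** ((math.log10(reads) - intercept) / slope)

# Hypothetical spike-in series: (known input copies, observed read count).
spike_ins = [(100, 52), (1_000, 498), (10_000, 5_100), (100_000, 49_500)]
slope, intercept, r_squared = fit_calibration(spike_ins)

# Estimate the input amount of an endogenous target observed at 2,500 reads.
estimate = estimate_copies(2_500, slope, intercept)
```

Fitting on a log-log scale is a design choice for this sketch, made because spike-in series typically span several orders of magnitude; other models may suit a particular assay better.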
Single-nucleotide resolution enables multiplexed identification of thousands of individual nucleic acid molecules within a single sample. An important utility of this property is sample multiplexing, the ability to combine PCR amplicons from many experiments if appropriately labeled with a nucleic acid "barcode" that allows the data to be segregated back into individual samples post-sequencing. Combining many samples into a single run reduces costs. The resulting cost of sequencing (~$1 per sample) compares favorably with the price of SYBR Gold and other fluorescence-based detection systems, yet offers significantly improved data quality.
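As a rough illustration of how barcoded reads are segregated back into samples, the Python sketch below groups reads by an exact-match barcode at the start of each read. The barcodes, sample names, reads and the `demultiplex` function are all hypothetical, and the sketch assumes every barcode has the same length; production pipelines typically also tolerate sequencing errors within the barcode, which is omitted here.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample):
    """Group reads by an exact-match barcode at the start of each read.

    Assumes all barcodes share one length. Returns a dict of sample name to
    barcode-trimmed reads, plus a list of reads with no recognized barcode.
    """
    barcode_length = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample), unassigned

# Hypothetical 4-nt barcodes and pooled reads from a single run.
barcode_to_sample = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTGGATTC", "TGCAGGATTC", "ACGTCCTAAG", "NNNNGGATTC"]
by_sample, unassigned = demultiplex(reads, barcode_to_sample)
```

Reads whose prefix matches no known barcode are kept in a separate list rather than discarded, so the fraction of unassignable data in a run can be checked.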
Diversification in sequencing market enables lower-cost benchtop sequencers
To date, the sequencing market has been dominated by core facility-based, second-generation sequencing machines that offer generous volumes of data at a price point that limits regular use, except in the genomics space. This year has signaled the emergence of scaled-down second-generation technologies that offer reduced bandwidth sequencing using benchtop machines with lower per-run costs, and the promise of third-generation technologies from several companies that can reduce run time to under one hour and cost less than $100.
A role for sequencing in pharmaceutical R&D
NGS has raised the interest of both industry bench researchers and board members, in part evidenced by the strategic agreement between Biomerieux and sequencing provider Knome. A key question, however, is, "what commercial advantage can sequencing experiments offer?"
From a genomics perspective, having sequenced the human genome, the next challenge is to understand the variation that underlies individual susceptibility to disease, which necessitates targeted resequencing of key genes and expression analysis of these genes in affected individuals. An attractive advantage of RNA sequencing, for instance, is the ability to measure the level of SNP expression for many genes without a priori knowledge of the SNPs present in the sample. Sequencing can provide quantitative snapshots of the whole transcriptome or, at lower cost, a focused analysis of selected target genes across many samples.
In the microbiology lab, ribosomal DNA sequencing has provided the opportunity to study the diversity of bacterial communities in unprecedented detail, yet ribosomal sequence divergence captures only a small fraction of the genomic diversity of bacteria, and their responses to environmental stress and drug treatment. "Selective sequencing" approaches that capture many variable regions of the bacterial genome allow this additional diversity to be analyzed, and offer new insight into how microbes evade antibiotics and develop resistance. An advantage of NGS is the capacity to split a sample into two, and assay DNA variant detection and variation in RNA-level expression using the same sequencing assay.
The diversity of viral genomes presents a prime case for the use of sequencing approaches to catalogue natural variation that illuminates viral properties and to identify the emergence of new mutations under selective pressure. Quantitative sequencing that enables researchers to track previously unknown sequence variants during viral infection demonstrates a particular advantage of NGS over primer-anchored real-time PCR approaches.
A concern for microbial sequencing experiments is how to deplete background-contaminating DNA such as human genomic DNA in stool or blood samples. Several approaches now exist to enable selective sequencing that enriches for target nucleic acids, ensuring that the majority of sequencing information corresponds to the target pathogen genome, rather than background nucleic acid. Such approaches will be critical for benchtop second- and third-generation sequencing platforms in which sequencing bandwidth has been reduced to lower the entry cost for small-scale users.
Integrating sequencing into existing experimental workflows
Another important consideration for investors in NGS approaches is, "how will the technology be used in the lab?"
The answer is multi-faceted and still evolving, but a number of options are available. In the case of current high-bandwidth sequencers, individual users may be less keen to buy or even rent machines given the pace of development of new technologies and the cost of training personnel to use them. Service models predominate, whether fully outsourced or run through in-house core facilities. Existing sample preparation protocols for NGS often require labor-intensive optimization, and outsourcing of sample prep may be preferable.
However, combining an existing highly optimized experiment with a new technology off-site or allowing outside researchers on-site brings additional risks and sources of experimental delay. There is therefore a demand for simple-to-use assay kits that are customized for, and easily integrated with, existing experimental methods, allowing samples to be processed in-house and sent out to the sequencing provider of choice.
Many of the drawbacks of the outsourced sequencing model are overcome by rapid-turnaround benchtop machines. These units are expected to initially demonstrate their worth following up observations from larger-scale second-generation sequencing experiments, in which sequence resolution is still required, but the target landscape has been narrowed. Establishment of these machines in the lab will be aided, however, by the development of assays customized for these platforms and accessible to molecular biologists without specialized training in NGS. In this case, there is greater onus placed on assay manufacturers to generate data that is readily interpretable for those not familiar with high-complexity sequencing datasets. Bioinformatics software already offers some NGS-capable functionality, but the flexibility of the experimental applications of these platforms means that there are as yet no convincing one-stop solutions for sequencer-to-notebook data processing. In the short term, it may be that researchers look to assay providers to provide custom software tools that plug in to broader analysis platforms, offering a convenient division of labor as a solution to meet customer data delivery needs.
The critical determinant in uptake of NGS for mainstream pharmaceutical research will be utility: technologies that provide new and meaningful insight with the data they return. Sequencing service providers must work hard, however, to convince industry research groups that sequencing assays provide new insights that advance their understanding of product biology, create opportunities to discover new avenues of exploitable biology or make significant savings in their existing research programs.
If NGS continues to meet expectations as it has in the past five years, benchtop sequencers and real-time PCR machines will be competing for researchers' affections in the coming years.
Graeme Doran is chief scientific officer for Pathogenica Inc. in Cambridge, Mass., which provides sequencing services and diagnostic assays using NGS technology. Doran received his Ph.D. from Oxford University and postdoctoral training at the Massachusetts Institute of Technology.
1. Zhang, et al., 2003. "Millions of Years of Evolution Preserved: A Comprehensive Catalog of the Processed Pseudogenes in the Human Genome." Genome Res., 13:2541-2558.
2. Git, et al., 2010. "Systematic comparison of microarray profiling, real-time PCR, and NGS technologies for measuring differential microRNA expression." RNA, 16:991-1006.