Data management that speeds—rather than impedes—advanced DNA sequencing applications
How labs choose to manage their data may very well determine which have the most success in applying DNA sequencing to the development of advanced molecular diagnostics and therapeutics.
In the post-Human Genome Project era, molecular diagnostics and therapeutics, which promise to significantly improve patient outcomes by linking genetic diagnoses to targeted therapies, have been hailed as the next great advancement in human healthcare. Gene sequencing technology is widely acknowledged as the primary driver for these advances, powering both the research needed to understand and leverage individual genome maps and the clinical testing that will enable physicians to diagnose and treat disease based on a patient's genomic profile.
In this brave new world, it's hard to conceive that something as mundane as the management of sequencing data could determine which technologies succeed or fail. Yet today, sample management is the primary bottleneck to sequencing workflows. In May 2010, a survey of laboratory directors conducted by J.P. Morgan cited data management as one of the biggest hurdles to expanding next-generation sequencing (NGS). And in January 2011, a survey by William Blair & Co. cited data management software support as a top priority in choosing an NGS platform, trailing only instrument throughput and reagent cost.
The ability to positively identify samples and maintain data integrity throughout the sequencing workflow will only become more vital as research labs aim to maximize throughput and extend research capabilities into clinical applications. If information management systems are to support the continued evolution of sequencing and its application to medical diagnostics and therapeutics, they must support the de facto standards and best practices associated with using sequencers, the technology without which there would be no evolution.
But sequencing isn't just about machines. Instruments couldn't run and work wouldn't get done without people. Informatics, in the form of laboratory information management software, can help improve lab efficiency, ensure that labs deliver better quality results to customers faster and enforce pertinent clinical and U.S. Food and Drug Administration (FDA) regulatory requirements such as CLIA, CAP/ISO 15189 and 21 CFR Part 11. But informatics can also inhibit lab staff by forcing them into unnatural or illogical ways of working.
Informatics for modern sequencing labs must therefore interact effectively with three very different constituents: the instruments that perform the sequencing, the lab technicians who run the instruments and the lab directors who are ultimately responsible for the output and quality of the lab's work.
Let sequencers lead the way
Sequencers are the core technology driving next-generation genomics research. The three major manufacturers of sequencing instrumentation (Illumina, Life Technologies and Roche) are all developing instruments capable of producing hundreds of gigabases of sequencing data per run. It's therefore imperative that data management software keep pace. Software can effectively integrate with sequencers in four ways.
1) Conform to the wet-lab protocols provided by a vendor. The continued raising of the throughput bar has led instrument manufacturers to standardize the wet-lab processes that will work best with their system into various sample-prep kits. Preconfigured informatics workflows that map to these vendor-specified procedures enable labs to thoroughly track sample-preparation activities.
2) Ensure that samples are properly prepared to run on designated instruments. Instrumentation vendors have developed specific standards for the media onto which libraries are loaded for sequencing. Preconfigured informatics workflows can speed sample preparation by automating routine tasks such as tracking and loading concentrations, calculating dilution of libraries to normalized concentrations and tracking reagents.
3) Demystify demultiplexing. Multiplexing, or library pooling, can increase sample throughput. But two things limit the technique's utility: Scientists must be able to rapidly organize prepared libraries or samples that can be effectively pooled together, while also tracking what happens to individual samples in a pool before, during and after multiplexing. Preconfigured informatics workflows can track a range of information associated with pending samples that technicians can search to build pooled libraries. Preconfigured informatics workflows can also support the assignment of adapters, indexes or DNA barcodes to individual samples in a pool to keep them distinct and trackable when multiplexed.
4) Run, monitor and track sequencing runs. Informatics can automate and track a variety of tasks to make sequencing more efficient, such as matching items sent to sequencers with samples in the data management systems; generating necessary files (such as run definition files and sample sheets) to communicate with the sequencer before and after sequencing; monitoring run status directly across multiple instruments; capturing key run parameter files and primary analysis metrics; and automating demultiplexing and conversion of raw data files from the sequencer into FASTQ format for analysis.
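To make the routine tasks in points 2 through 4 concrete, here is a minimal Python sketch of how software might normalize library concentrations, check that no two samples in a pool share an index, and write a simple pooling sheet. It is an illustration only, not any vendor's workflow or file format: the Library class, the column names, the sample identifiers and the concentrations are all hypothetical, and the dilution uses the standard C1 x V1 = C2 x V2 relationship.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Library:
    sample_id: str   # hypothetical sample identifier
    index_seq: str   # index (barcode) sequence assigned to this library
    conc_nm: float   # measured stock concentration, in nM


def dilution_volumes(lib, target_nm, final_ul):
    """Return (stock_ul, diluent_ul) using C1 * V1 = C2 * V2."""
    if lib.conc_nm < target_nm:
        raise ValueError(f"{lib.sample_id}: stock is below the target concentration")
    stock_ul = target_nm * final_ul / lib.conc_nm
    return stock_ul, final_ul - stock_ul


def check_pool(libraries):
    """Raise ValueError if two libraries in the pool share an index sequence."""
    seen = {}
    for lib in libraries:
        if lib.index_seq in seen:
            raise ValueError(
                f"index {lib.index_seq} is used by both "
                f"{seen[lib.index_seq]} and {lib.sample_id}"
            )
        seen[lib.index_seq] = lib.sample_id


def pool_sheet(libraries, target_nm, final_ul):
    """Build a CSV pooling sheet (hypothetical columns) as a string."""
    check_pool(libraries)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample_id", "index", "stock_ul", "diluent_ul"])
    for lib in libraries:
        stock, diluent = dilution_volumes(lib, target_nm, final_ul)
        writer.writerow([lib.sample_id, lib.index_seq, f"{stock:.2f}", f"{diluent:.2f}"])
    return buf.getvalue()


# Illustrative values only: dilute two libraries to 4 nM in 20 uL each.
pool = [Library("S1", "ATCACG", 10.0), Library("S2", "CGATGT", 8.0)]
print(pool_sheet(pool, target_nm=4.0, final_ul=20.0))
```

In this sketch, a duplicated index stops the sheet from being generated at all, which reflects the point above that samples must stay distinct and trackable once they are pooled.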
Help lab techs work better and faster
Laboratory technicians, who interact most closely with instruments on a daily basis, will appreciate tight integration between instrumentation and informatics, yet such integration isn't all that technicians require from modern sequencing data management systems. Sequencing work is fast-paced and dynamic: labs can generate hundreds of gigabases a day, and workflows may change monthly to accommodate new protocols and instrumentation. In this environment, labs succeed by pushing the boundaries of innovation, and they cannot afford to be constrained in their vision by the software they implement to manage data and workflows.
Lab technicians are most interested in ways to optimize their personal and team efficiency while minimizing the amount of time they need to spend on routine, repetitive tasks. Most technicians want to spend as little time as possible recording information; instead of telling a system they've done something, they'd prefer the system anticipate the task and supply as much information as possible to complement work they plan to do.
Technicians also need fast and easy ways to track their work and the work going on around them. Dashboard views offer an ideal way for technicians to review experiments in progress, guide samples effectively through complicated workflows and collect and organize samples into multiplexed experiments to achieve greater efficiency. No matter how the interface is designed, technicians require uncluttered and streamlined access to only the information they need to initiate experiments, find samples on which to work, monitor work in progress and stay informed about other work occurring in their labs.
Empower lab directors to improve lab efficiency
The overall operation and administration of labs falls on lab directors. This means that unlike lab technicians, who need ways to streamline day-to-day tasks associated with preparing and managing samples and running projects, lab directors require high-level views that they can use to track lab progress and verify that work is occurring and being recorded promptly, accurately, proficiently and in compliance with applicable regulatory requirements for clinical research and biopharmaceutical applications.
Most lab directors rightly put their primary emphasis on delivering high-quality results to clients and collaborators quickly. Data management software can centralize up-to-date information on runs so that lab directors can compare sequencing performance and trend accumulated data over time. When data from multiple runs are archived and searchable, labs can make better, more informed decisions about which samples to rework, whether to request more samples for further experimentation or how much time to spend on further analysis.
Regulations will become more of a factor for labs that undertake clinical applications for sequencing. Three regulatory requirements potentially impact clinical genomics labs in the United States:
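As one illustration of trending accumulated run data, the Python sketch below flags any run whose quality metric falls well below the average of the runs that preceded it. The metric (here imagined as the percentage of bases at or above Q30), the run names, the window size and the threshold are all assumptions chosen for the example, not recommendations.

```python
from statistics import mean, stdev


def flag_outlier_runs(runs, window=5, k=2.0):
    """Return IDs of runs whose metric is more than k standard deviations
    below the mean of the preceding `window` runs.

    `runs` is a chronologically ordered list of (run_id, metric) pairs.
    """
    flagged = []
    for i in range(window, len(runs)):
        history = [value for _, value in runs[i - window:i]]
        threshold = mean(history) - k * stdev(history)
        run_id, value = runs[i]
        if value < threshold:
            flagged.append(run_id)
    return flagged


# Hypothetical archive of runs with an illustrative quality metric.
archive = [
    ("run01", 91.0), ("run02", 92.0), ("run03", 90.0), ("run04", 91.5),
    ("run05", 90.5), ("run06", 84.0), ("run07", 91.0),
]
print(flag_outlier_runs(archive))
```

A check like this only works when runs are archived centrally and in order, which is the practical argument for keeping run data searchable in one place.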
- CLIA: Codified in the Code of Federal Regulations (CFR); Title 42, Public Health; Part 493, Laboratory Requirements: The Clinical Laboratory Improvement Amendments (CLIA) regulate all laboratory testing (except research) performed on humans in the United States by ensuring the accuracy, reliability and timeliness of patient test results, regardless of where the test was performed. Practically, data management software can support CLIA compliance by positively identifying and maintaining the integrity of samples from the time of receipt through the completion of testing and reporting of results.
- CAP/ISO 15189: The College of American Pathologists (CAP) offers a lab accreditation program based on the International Organization for Standardization (ISO) 15189:2007, which utilizes specific lab accreditation criteria, procedures and processes to determine laboratory technical competence. Both programs focus on the continuum of care directly connected with improved patient safety and risk reduction and outline standards for quality and competence particular to medical laboratories in developing their quality management systems and assessing their own competence. To support CAP/ISO 15189, software must control the authorization and authentication of personnel that access sample and test data and the integrity of sample and test data (including its creation, modification, maintenance and transmission). It must also maintain audit trails that enable labs to identify individuals who have entered or modified data, files or programs and document modifications in a time-sequenced, trackable manner.
- 21 CFR Part 11: The FDA codified regulations in the Code of Federal Regulations (CFR); Title 21, Food and Drugs; Part 11, Electronic Records; Electronic Signatures (ERES) that provide criteria for acceptance by FDA, under certain circumstances, of electronic records, electronic signatures and handwritten signatures executed to electronic records as equivalent to paper records and handwritten signatures executed on paper. While 21 CFR Part 11 only applies to FDA-regulated processes and submissions (and not, consequently, to clinical work), many organizations have adopted the regulations as a de facto standard for managing any electronic records.
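The audit-trail idea that runs through these requirements (who changed what, and when, in a time-sequenced and trackable manner) can be sketched in a few lines of Python. This is a conceptual illustration only and not a compliant implementation of any regulation: the AuditTrail class, its field names and the sample values are hypothetical, and chaining each entry to the previous one with a SHA-256 hash is a design choice made here to make tampering detectable, not something the regulations above prescribe.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditTrail:
    """Append-only, time-sequenced log of data modifications (illustrative)."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(entry):
        # Hash every field except the stored hash itself.
        payload = {k: v for k, v in entry.items() if k != "hash"}
        text = json.dumps(payload, sort_keys=True, default=str)
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def record(self, user, record_id, field, old, new, when=None):
        """Append one entry identifying who changed what, and when."""
        when = when or datetime.now(timezone.utc)
        entry = {
            "seq": len(self._entries) + 1,
            "timestamp": when.isoformat(),
            "user": user,
            "record_id": record_id,
            "field": field,
            "old": old,
            "new": new,
            "prev_hash": self._entries[-1]["hash"] if self._entries else "",
        }
        entry["hash"] = self._digest(entry)
        self._entries.append(entry)
        return dict(entry)

    def verify(self):
        """Return True if no entry has been altered, removed or reordered."""
        prev = ""
        for entry in self._entries:
            if entry["prev_hash"] != prev or entry["hash"] != self._digest(entry):
                return False
            prev = entry["hash"]
        return True

    def entries(self):
        """Return copies of all entries in sequence order."""
        return [dict(e) for e in self._entries]


# Hypothetical usage: two edits to one sample record.
trail = AuditTrail()
trail.record("tech_a", "S1", "conc_nm", 10.0, 9.8)
trail.record("tech_b", "S1", "status", "received", "prepared")
print(trail.verify())
```

Because each entry stores the hash of the one before it, editing or deleting an earlier entry causes verify() to return False, which gives reviewers a simple way to confirm the record of modifications is intact.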
The data volumes produced by modern sequencing applications require new approaches to data management that center on the workflows prescribed by sequencers and the specific needs of two different types of users: lab technicians and lab directors. From my perspective, how labs choose to manage their data may very well determine which have the most success in applying DNA sequencing to the development of advanced molecular diagnostics and therapeutics.
Bruce Pharr is vice president of products and marketing at GenoLogics Life Sciences Software. He has more than 25 years of experience in technology product design, management and marketing, including corporate and consulting roles with life science R&D hardware and software, pharmaceutical and medical device companies. He holds a B.S. degree in economics and business administration, and he has completed executive programs in strategic marketing management and marketing strategy for technology-based companies at the Stanford Graduate School of Business.