HOUSTON—The Human Genome Sequencing Center (HGSC) at Baylor College of Medicine is one of five institutions participating in the global Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, which aims to better understand how human genetics contributes to heart disease and aging. The scope of that effort is massive: more than 300 researchers across five institutions around the world are analyzing the genome sequence data of more than 14,000 individuals (3,751 whole genomes and 10,771 exomes), work that requires approximately 2.4 million core-hours of computational time and some 860 terabytes of storage.
“Given the huge network of collaborators around the world, the immense number of patients and the need for security and compliance, the Baylor researchers realized they couldn’t do it all on their own computing cluster,” explains Andreas Sundquist, co-founder and chief technical officer of DNAnexus, a Mountain View, Calif., company attempting to power the genomics revolution using an enterprise-level solution that combines cloud computing with advanced bioinformatics.
Thus, a collaboration was born: the HGSC adopted the DNAnexus enterprise cloud platform to power its Mercury pipeline, a semi-automated and modular set of tools for the analysis of next-generation sequencing data in both research and clinical contexts. The collaborators also worked with Amazon Web Services (AWS), which provided architectural guidance to ensure the project’s operational success.
According to Dr. Andrew Carroll, DNAnexus’ lead scientist on the CHARGE Consortium, “Baylor recognized that it could do things with DNAnexus that could not be done with other programs. People could share data within the cloud, giving large collaborations access to huge amounts of data. It combined biology with bioinformatics expertise in a secure environment compliant with all the necessary regulations.”
“Being able to share data with collaborators efficiently, quickly and securely is extremely important,” explains Dr. Narayanan Veeraraghavan, lead programmer scientist at HGSC. “Since the sizes of our data sets are often in the order of terabytes, conventional methods, including slapping data onto a disk and mailing them, are not very appealing. The DNAnexus platform, on the other hand, fulfills our requirements of efficiency, speed and security very well. We’re stretching the boundaries of science.”
Baylor’s HGSC has played an important role in the emergence of genomics as a core discipline in modern biomedical and translational research, according to Veeraraghavan. “It has been at the forefront of technical innovation and deployment of next-generation sequencing technologies and is a leader in developing large-scale sequencing and analysis solutions,” he adds. “As the wet biochemistry lab was scaling up, and with the increase in analytics, the need to expand informatics capabilities was gaining more weight.”
DNAnexus offers a platform that enables clinical and research enterprises to efficiently move their analysis pipelines into the cloud, using their own algorithms alongside industry-recognized tools and reference resources to create customized workflows in a secure, cost-effective and compliant environment, according to Sundquist. Labs of any size can build and run their data analysis applications and workflows from anywhere in the world, and work securely with research and clinical collaborators.
DNAnexus has done “heavy computational lifting,” Carroll says. “The last stage will find numerous investigators slicing and dicing the data. Networks of hundreds of collaborators can work on the same project and plug in data for thousands of patient samples when leveraging this technology.”
“This project is the archetype for many other commercial collaborations,” says Sundquist. “The commercial potential is applicable to other organizations that analyze data and commercialize drugs. There can be partnerships with all sorts of organizations that do clinical testing and genome analysis.”
Genomic results from the HGSC will lead to new drugs, increase predictive power and help identify targets, according to Carroll. The end goal is to understand the science behind the consortium’s work and to know what applications are possible.
Both sides of the collaboration anticipate a long-term arrangement.
“Next-gen sequencing has gained center stage. That translates to larger volumes of extremely demanding computing requirements,” Veeraraghavan says. “The ability to collaboratively analyze almost unlimited samples from extremely large-scale study populations will enable HGSC to foster identification of drug targets and discovery of genes responsible for diseases, taking science and clinical diagnostics to the next level.”