Near the dramatic end of Stanley Kubrick’s epic film Spartacus (*spoiler alert*), the revolting slave army has been defeated and surrounded by the armies of Roman senator Marcus Licinius Crassus, awaiting annihilation. To their surprise, however, Crassus has another plan, sending a herald to the group of weary slaves.
“The terrible penalty of crucifixion has been set aside on the single condition that you identify the body or the living person of the slave called Spartacus,” the herald announces to the astonished slaves.
The music grows more dramatic, and Kirk Douglas slowly lowers his gaze to the ground. Tightening his already granite jaw, he rises to his feet, but before he can speak, fellow slave Antoninus (Tony Curtis) bellows, “I’m Spartacus.”
And then another slave yells, “I’m Spartacus.”
And another and another and another, until the entire defeated slave army stands as one to chant, “I’m Spartacus.”
This turn of events leaves Laurence Olivier’s Crassus with quite the dilemma, as he knows the offender is within the group, but all of them claim to be the root of the problem.
Meanwhile, in a clinic somewhere, an oncologist pores over the medical literature, trying to identify clues that might help her understand which of her patients might benefit from treatment with oxaliplatin.
Flipping open The Lancet, she is immediately met with a paper that suggests a correlation between a given six-gene mutation profile and good performance of patients on the drug. She jumps from her seat and is about to order whole-genome sequencing, when a copy of the New England Journal of Medicine catches her eye with a paper announcing an oxaliplatin-favorable single-gene mutation.
And then a British Medical Journal article announcing, “I’m an oxaliplatin prognostic marker.” And then Oncotarget and JCO and BCJ and JNCI, all bellowing, “I’m an oxaliplatin prognostic marker.”
This wealth of data leaves the oncologist with quite the dilemma, as she believes the optimal indicator is within the genome, but all of these biomarkers or biomarker panels claim to be the root of the issue.
Technological advances over the past decade have completely transformed the ability of researchers and clinicians to interrogate the human genome. Where once the landscape was held entirely by microarray and PCR platforms, these still-valuable technologies have been supplemented and, in some cases, passed over in favor of next-generation sequencing (NGS).
Where microarrays required some foreknowledge of the genes of interest—you could only identify the existence of the gene or transcript if it was on the chip—NGS opened a door to the identification of unforeseen changes within the genome, whether you were searching the entire genome (i.e., whole-genome sequencing) or only wished to highlight certain segments (e.g., whole-exome sequencing, transcriptome sequencing).
Given the breadth of alterations these technologies can identify, they are being looked to increasingly to help researchers identify individual biomarkers or biomarker panels that correlate with everything from diagnosing disease in a patient (e.g., existing, potential) to that patient’s prognosis (e.g., low risk, high risk) to how the patient will respond to a given treatment (e.g., efficacy, toxicity).
“Effective cancer patient management relies on specific and useful biomarkers that enable oncologists to diagnose, determine suitable therapeutics, continuously monitor response to therapy or monitor tumor progression,” said Frost & Sullivan industry analyst Winny Tan in a 2012 report. “Consequently, personalized medicine is becoming a mainstream concept, and the approaches to personalized medicine are increasingly understood by physicians and patients.”
Given the increasing demand for such biomarkers, companies have been pushing hard to get products to market. Last June, Epigenomics AG announced positive results of two validation studies performed on its Epi proColon test to detect colorectal cancer, based on the methylated Sept9 DNA biomarker found in blood plasma. Similarly, in December, Cancer Genetics Inc. announced the launch of its NGS panel for chronic lymphocytic leukemia, the Focus::CLL, which assesses seven clinically relevant genes.
Meanwhile, Roche has been expanding its NGS reach over the past year, first with the acquisition of Genia Technologies and its nanopore-based NGS platform last summer, and then by acquiring a majority interest in Foundation Medicine, as Foundation has been making significant inroads into the molecular characterization of various diseases within the clinical setting.
One of the challenges with this wealth of data and analytical processing, however, is that few biomarkers or biomarker signatures have become clinically useful in the form of a diagnostic, prognostic or theranostic test. For every methylated Sept9, HER2, KRAS and EGFR test, there are possibly tens of thousands of unvalidated mutations.
Much like the pharmaceutical pipeline that may start with 100,000 potential drug candidates but see only one or two commercial entities dribble out the other end, databases around the world are filling rapidly, yielding only the most meagre drip of commercial products.
“I really like the analogy to the drug discovery pipeline, because you have so many molecules going in and so few products coming out,” says Merrilyn Datta, chief commercial officer for tissue informatics firm Definiens.
“The key to unlocking this challenge is really getting more and more individuals with their mutations and their clinical outcome data into the database and then being able to crunch through that in a bit more of a Big Data way,” she suggests. “Without having the correlation to clinical outcomes and therapeutic response, it is going to be very difficult to say which of these mutations may be meaningful, or at least meaningful clinically in terms of how we can respond to them.”
Holly Hilton, director of biomarkers and translational sciences at PPD, is less inclined to buy the pharma pipeline analogy, however.
“Yes, there are changes in the genetic code, natural variations, mutations of no consequence, but I think there is a lot of information there and just mining it, it is a little bit different than drug discovery,” she argues, suggesting that it is not just about looking for that one gem but rather that the information contained within these genomic profiles has its own value.
“I think that sometimes maybe what you’re thinking is an individual gene that ends up turning into a companion diagnostic, a real biomarker that is approved by the FDA, but those are going to be fewer and far between,” she continues.
“But the sum of the knowledge that we are gaining from using NGS, from understanding different types of disease and very subtly designing people’s root causes of disease, I think there’s a lot there,” she adds, acknowledging that often the lag time between generating all of this data and turning that data into knowledge can feel very frustrating.
Another challenge is that the data arising from these experiments can produce highly heterogeneous results, not only between research groups, but even within a patient population experiencing the same condition or within a single tumor.
Almost a year ago, Jennifer Wheler and colleagues at MD Anderson Cancer Center, University of California, San Diego and Foundation Medicine examined the genomic profiles of 57 women with metastatic breast cancer using NGS (see sidebar How’d they do that? below). Using an enrichment protocol that only examined 182 to 236 genes, the researchers found that each of the women had a completely unique genomic profile, and even if different mutations within a single gene were classified as identical, there were still 54 unique signatures among the 57 patients.
The researchers described most of the genetic aberrations they discovered as actionable targets, whether with approved therapeutics or agents currently in clinical trials, but they raised questions as to how best to treat these “malignant snowflakes,” as it seemed clear that a one-size-fits-all solution would not make sense.
“We propose testing a new patient-centric, molecular matching strategy to find an optimal treatment regimen tailored to each patient’s genomic profile acquired from multiassay molecular testing,” the authors wrote. “Importantly, this approach would permit the therapy given to vary from individual to individual consistent with N-of-One customization.”
At the same time, one thing that makes cancer particularly difficult to treat is that tumors rapidly evolve over space and time within individual patients, and new mutations can move cells within a tumor in one direction or another in terms of resistance to treatment or degrees of malignancy.
This challenge was described by Marco Gerlinger of Cancer Research UK London Research Institute in a 2012 Illumina iCommunity Newsletter. Using NGS, Gerlinger and his group performed whole-exome sequencing of nephrectomy biopsies from four metastatic renal cell carcinoma patients, following up with ploidy, SNP array and mRNA expression analysis.
Within different tumor samples from a single patient, they discovered 128 different mutations, only about a third of which were ubiquitous across all of that patient’s samples, while a third were unique to specific regions.
“These data indicate that a single biopsy may not be representative of the mutational landscape of a solid tumor, which illustrates the problem we have at the moment with biomarker development,” Gerlinger said. “If a single biopsy is not representative and we perform multiple biopsies on a tumor specimen, the next question we have to answer will be which subclone has the dominant effect in terms of drug treatment outcomes?”
“Spontaneous mutations occur in the tumor all the time, so you can imagine there are parts of the tumor that become more virulent or contain the exact gene mutation that you should be targeting, while other parts may be less so,” Datta echoes.
Highlighting this concern, when Gerlinger’s group performed RNA expression analysis of different regions of the primary tumor of the same patient, they found signatures that indicated both good and poor prognoses within the same tumor. Gerlinger suggested that sampling biases—not necessarily taking enough or the most important samples—may explain the challenge of validating biomarker signatures.
“The more precise the technology, the more care one must take with the sample,” opines Hilton. “If you only need a tiny little sample to sequence, you better make sure that the tiny sample is absolutely perfect and handled completely in a standard way across the whole trial.”
“The utmost care with these samples is pretty critical, not only through collection but through to storage, tracking, all these things,” adds Hilton’s colleague Frank Taddeo, PPD’s director of labs. “We’re trying to tease out very subtle differences in a sea or a very big background. Slight mishandling may lead to different end results.”
Both he and Hilton acknowledge that there is a world of difference between the discovery phase and actually moving into clinical trials where standardization becomes particularly key.
“We know that the underlying basics are so critical,” says Hilton. “The way the samples are collected, the timing of the selection, how are they handled; anybody doing a global clinical trial has to think these things through.”
“Even sampling the sample,” she continues. “What piece do you take? How do you handle it? It is so critical because the technologies can be beautiful, but the underlying samples must be perfect, must be taken with so much care.”
Datta draws a comparison with the early days of RNAi exploration.
“One of the things that got published early on by a group was the standards for doing an RNAi experiment,” she remembers. “I think we need to see something similar, first with the research area of biomarkers. Here are the types of tests and the types of statistics you need to do.”
Beyond the genes
In some cases, however, the seeming chaos of heterogeneous biomarker signatures is merely a mask for something deeper, and two disparate signatures may share a common root that correlates with disease. This is where a more integrated approach to data analysis and the wealth of bioinformatics tools can come to the fore.
Igor Jurisica, senior scientist at Toronto’s Princess Margaret Cancer Centre and leader of the Mapping Cancer Markers project, describes his group's efforts as integrative computational biology (see sidebar Ground control to major breakthrough? below). The idea is to take genomic signatures that they identify from different data sources and line them up alongside similar biological data to see if the signatures are connected in some way. These other data could be part of a protein interaction network, metabolic networks, transcriptomes or something else entirely.
“When we started, we were looking for a better algorithm for identifying these prognostic and predictive signatures,” he explains. “But then we realized that in order to solve that problem, we had to have a much better handle on all these networks: which microRNAs control what genes, what proteins interact with what other proteins, what are the important signalling cascades and so on.”
Datta describes similar efforts at Definiens as integromics, linking the individual genomic changes to tissue pathology or radiology information on patients. The challenge, as she describes it, was that medicine tended to be more qualitative, making it difficult to connect to other input.
“To really crunch data bioinformatically, each field has to have a certain type of data with certain qualifiers,” she argues. “Since most of the clinical data that we’ve collected hasn’t been collected in that way, it makes it really difficult to use it.”
With that in mind, Definiens Chief Technology Officer Gerd Binnig developed a system called Cognition Network Technology that enables researchers to teach an algorithm to examine any type of image and understand the biological and medical context of what it is viewing. It’s what Datta calls tissue datafication.
“You can take some of the pathologists’ wisdom about what different types of tissue features and morphology mean, you teach the context to the computer and it can extract all of the data—thousands and thousands of data points—and then bioinformatically crunch it, because it is no longer qualitative but is now quantitative,” she says.
According to Hilton, PPD relies on its data dashboard system as an interface through which any of its project partners can query the variety of data—genomic, clinical, site-specific, patient, etc.—from different angles, looking for insights.
“We have to have a variety of people who are experts getting the data together in one place,” she says, highlighting the multispecialty teamwork nature of this research. “And then you have to have dashboard software where people can query this data from multiple angles.”
For Jurisica, however, it isn’t just about developing powerful algorithms or having a robust informatics infrastructure in place; it is also about having confidence in your data sources.
As an example, he describes the Cancer Data Integration Portal, a project he initiated several years ago to compile data from a variety of repositories and determine, for any given tumor type, which data they could trust and which were sufficiently annotated to be of use.
“It’s alarming how little information ends up being useful or how much you have to expend to extract something useful out of what is available freely,” he comments.
He notes the situation in lung cancer where the project collected data on more than 6,200 patient samples and yet quickly discovered that this represented only 4,500 unique patient samples, which plays havoc when testing any biomarker signatures you might identify.
“We train a model for the prediction of an outcome on one data set, and then we need to validate the model on an independent data set to estimate how well it will perform in future instances,” he says. “The critical aspect is that this should be independent data, because if there is any relationship to the original set then you skew your results and they look more optimistic.
“We have one specific example in ovarian cancer where you clearly see the problem because that signature performs incredibly well on one data set but extremely poorly on others, and then you realize that 30 patients in the data set were also used in the training set.”
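The independence check Jurisica describes can be sketched in a few lines of Python. This is a minimal illustration only, not the Mapping Cancer Markers pipeline: the function name and the patient identifiers are hypothetical, and it assumes every sample carries a patient ID that is consistent across repositories, which in practice is often the hard part.

```python
def split_independent(training_ids, validation_ids):
    """Separate validation patients into those shared with training and those that are new.

    Both arguments are iterables of patient identifiers (hypothetical format).
    Returns (shared, independent) as two sorted lists.
    """
    training = set(training_ids)
    validation = set(validation_ids)
    shared = sorted(validation & training)        # patients that leak between the sets
    independent = sorted(validation - training)   # patients safe to validate on
    return shared, independent


# Invented identifiers for illustration only.
training_ids = ["P001", "P002", "P003", "P004"]
validation_ids = ["P003", "P005", "P006", "P005"]  # P005 is deposited twice

shared, independent = split_independent(training_ids, validation_ids)
print("Shared with training set:", shared)
print("Independent validation patients:", independent)
```

Using sets also collapses repeated deposits of the same patient, the kind of duplication that shrank the lung cancer collection from more than 6,200 samples to 4,500 unique ones.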
As well, the data from these resources may be missing vital annotation, possibly because the point of the original research had little or nothing to do with how scientists are using the data now.
“With lung adenocarcinoma, for example, smoker vs. non-smoker is a completely different disease,” Jurisica explains. “So if you try to go by histological subtype and you don’t know that there is a subgroup of smokers vs. non-smokers, you are not going to find anything meaningful because you have an extremely heterogeneous group of tumors in that analysis.”
A 2012 biomarker report by Frost & Sullivan suggested that the commercial success of any diagnostic test will be predicated on its usefulness in the clinical setting and how well it predicts clinical outcomes.
“Test developers should anticipate greater demands for proof of both assay performance and clinical utility,” suggested Frost analyst Tan in announcing the report. “Private and government health plans will require clear evidence of the clinical utility and the impact of the test on patient health to compare it with the best alternative.”
This means that agencies like the U.S. Food and Drug Administration (FDA) are going to continue to push harder for clinical validation before they will be willing to license laboratory-developed tests (LDTs).
Datta picks up on the LDT connection, suggesting that the FDA’s direction here might be a strong signal as to what companies can expect to validate the clinical utility of their biomarkers or panels.
“I think some of the things they’re going to ask for in terms of the validation of the complex LDTs could be the lines in the sand for the standards that should be required for clinical validation,” she says. “I feel like there is an interplay between what the FDA is going to ask for in complex tests and what the standards of the validation should be.”
“If we are going to be making clinical decisions for patients based on a biomarker, a co-diagnostic, we need to be sure that the decision that we make is based on sound repeatable data,” says Hilton.
Such validation is going to require yet another patient population, however. And as the efforts of Jurisica and others have shown, it can be difficult enough to find clean resources on which to identify and then test a putative biomarker or signature without then having to find yet another resource on which to validate it.
Jurisica is also quick to highlight the economic questions around such validation efforts.
As he explains, pharma and biotech companies are likely to be interested in funding clinical studies of biomarker signatures that might indicate which patients will do well on an investigative or recently approved therapeutic, but what about those biomarkers or panels that do the same for off-patent therapies?
“Even if you have a potential solution or at least a direction to a solution, you might still not be able to get to the finish line because there is no funding mechanism to get you there,” he bemoans.
And even if we can get past that barrier, then the task becomes changing the way the industry approaches diagnostic testing.
“In diagnostics, historically, you want to get a test that is as simple as possible,” Datta explains. “You might find that there are 60 things that could be indicative, but really only five are the drivers, so you tend to push to the five or maybe the two things you can look at to make a diagnosis.”
That’s why Hilton suggests that single-biomarker tests will remain the “bread and butter” of diagnostics for quite a while yet.
“Multiple marker/multiple signature is still in its infancy,” Taddeo concurs. “There’s a wealth of data, and trying to integrate them all together, trying to understand it as a whole, we’re still I think a ways off before we have that full picture. So we have to fall back to what we know, some of these single gene-single target-single response kinds of things.”
“Over a 10-year period, we’ll start to change and think the more data, the better,” Datta continues the thought. “We’ll do massive panels. We’ll do whole-genome sequencing. But there has to be a change in mindset.”
But even as the industry tries to catch up to the current reality of what kinds of biomarkers are becoming available to them, all indications are that the volume of data spewing from the discovery efforts will continue to grow at a stunning rate.
How’d they do that?
Highlighting the complexity that can be found within even the smallest patient population, researchers at Foundation Medicine and several U.S. cancer centers used next-generation sequencing to elucidate genomic profiles of 57 women with metastatic breast cancer.
The researchers isolated DNA from formalin-fixed paraffin-embedded tumor samples, shearing the resulting DNA into 100- to 400-bp fragments using sonication. They then created a library of these fragments by repairing the fragment ends, adding deoxyadenosine (dA) tails and ligating the adaptors required for sequencing.
Rather than sequence all of the resulting fragments (whole-genome sequencing), the researchers focused their efforts on cancer-related genes and common cancer-related rearrangements, using biotinylated oligonucleotides to perform library enrichment via hybrid capture.
The library was then sequenced to an average median depth of 500-fold and mapped to the reference human genome using a variety of sequence-alignment and analytical resources.
Targeting just 182 to 236 genes, they determined that no two women in the population of 57 patients had the same genomic profile. And even when different variants of the same gene were considered identical, there were still 54 unique profiles.
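How the count of unique profiles can change with the level of detail is easy to see in a small Python sketch. The patients and aberrations below are invented toy data, not the study’s results, and the function is a hypothetical illustration rather than the authors’ method: it treats a profile as the set of aberrations a patient carries, then collapses different variants of the same gene into one.

```python
def count_unique_profiles(patients, gene_level=False):
    """Count distinct aberration profiles across a patient population.

    patients maps a patient ID to a set of (gene, variant) pairs.
    With gene_level=True, different variants of one gene count as identical.
    """
    profiles = set()
    for aberrations in patients.values():
        if gene_level:
            profile = frozenset(gene for gene, _variant in aberrations)
        else:
            profile = frozenset(aberrations)
        profiles.add(profile)
    return len(profiles)


# Invented toy data for illustration only.
patients = {
    "patient_A": {("TP53", "R175H"), ("PIK3CA", "H1047R")},
    "patient_B": {("TP53", "R248Q"), ("PIK3CA", "E545K")},
    "patient_C": {("ERBB2", "amplification")},
}

variant_level = count_unique_profiles(patients)
gene_level = count_unique_profiles(patients, gene_level=True)
print(variant_level, gene_level)
```

In this toy set, patients A and B differ at the variant level but carry aberrations in the same two genes, so three profiles become two once variants are collapsed.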
“Despite the larger percentage of aberrations [noted in this study] that can be targeted by drugs that are already approved or in clinical trials, our data illustrates why current clinical trials and practice generally provide only short-lived tumor regressions,” the authors wrote. “In particular, the fact that most patients have multiple aberrations, and that the abnormalities differ from individual to individual, suggests that treating different patients with the same drug or drug combination may be insufficient to optimize success.”
Ground control to major breakthrough?
With the help of a technology that first searched for signs of intelligent life in the universe, Igor Jurisica and his colleagues at Toronto’s Princess Margaret Cancer Centre—ironically, at the MaRS Discovery District—are looking for more intelligent molecular signs of human disease and response to treatment.
More specifically, he is accessing IBM’s distributed computing network, the World Community Grid, to crunch data from a variety of sources to identify molecular signatures that could ultimately help diagnose a person’s risk of any number of cancers, predict the patient’s prognosis and possibly how he or she would respond to any of several thousand possible treatments.
But this is not Jurisica’s first ride on the Grid.
“The first project we worked on was protein crystallography,” Jurisica recounts, noting that a computational problem quickly arose from doing high-throughput crystallography screens and the millions of images that had to be analyzed.
“A human cannot manually analyze all of the images,” he laughs.
His sentiment is echoed by project collaborator Christian Cumbaa, who suggests, “Others have gone blind doing it.”
“We needed to extract morphological features from these images from which we can build a machine-learning type classifier to identify the result of the experiments,” Jurisica continues, but to do that was going to require vast computational resources to which they did not have access.
“Being pragmatic researchers, you don’t even start thinking about solving the problem, but you start thinking about what I can afford to do,” he notes. “And with what I can afford to do, can I attempt to solve the problem?”
At the time they were contemplating these questions, says Jurisica, the largest Linux cluster was in nearby Buffalo and had about 2,000 cores. But the math quickly dispelled any thoughts that even this would be enough. Each image required two hours of analysis, and they had more than 120 million crystallographic images.
“If we completely took over the Linux cluster in Buffalo, [the computation] would take about 180 or 200 years,” he smiles. “Obviously, the cluster will die before it’s done.” As would he and his group.
Then he met the folks from IBM and the World Community Grid, which he says offered him effectively unlimited computing power. The same work that would have taken 200 years in Buffalo was completed in only 5.5 years on the Grid.
Starting in November 2013 on the Mapping Cancer Markers project, Jurisica’s team was given access to about a third of the Grid’s machines, which currently number around 2.9 million, allowing the group to process about 258 CPU-years per day.
“The World Community Grid frees you up from thinking 'what can I afford to do,'” Jurisica enthuses, “and rather, asking the question 'what would make sense, regardless of whether I can afford to compute it or not.'”
A 2013 Frost & Sullivan report on NGS informatics suggested that data analysis tools and related resources earned revenues of $170 million in 2012, with a projected market value of $580 million by 2018, for a compound annual growth rate (CAGR) of almost 23 percent.
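The growth rate follows from the two revenue figures. As a rough check, here is a small Python sketch using the standard compound annual growth rate formula; the function name is our own, and it assumes six full years of growth between 2012 and 2018.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10 percent a year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted from the report, in millions of dollars.
ngs_informatics_growth = cagr(170, 580, 2018 - 2012)
print(f"{ngs_informatics_growth:.1%}")
```

The result comes out a little under 23 percent, in line with the report’s figure.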
“With sequencing data production forecast to grow at a CAGR of more than 75 percent between 2012 and 2018, researchers will need efficient NGS informatics solutions to manage, analyze and interpret this escalating amount of data,” argued Frost Senior Industry Analyst Christi Bird in describing the report. “As the number of applications for NGS continues to grow, the implementation of NGS informatics will go up.”
One of the challenges the report foresees in the market, however, is the wealth of free and open-source informatics resources available to NGS end-users, which may make researchers reluctant to expend dwindling research funding dollars on off-the-shelf products and services.
“As the market matures and informatics prices fall, competitors will rely not just on the expanding volume of users and NGS data developed, but also on the high-value interpretation sector of the market to sustain profits globally,” Bird continued. “Suppliers that can develop high-value, simple and streamlined turnkey solutions will find success.”
Discussing the biomarker landscape but expanding the discussion beyond solely NGS informatics, BCC Research suggested that the broader biomarker bioinformatics market could dramatically expand from the $2.3 billion it earned in 2013 to almost $5.2 billion by 2018, but this success may bring its own baggage.