The conventional approaches used to discover biomarkers that can be used to help us understand disease and assay for its presence or response to drug treatment represent a wonderful start. They involve looking at differential biology at the protein level and trying to see if there are changes, racing to sequence all of the changes, and then going back to the literature or reconciling with one's own hypotheses to choose which of the changed proteins are really relevant to the clinical question you are asking. Whatever the clinical question, whether it is detection of disease, determining the likelihood of response or an adverse reaction to a drug, you have to choose the right biomarkers and then build reproducible, precise assays.
Making that choice is where the difficulty lies, because it rests on information you already have and therefore on the assumptions you make. You have genomics information, literature information, and your own experience in the field, and you decide whether a given protein or set of proteins is somehow involved in the disease process and therefore useful to you. You then build assays so that you can validate whether that particular protein resolves the issue and leads to a useful clinical or research assay. The ineffectiveness of this approach is pretty clear when you look at the clinical diagnostics industry over the past 10 years, and more specifically at the small number of new tests or assays introduced that have shown significant commercial value.
The other source of the problem may be the question of whether single markers can provide high predictive value assays with high clinical utility. The answer is probably not, and certainly many diagnostic and pharmaceutical companies are coming to that realization. Our scientific intuition has always told us that biological pathways that lead to disease are complex, and that our labels for specific diseases actually describe heterogeneous processes, and consequently, looking simultaneously at multiple biomarkers will be required to achieve more robust, accurate, and predictive assays.
Conventional proteomics approaches seek to sequence and identify every expressed protein in a given disease phenotype. The choice then becomes how many candidate markers, and which ones, we gamble on and invest resources in developing assays for so that we can perform validation studies. You immediately realize that you now have an N-factorial assay development and validation problem on your hands to test all of the proteins in different combinations, an impracticable task. So the real solution to this dilemma is what I might describe as "proteomic systems biology", but perhaps coming at it from a direction different from what we typically associate with systems biology.
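To give a rough sense of that scale, the short Python sketch below counts how many distinct marker panels could be formed from a pool of candidates. It assumes a panel is simply an unordered subset of up to five markers, so the count is a sum of binomial coefficients rather than a literal factorial. The function name and the candidate counts are illustrative assumptions, not figures from any study.

```python
from math import comb


def panel_count(n_candidates: int, max_panel_size: int) -> int:
    """Count unordered marker panels containing 1..max_panel_size markers."""
    return sum(comb(n_candidates, k) for k in range(1, max_panel_size + 1))


# Illustrative pool sizes only; each panel would need its own validation study.
for n in (10, 50, 200):
    print(f"{n} candidates -> {panel_count(n, 5)} possible panels of up to 5 markers")
```

Even with only 10 candidates there are 637 panels of up to five markers, and the number grows steeply as the pool widens.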
Conventionally, we view systems biology as the integration of information into discrete biological pathways. By sequencing all the candidate biomarkers, or by analyzing gene array data, we can fill in the gaps of these biological pathways and we can draw arrows that connect the constituents of these biological pathways. Certainly, this is a valuable exercise, but it is time consuming even to run a small number of samples, let alone enough samples to account for variances due to biology as well as to analytical techniques.
An alternative approach, which we have taken at Ciphergen, is to analyze a lot of samples (tens to hundreds), as many as you can find from a diverse set of clinical and analytical backgrounds. After collecting the data, you use diverse bio-statistical approaches to let the system itself speak to you, and thereby determine a subset of four, five, or whatever number of markers that, when used together, allow you to detect ovarian cancer earlier or to stratify drug responders from non-responders. The goal here is not to sequence and identify every difference that exists, but first to understand which biomarkers are stable across these diverse clinical and analytical backgrounds. Conceptually, we have built validation into the discovery process, but have postponed the one-at-a-time assay development step.
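The idea of keeping only markers that hold up across backgrounds can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Biomarker Patterns software or the Pattern Track method: it scores each marker by a simple per-site AUC and keeps those that discriminate at every collection site. The record keys (`site`, `disease`, `peaks`), the marker names, the threshold, and the data are all hypothetical.

```python
from collections import defaultdict


def auc(cases, controls):
    """Chance that a random case reads higher than a random control (ties count half)."""
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def stable_markers(samples, min_auc=0.8):
    """Return markers whose AUC is at least min_auc at every site.

    Assumes every site contributes at least one case and one control.
    """
    by_site = defaultdict(lambda: {True: [], False: []})
    for s in samples:
        by_site[s["site"]][s["disease"]].append(s["peaks"])
    keep = []
    for marker in sorted(samples[0]["peaks"]):
        per_site = [
            auc([p[marker] for p in groups[True]], [p[marker] for p in groups[False]])
            for groups in by_site.values()
        ]
        if min(per_site) >= min_auc:  # must discriminate everywhere, not on average
            keep.append(marker)
    return keep


# Hypothetical data: m1 separates cases from controls at both sites,
# while m2 separates them only at site A (a site-specific artifact).
samples = [
    {"site": "A", "disease": True, "peaks": {"m1": 9.0, "m2": 8.0}},
    {"site": "A", "disease": True, "peaks": {"m1": 8.5, "m2": 7.5}},
    {"site": "A", "disease": False, "peaks": {"m1": 3.0, "m2": 2.0}},
    {"site": "A", "disease": False, "peaks": {"m1": 2.5, "m2": 2.5}},
    {"site": "B", "disease": True, "peaks": {"m1": 7.0, "m2": 3.0}},
    {"site": "B", "disease": True, "peaks": {"m1": 8.0, "m2": 2.0}},
    {"site": "B", "disease": False, "peaks": {"m1": 2.0, "m2": 3.5}},
    {"site": "B", "disease": False, "peaks": {"m1": 3.5, "m2": 2.5}},
]

print(stable_markers(samples))
```

Requiring the minimum per-site score, rather than a pooled score, is one simple way to penalize markers that reflect how or where a sample was collected instead of the disease itself. A real study would use far more samples and multivariate methods that evaluate markers in combination.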
What we've done then is let the biology present the key proteins and pathway networks that are dominating the phenotype. To accomplish this, you need a platform with very high reproducibility and enough throughput to analyze a statistically meaningful number of samples across a spectrum of clinical and pre-analytical phenotypes or parameters. Otherwise you won't be able to draw adequate conclusions, because you'll end up looking at analytical and pre-analytical process variability rather than the important biological variability. You must also have a platform that looks at proteins in their native state rather than artificially digested; digestion can destroy valuable information and, because it is itself hard to reproduce, degrade process reproducibility. Finally, you also need a platform that provides the separating power required to start looking at different regions of the vast dynamic range of the human proteome.
Having learned these lessons ourselves (sometimes painfully!), we have integrated our ProteinChip platform, combined with Biomarker Pattern software, into an approach we call Pattern Track. Once you can reproducibly look at lots of samples without digestion, you can then apply various forms of pattern recognition-based multivariate analysis to mine the data to find the optimal number and combination of biomarkers to answer the clinical question. Having run enough samples, you have adequate statistical support to draw valid conclusions, and can also associate specific biomarkers with clinical subgroups and even with pre-analytical factors such as how quickly a sample is frozen. It's not until this step that we undertake the process of actually identifying the relevant biomarkers.
Some people assume that pattern recognition-based proteomics eliminates the need to identify the biomarkers; we strongly disagree. In fact, identification is a critical step in the entire Pattern Track process, because the identities of the markers form the basis of the next steps of biological validation and assay development. When you sequence these validated biomarkers, you may see that they come from one or more pathways, which provides a novel hypothesis of the mechanism associated with the phenotype, delivered by the process, not the scientist! You can proceed at that point to study those four or five proteins more rigorously with all kinds of protein interaction techniques and functional strategies. In this way, you've found potentially validatable, novel mechanistic pathways and validated biomarkers, and also produced a validated multi-marker assay that can be used on a routine basis.
The proof of this potential is described in a recent paper (Fung et al. Intl. J. Canc. 2005, 115, 783) involving a phenomenon we call the Host-Response Protein Amplification Cascade (HRPAC). We found that when we used our Pattern Track process on different types of cancer samples, we kept getting common polypeptides arising from inflammatory acute response proteins. In serum, there are 20 or so high-abundance proteins that are associated with inflammatory response, which were of limited use because until now, it has been thought that inflammatory response is a non-specific, general phenomenon. When you have a disease, inflammation generally occurs, so the lack of disease specificity is not useful except to say that you have inflammation.
We discovered that three of these common proteins or their fragments produced high predictive value marker sets for detection of stage I/II ovarian cancer (Zhang et al. Canc. Res. 2004, 64, 5882). As we applied Pattern Track to studies of prostate cancer, breast cancer, Chagas disease, and Alzheimer's disease, we found that subsets of these inflammatory proteins or their fragments were producing high predictive value multi-marker assays. But in each disease, there were different combinations, and even different fragments within combinations, that produced the high predictive value assay. They were all disease specific, and we began to realize, and then to hypothesize, that these fragments were coming from tumor enzymes that were disease specific.
We surmised that when these common, high-concentration serum proteins were exposed to enzymes present locally at the disease site, the enzymes would act on these proteins to produce fragments or other modifications. Each disease elicits a specific inflammatory cascade that is reflected in the specific combination of biomarkers and post-translational modifications. This hypothesis came to us because we looked at the experimental results using this bio-statistical approach to biomarker discovery. We discovered a new pathway hypothesis using this "proteomic systems biology" approach, where we listen to what the system is telling us in order to generate new mechanistic hypotheses that are not easily imaginable by conventional proteomic approaches.
I think that this represents a really significant new way of thinking about this whole area of biomarker proteomics. We certainly haven't abandoned hypothesis-driven biology, and certainly we are developing methods to analyze the less abundant proteins. We are identifying combinations of biomarkers that have been validated in multi-site studies and are moving them forward into diagnostic test development with our commercial partner Quest Diagnostics. We are also working with major pharmaceutical partners such as Bayer to use this approach to develop assays for drug toxicity and clinical trial response stratification. If we ever hope to more rapidly translate biomarkers to speed drug development and improve patient care, we need to incorporate these kinds of approaches. Then we can make significant advances in translational proteomics and rapidly take "biomarkers to the bedside and to drugside."
Dr. William E. Rich has been CEO & President of Ciphergen since its founding in 1994, after holding the position of Senior VP at Sepracor from 1991 to 1994 and CEO & President of BioSepra, Inc., which was spun out from Sepracor. Previously, he was a co-founder and Senior VP of Dionex Corporation.