Special Report on Proteomics: Profiles in prognostication
Protein biomarkers go in search of clinical validity
Flipping through an old family photo album, it can be interesting to consider what became of those young people—how they evolved from the children they were to the adults they are.
In some cases, there may be legitimate hints of the future in those old photographs. The cousin who was constantly singing and became a performer. The nephew who was always searching nearby streams or forests and became a biologist.
For most of those kids, however, there was no real sense of what they would become. Too many other factors over the subsequent years channeled each child toward an ultimate identity, factors that simply could not be captured in a single snapshot.
In human health and disease, much the same can be said for genomic markers.
Although a few DNA sequences strongly correlate with a disease state (CFTR’s ΔF508 mutation in cystic fibrosis, for example), many more correlate only weakly with disease and can offer only the vaguest implications of what might arise in the future.
Epigenetic research has attempted to add a degree of dynamism to the genomic question, looking at changes that may influence a gene’s function over time or in response to shifting environmental conditions.
Alongside this effort, however, has been a renaissance of interest in proteins as molecular biomarkers of health and disease.
Although proteomic research has long facilitated our understanding of disease pathology and has helped define the vast heterogeneity within tissues, patients, populations and throughout disease evolution, the same methods are increasingly being used to translate that understanding into potential prognostic and diagnostic signatures.
Rather than a single photograph that’s open to speculation, we instead get a time-lapse that highlights the minute changes that lead to future outcomes.
Whereas DNA chemistry is largely the same regardless of sequence, however, the panoply of amino acids, tertiary folds and post-translational modifications that comprise protein chemistry offers a complexity and variety of molecules that can make analysis at scale difficult.
To make clinical translation of proteomic biomarkers viable, new methodologies are needed that will heighten specificity and sensitivity while at the same time providing scalable throughput and broad characterization.
Prepare for success
The molecular complexity of most biological samples heightens the need for effective sample preparation steps to pick up the often very low-abundance biomarker signals. This exercise is less finding a needle in a haystack than finding a needle in a mountain of other needles.
“Effective sample preparation methods should be able to separate these rare protein biomarkers from those abundant proteins, such as albumin,” suggests Suparna Mundodi, Agilent Technologies’ director of marketing for clinical mass spectrometry.
She gives the example of formalin-fixed, paraffin-embedded (FFPE) blocks, which have been shown to hold rich clinical and phenotypic data and are therefore an ideal source for retrospective biomarker analysis, for example following a clinical trial.
“Biomolecules are heavily cross-linked in FFPE tissues, requiring highly efficient sample preparation methods to extract proteins,” she notes. “However, a lack of standardized sample preparation methods for protein extraction from FFPE tissue makes it costly and challenging to use these for biomarker validation studies.”
Recognizing the challenge of FFPE tissues, Vall d’Hebron Institute of Oncology’s Paolo Nuciforo and colleagues recently used laser-capture microdissection (LCM) to isolate colorectal cancer (CRC) cells from tissue slices. They then used NantOmics’ Liquid Tissue methodology to reverse formalin-induced crosslinks and completely solubilize the cellular proteins for trypsin digestion and MS analysis.
In this case, the researchers performed targeted multiplex proteomic analysis, identifying and profiling a panel of proteins known to be involved in CRC pathogenesis.
“One of the major goals of our study was to explore whether multiplex proteomic analysis could complement current genomic molecular pre-screening by identifying abnormal proteins expression in patients without any targetable genomic alteration,” the authors explained. And indeed, they identified 29 outliers in 21 patients.
“Among these, 12 could be potentially used to guide the selection of an investigational antibody-drug conjugate treatment (GPNMB, MSLN and TROP2) or refine a chemotherapy strategy (TOPO1 and TOP2A),” Nuciforo and colleagues reported.
They also correlated expression of specific proteins with outcomes.
“High levels of MSLN were associated with worse [overall survival] independently of the methodology used,” the authors wrote. “Lack of PTEN protein expression significantly associated with a high risk of progression in the first-line anti-EGFR setting, with a time to progression of 4.2 months versus 9.4 months in patients whose tumours expressed PTEN.”
Beyond the individual findings, however, the researchers described the major impact of targeted multiplex proteomics as the potential to more precisely match patients to experimental therapies.
“Recruitment rates in early clinical trials based on genomic markers or targeted [immunohistochemistry] in metastatic CRC does not exceed 15 percent,” they explained. “We expect that proteomics-guided drug development will expand treatment options for patients who are eligible to participate in early-phase clinical studies, particularly in view of the increasing array of ADCs and immunotherapeutic approaches, thus repurposing proteomics as an important contender in precision oncology.”
Also working in colorectal cancer, University of Adelaide’s Chandra Kirana and colleagues used LCM to prepare their tissue samples, but combined it with two-dimensional difference gel electrophoresis (2D-DIGE).
According to the authors of that work, “Tissue heterogeneity between patients and presence of non-cancerous cells within tumors contributes significantly to the ‘noise’ and variability of samples [and] thereby strongly affect molecular profile of tumour cells. Laser microdissection allows populations of cells to be analyzed separately to minimize contamination from other cell types, thereby reducing variation and allowing more relevant and consistent comparison.”
For their part, they used 2D-DIGE to minimize gel variation and improve reproducibility, relying on saturation CyDye labeling to increase sensitivity to low-abundance proteins.
Profiling tissues from stage II CRC patients, the researchers looked for markers that distinguished subjects whose cancer metastasized within five years from those whose cancer did not. Their results suggested that five proteins could serve as potential biomarkers of tumor spread, informing clinicians as to which patients might benefit from adjuvant chemotherapy.
These solid tissue examples should not, however, leave the impression that less-invasive biological samples (such as urine and blood) don’t pose unique challenges.
“Some of the challenges associated with urine are the presence of low concentration of protein and hence [the need for] good sample preparation methods to obtain the required amount of protein for biomarker analysis,” Mundodi offers. “In addition, the presence of small molecules could cause interference in the analysis step.”
At ASMS 2018, Paulos Chumala and colleagues from University of Saskatchewan and Agilent used MS-based proteomic analysis to categorize patients into asthma, asthma-like and control groups via protein biomarker profiles. Although they were able to identify biomarkers that showed a >2-fold change vs. controls, urine proved a challenging matrix, and they had to normalize total protein and creatinine concentrations when working with biological replicates.
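To illustrate the kind of normalization involved, here is a minimal Python sketch, not the researchers' actual pipeline. It assumes one common convention: dividing each urinary signal by that sample's creatinine concentration to correct for how dilute the urine is, then comparing group means against a 2-fold threshold. All sample values and function names are hypothetical.

```python
from statistics import mean


def creatinine_normalize(signal, creatinine_mg_dl):
    """Divide a raw biomarker signal by urinary creatinine to correct for dilution."""
    if creatinine_mg_dl <= 0:
        raise ValueError("creatinine concentration must be positive")
    return signal / creatinine_mg_dl


def fold_change(case_values, control_values):
    """Ratio of the mean normalized signal in cases to that in controls."""
    return mean(case_values) / mean(control_values)


# Hypothetical (raw signal, creatinine in mg/dL) pairs, one per urine sample.
asthma_samples = [(900.0, 150.0), (400.0, 50.0), (700.0, 100.0)]
control_samples = [(300.0, 150.0), (150.0, 50.0), (250.0, 100.0)]

asthma_norm = [creatinine_normalize(s, c) for s, c in asthma_samples]
control_norm = [creatinine_normalize(s, c) for s, c in control_samples]

ratio = fold_change(asthma_norm, control_norm)
passes_threshold = ratio > 2.0  # the >2-fold criterion mentioned in the text

print(f"fold change vs. controls: {ratio:.2f}, passes: {passes_threshold}")
```

Without the creatinine step, the second sample in each group would look artificially low simply because that urine was more dilute.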
The complexity of blood samples, Mundodi continues, could cause significant variability in protein concentration. To address these challenges, companies have devised a variety of solid- and liquid-phase extraction methods, as well as automated solutions such as Agilent’s AssayMAP Bravo to reduce sample prep variability.
Even if your sample prep manages to remove 100 percent of the high-abundance proteins, however, Stephen Williams, chief medical officer of SomaLogic, worries about what else may be leaving with those high-abundance markers.
“Those proteins are pretty sticky,” he cautions. “If you can stick them on a column, then yes, you are depleting albumin, but how many proteins stick to it and are in equilibrium with it in plasma?”
“You’re taking out a lot more than just the most abundant proteins, so you’re really changing the matrix,” he continues.
Rather than deplete the samples, SomaLogic’s SomaScan platform uses aptamers to bind protein biomarkers from whole plasma, Williams explains, having identified an aptamer to each of 5,000 plasma proteins.
Because the amount of a given fluorophore-tagged aptamer remaining during sample processing correlates with the amount of its protein binding partner, the SomaScan ultimately uses array technology to bind and quantify the fluorescent signal of each aptamer, rather than the protein directly.
Williams acknowledges, however, that high-abundance proteins were an issue for company founder Larry Gold in the early days of platform development.
“The first obstacle was that the aptamers didn’t bind tightly enough,” he recounts. “In plasma, which is the matrix of choice, the dynamic range of the most abundant to least abundant proteins is at least eight logs. So, no matter how great your affinity is for a single reagent, you can’t overcome eight logs of dynamic range with just affinity.”
Your aptamer could be 99.999 percent specific, he continues, but in the face of a protein that’s a billion times more abundant, you could end up with the wrong signal.
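Williams' point can be checked with back-of-the-envelope arithmetic. The Python sketch below assumes a deliberately simplified model in which the signal from each protein scales with its abundance multiplied by the reagent's relative preference for it. The function name and the model itself are illustrative assumptions, not SomaLogic's calculations.

```python
def wrong_to_right_signal(specificity, abundance_ratio):
    """Signal from an abundant off-target protein relative to the rare target.

    specificity: fraction of binding that would go to the intended target
                 if both proteins were equally abundant (e.g. 0.99999).
    abundance_ratio: how many times more abundant the off-target protein is.
    """
    cross_reactivity = 1.0 - specificity
    return cross_reactivity * abundance_ratio / specificity


# A 99.999 percent specific reagent facing a protein a billion times more abundant.
ratio_billion = wrong_to_right_signal(0.99999, 1e9)

# The same reagent facing "eight logs" (10^8) of dynamic range.
ratio_eight_logs = wrong_to_right_signal(0.99999, 1e8)

print(f"billion-fold excess: wrong signal is ~{ratio_billion:,.0f}x the right one")
print(f"eight logs of excess: wrong signal is ~{ratio_eight_logs:,.0f}x the right one")
```

Under these assumptions, a reagent that prefers its target 99,999 times out of 100,000 would still report roughly 10,000 times more signal from the abundant protein than from the rare one.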
To some extent, they were able to improve the affinity by modifying the aptamers (now SOMAmers) with sidechains that mimic amino acids, but even that was insufficient to develop multiplex screening beyond 50 proteins.
Gold and colleagues then turned to another parameter of binding: kinetics.
“It turns out that you can select these aptamers not just for high affinity, but you can also independently select for slow off-rates,” Williams explains. “So, now, if you have a high-affinity and slow off-rate aptamer, you can use kinetics during the assay, because the non-specifically bound aptamers dissociate rapidly.”
And because the aptamers are polyanions, he notes, adding a load of unlabeled polyanions to the solution helps prevent rebinding to the lower-affinity but higher-abundance proteins.
“That was what enabled us to now multiplex to 5,000 proteins all at once,” Williams enthuses.
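The kinetic argument can be sketched with textbook first-order dissociation, in which the fraction of complexes still bound after time t is exp(-k_off × t) once rebinding is blocked by a competitor. The Python example below is a rough illustration only: the off-rates and the 60-second challenge window are hypothetical values chosen to show the effect, not parameters of the SomaScan assay.

```python
import math


def fraction_still_bound(k_off_per_s, seconds):
    """First-order dissociation with rebinding blocked: exp(-k_off * t)."""
    return math.exp(-k_off_per_s * seconds)


def half_life_s(k_off_per_s):
    """Time for half of the complexes to dissociate."""
    return math.log(2) / k_off_per_s


# Hypothetical off-rates (per second), for illustration only.
SPECIFIC_K_OFF = 1e-4      # slow off-rate: aptamer on its intended protein
NONSPECIFIC_K_OFF = 1e-1   # fast off-rate: aptamer stuck to an abundant bystander
CHALLENGE_SECONDS = 60     # hypothetical wait in the presence of competitor

specific_left = fraction_still_bound(SPECIFIC_K_OFF, CHALLENGE_SECONDS)
nonspecific_left = fraction_still_bound(NONSPECIFIC_K_OFF, CHALLENGE_SECONDS)

print(f"specific complexes remaining:    {specific_left:.3f}")
print(f"nonspecific complexes remaining: {nonspecific_left:.4f}")
print(f"specific half-life:    {half_life_s(SPECIFIC_K_OFF):,.0f} s")
print(f"nonspecific half-life: {half_life_s(NONSPECIFIC_K_OFF):.1f} s")
```

With these made-up numbers, nearly all of the specific complexes survive the wait while well under 1 percent of the nonspecific ones do, which is the sense in which time, rather than affinity alone, separates signal from noise.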
By surveying all proteins in plasma at once, SomaScan allows researchers to focus on patterns of change from one sample to another in a protein-agnostic manner. Although the identity of the individual biomarkers may be important to better understand the underlying biology, it is not required to establish difference panels.
“For us, it doesn’t matter,” Williams explains. “We don’t actually have to explain the origin in the change in signal.”
The same may not be true for SomaLogic’s pharma partners, he acknowledges, who may be looking for novel drug targets and may very much wish to know the origin of the signal change. This is where Williams sees a role for MS.
“Having identified that some proteins are candidate targets, then you can do much smaller-scale mass spec to explain, in people who have a high or low SomaScan signal, what it is that’s really going on,” he adds.
Singlets vs. signatures
As with cancer genomics panels that now screen for a variety of markers as hallmarks of a disease, rather than a single mutation, so too is the proteomic biomarker landscape shifting from single-molecule indicators (e.g., PSMA) to multi-protein signatures.
“It is highly unlikely that a single protein biomarker will be specific enough to be a determinant of disease,” offers Agilent’s Mundodi. “Multiple protein biomarkers that show simultaneous and significant changes during disease will form a protein signature. These protein signatures will be carefully evaluated and validated to determine the disease status.”
But even here, she suggests, the signatures may be able to more finely characterize a disease state, giving more than a binary Yes/No diagnosis.
“Sets of protein biomarkers at different stages of disease will be unique and thus offer valuable information for early disease diagnosis,” she explains. “A deep understanding of the proteome and controlled large-scale proteomic studies are necessary to identify unique protein signatures specific to a disease state.”
“For an individual protein to work as a useful biomarker, it has to be unique to a health condition, because if it’s not unique, it’s going to pop up all over the place and give false positives,” Williams warns. “And that single protein has to explain a clinically meaningful proportion of the disease or health condition.”
He notes that there might be some single proteins downstream of serious genetic abnormalities that might explain and be indicative of a health condition, but these are the exception.
From Williams' perspective, however, the biggest challenge to a single biomarker assay is heterogeneity—within disease processes, patients or patient populations.
“If you want a measurement or prediction that is generalizable across a population of people that includes men and women, different ethnic groups, and everything, then your signature has to adjust for men, women, kidney function, ethnic diversity, etc.,” he says.
Last year, Max Planck Institute of Biochemistry’s Matthias Mann and colleagues described their efforts to identify protein signatures of non-alcoholic fatty liver disease (NAFLD) and cirrhosis in patients with normal glucose tolerance or type 2 diabetes (T2D). To accomplish this, they performed plasma proteome profiling using LC-MS/MS as well as a new acquisition method they called BoxCar, which expanded the dynamic range of peptide signals.
“Currently established protocols in clinical practice for the diagnosis and follow-up of NAFLD have certain limitations; for instance, they may not be sufficiently sensitive at early disease stages,” the authors explained. “MS-based proteomics technology holds great potential in generating novel insights into disease mechanism and discovering new biomarkers.”
With plasma from 48 subjects total, the researchers quantified an average of 500 proteins per subject with signals over six orders of magnitude. Of these, they identified six proteins with significantly altered expression patterns in NAFLD and cirrhosis, two of which had already been implicated in liver disease.
Of particular interest was polymeric immunoglobulin receptor (PIGR), the elevation of which correlated with liver disease severity—170 percent in NAFLD and 298 percent in cirrhosis. The authors also noted that PIGR covaries with traditional clinical liver markers AST, ALT, ALP and GGT, suggesting its possible utility as a biomarker of liver damage.
“The plasma proteome changed much less in NAFLD than in cirrhosis, and globally the plasma proteome profiles had few significant outliers, both in the cohort with normal glucose tolerance and in the T2D cohort,” the authors observed. “This presumably reflects the resilience and regenerative capacity of the liver and is also in line with the fact that NAFLD or early cirrhosis is often asymptomatic and clinically difficult to detect.”
The authors plan to further study the potential and specificity of the biomarkers for use in a liver disease panel in a larger study with more fine-grained cohorts.
Moving away from MS entirely, Ruo-Pan Huang and colleagues from RayBiotech and Emory University decided to leverage what was already known about endometriosis pathology to identify potential biomarkers.
“Since endometriosis is thought to have an underlying inflammatory basis, we hypothesized that aberrant levels of cytokines or similar molecules could be detected in diseased patient plasma,” the authors explained. “Thus, we began our study by probing disease patient samples with one of our large multiplex arrays.”
Screening plasma from 70 subjects medically diagnosed with endometriosis and 52 subjects confirmed as disease-free, the researchers identified an initial 38 cytokines with different expression patterns between the two groups. Further analysis narrowed this list down to 14 biomarkers.
“When evaluating these 14 markers, we noted that they were not all associated with one common process or pathway, but instead spanned across multiple pathways from inflammation to angiogenesis to cellular growth factors,” Huang and colleagues wrote. “Such a finding supports a multifactorial disease etiology that may require a methodology to identify multiple rather than single biomarkers for disease detection.”
The researchers then tested for reproducibility and specificity by developing a customized Quantibody antibody array for the 14 biomarkers, comparing differential expression not just in an endometriosis vs. healthy validation set, but also with samples from polycystic ovary, pelvic adhesion and ovarian cyst patients. They wanted to determine whether any or all of the biomarkers were specific to endometriosis rather than more widely representative of inflammatory gynecological conditions.
Of the 14 biomarkers examined, seven appeared unique to endometriosis. Five of these had never previously been identified as biomarker candidates.
“Our automatable and high-throughput methodology could allow for cheaper initial detection techniques, as well as reasons to explore potential non-invasive biomarkers of disease,” the authors suggested.
“Ideally, a non-invasive test could be built into normal blood workups during a patient’s annual checkup and help identify potentially at-risk patients suffering from abdominal related symptoms,” they added. “At the very least, such tests may be able to eliminate those patients for which invasive surgeries are not warranted based on biomarker workups.”
According to Williams, SomaLogic’s models tend to involve somewhere between 50 and 100 protein markers, a jump over what has come before.
“When you apply the machine learning at scale and your desire is for a generalizable product and your training set includes a diverse population, you allow it to find those other sources of variability that would normally be confounding,” he says. “You allow it to include adjustments for those things in the model.”
That’s the complete flip to the usual way of doing things, he offers, which is to control all of the variability, do a case-control study and find the pure physiological difference due to the target disease.
“Then,” Williams argues, “when you try to take that out into the real world, it just doesn’t work.”
“The nice thing about this agnostic machine learning is you don’t really care what the origin of the signal is, you just care about generalizability and performance,” he remarks. “If you train it on a large-scale population that represents your intended-use population, you allow the machine learning to not just find the heterogeneous disease pathways, but also you allow it to find other things that it might need to adjust for in a large population.”
Thus, heterogeneity is less a curse than the thing that might empower the signature in a real-world setting.
This raises the question, of course: just how well do these discovery-phase signatures work with real-world samples?
“The purpose of the bridge between discovery and clinical validation, the so-called verification phase, serves to confirm expression of the protein and prioritize the numerous biomarker candidates from the discovery phase to create a list of the most promising candidates for use in developing higher-throughput tools,” explained Yi-Ting Chen and colleagues from Chang Gung University in a recent review on genitourinary cancer biomarkers.
Looking specifically at MS-based discovery, they suggested that a first step following discovery is to verify expression profiles using a second quantification method. Although Western blotting or ELISAs have been the traditional methods for this, they noted the limitation that for novel markers, high-quality antibodies may not be available.
Thus, they pointed to MS modes such as multiple reaction monitoring (MRM) or parallel-reaction monitoring (PRM). The advantage of something like MRM over ELISA, they argued, was the ability to quantify multiple candidates in all samples in a single MS run.
The surviving biomarkers then have another hurdle to clear.
“[The clinical validation] phase aims to translate outcomes of preclinical end-point studies performed in an academic setting using state-of-the-art technology to clinics using tools that fulfill clinical requirements of an In Vitro Diagnostic Device test,” Chen and colleagues wrote.
But even in fulfilling these requirements, they continued, there can be many challenges in the gap between capability and utility: “Economic issues, including the cost of the biomarker test versus original medical assays, are calculated at this phase, as is the overall effectiveness of disease management in terms of quality of life, mortality and cost to the patients themselves, the community and the government.”
Aside from the technical aspects, it is in the assay practicability and generalizability where Williams sees an advantage for SomaScan and the company’s growing portfolio of SomaSignal tests.
In 2018, SomaLogic published its efforts to test its cardiovascular event prediction model using samples from Pfizer’s terminated ILLUMINATE clinical study of atorvastatin with or without torcetrapib, a cholesteryl ester transfer protein inhibitor, in patients at high cardiovascular risk. The Pfizer study was terminated after 550 days because of increases in cardiovascular events and deaths in the torcetrapib arm.
Using stored plasma samples taken at baseline and three months, Williams and colleagues at SomaLogic, UCSF and Pfizer tested the model to see if they could retrospectively detect any signs of trouble.
“Within the first three months, we could see people getting worse on torcetrapib and better on atorvastatin [monotherapy],” Williams recalls. “The difference between the groups was roughly the same as the difference in events 18 months later. We could predict at three months if the drug was going to kill people.”
The researchers were surprised to find that torcetrapib altered the plasma concentrations of 200 of the 986 proteins measured. And when categorized by biological pathways, the treatment affected eight of the top 10 pathways related to inflammatory and immune functions, something not previously noted with torcetrapib.
“Our proteomic pathway analysis showed that torcetrapib in ILLUMINATE also had major endocrine effects, particularly on aldosterone and glycemic control,” the authors further noted. “Aldosterone levels had been measured in a post hoc exploratory analysis to explain the observed elevations in blood pressure, reductions in potassium, and elevation in bicarbonate among patients who received torcetrapib.”
In terms of benefits, where the original ILLUMINATE trial reported improved glycemic control with torcetrapib, this latest study opened mechanistic explanations by noting changes in nine proteins related to insulin sensitivity and two related to pancreatic β-cell function.
The authors acknowledged that the study was retrospective, however, and primarily focused on group changes rather than those of individuals.
More recently, in a Nature Medicine paper, Williams and colleagues described their efforts to develop and validate protein-phenotype models for 13 different health indicators. The study ultimately encompassed 85 million measurements of samples from five different study cohorts of almost 17,000 subjects.
“The key hypothesis we were testing was to what extent do proteins encode for future risks, current health state, and behaviors in the human body,” Williams recounts. “We had this great theory and we have this great tool. Now is the time to put our money where our mouths are.”
They then compared their findings with those measured by the best-available true standards.
“For cardiovascular outcomes, it was who died and who was hospitalized,” Williams explains. “For prediction of diabetes, it was who actually developed diabetes some number of years in the future. For body composition, it was a DEXA scan. For fitness, it was a VO2-max measurement on a treadmill. And for liver fat, it was ultrasound.”
Of the 13 models built, he says, 11 proved very successful. The two that didn’t prove successful involved body weight and diet, which Williams acknowledges may have been too easily influenced by lifestyle choices and environmental influences.
He effuses, however, about the performance of some of the other tests.
“The cardiovascular test proved better than any combination of risk factors,” Williams recalls. “The diabetes prediction was better than glucose-tolerance testing. And trying to mimic the body composition DEXA scan, the R², for example, was 0.91.”
For comparison, he suggests that polygenic risk scores predict body composition with an R² of < 0.10.
“We were getting performances that were 10- to 15-fold greater than that,” he points out.
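For readers less familiar with the statistic, R-squared (the coefficient of determination) measures the fraction of variance in a reference measurement that a model's predictions explain. The Python sketch below computes it for made-up body-fat values, then works backward from the figures Williams quotes. The data and function names are hypothetical and are not taken from the study.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - (residual variance / total variance)."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical percent body fat: reference scan vs. a model's prediction.
reference_scan = [20.0, 25.0, 30.0, 35.0, 40.0]
model_prediction = [21.0, 24.0, 31.0, 34.0, 41.0]

toy_r2 = r_squared(reference_scan, model_prediction)
print(f"toy example R-squared: {toy_r2:.2f}")

# Working backward from the quoted numbers: an R-squared of 0.91 that is
# 10- to 15-fold better implies a comparator in roughly this range.
QUOTED_R2 = 0.91
implied_high = QUOTED_R2 / 10
implied_low = QUOTED_R2 / 15
print(f"implied comparator R-squared: {implied_low:.3f} to {implied_high:.3f}")
```

The implied range of roughly 0.06 to 0.09 is consistent with the "less than 0.10" figure cited for polygenic risk scores.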
Beyond the predictability, the study authors also noted that the testing offered potential socioeconomic benefits.
“Acquiring the same information using standard techniques would require physician examination, laboratory testing, exercise stress testing and imaging assessments, with up to nine different patient appointments and potentially thousands of [British] pounds in costs per patient,” they wrote.
Getting back to something Williams highlighted earlier, the authors were quick to note that one limitation in this study was the Caucasian bias in some of the cohorts studied. This could potentially limit generalizability of the findings and would require calibration testing in different populations.
Practice makes perfect
Despite all of the research into establishing and ultimately validating the clinical utility of specific protein biomarkers or panels, the huge challenge remains of developing assays that are cost-effective and easy to use.
As suggested earlier, MS workflows can be unwieldy for use at the scales required in most clinical settings. That said, Biodesix has seen success with its MS-based VeriStrat proteomic assay of blood-based biomarkers in NSCLC.
At the 2019 IASLC conference, Northwestern University’s Young Kwang Chae presented data showing that in patients receiving immune checkpoint inhibitors, a classification of VeriStrat-Good was associated with a doubling in median progression-free survival vs. a VeriStrat-Poor classification, and that differences in overall survival trended toward significance.
Last month, Biodesix initiated the next phase of its biomarker development program with Merck and Pfizer for a proteomic test for likely responders to the anti-PD-L1 checkpoint inhibitor avelumab.
In other situations, translation to standardized immunoassays helps improve throughput and reduce costs, but even here, each test or panel requires its own development process.
This is where SomaLogic sees a distinct advantage for the evolution of its SomaScan platform into what Williams describes as a medical information delivery platform: SomaSignal.
“The old way is you would use a SomaScan for discovery,” he explains. “You’d find the proteins that were most informative. You’d make a panel assay, and you’d sell that.
“The problem with that is that it’s not useful for anything else.”
What makes SomaSignal so attractive, he presses, is that the diagnostic models are effectively just outputs from a single platform. Any blood sample for any desired test is run through the identical 5,000-measurement workflow.
In January, SomaLogic released three new tests—glucose tolerance, visceral fat and resting energy rate—to bring its pipeline to 10. To add another 10, Williams says, wouldn’t cost much more.
“When you keep the platform the same and consistent, and you always measure the same things in everybody, then any number of these tests can be embedded on top,” he suggests. “Our pipeline of tests is more than 100 [assays] long, and we expect to get through all of those within the next two or three years.”
Some groups see an opportunity to leverage synergies in genomic and proteomic biomarker analysis to strengthen diagnostic decision-making. Such is the case with the Biodesix Lung Reflex Strategy, which combines VeriStrat with the GeneStrat genomic test.
Williams is less convinced of the need.
“I know a lot of people are talking about multi-omics and so on,” he comments. “I think that’s lazy and expensive. What you’re really saying with multi-omics is we don’t know where the biology is, so we’ll just pay for every measurement technique and we’ll see where it is afterwards.”
Given his experience and his acknowledged vested interests, Williams is confident that SomaLogic has answered the question of whether proteins can be a sole information source.
He hasn’t, however, completely closed the door.
“I won’t say that we will never improve performance by including some other modality, but what we really showed was that these models on their own were more than good enough to be useable today,” he says.
Regardless of the source or combination, however, the ultimate goal is to make sure that the snapshot taken today gives a clear picture of tomorrow.