Despite a background in biochemistry, despite 20 years of writing about science, despite several years of medical writing, statistics continue to elude me.
I’ve taken courses in statistics. I have repeated the mantra that a P-value of ≤0.05 is statistically significant. I've quoted confidence intervals in the hopes that they would instill confidence.
But as my writing increasingly revolves around human health and I spend more time examining the results of clinical trials, I have, ironically, begun to question the correlation between statistical significance and medical relevance.
A year ago, Stuart Pocock of London School of Hygiene and Tropical Medicine and Gregg Stone of Columbia University Medical Center examined the question of what happens when a clinical trial fails to meet its primary outcome.1
In the New England Journal of Medicine, they wrote that “an unreasonable yet widespread practice is the labeling of all randomized trials as either positive or negative on the basis of whether the P value for the primary outcome is less than 0.05.”
“This view is overly simplistic,” they continued. “P-values should be interpreted as a continuum wherein the smaller the P-value, the greater the strength of the evidence for a real treatment effect.”
Their fundamental thesis was that when a treatment fails to achieve its primary outcome, researchers should examine secondary and safety outcomes to see if the therapy might yet have value.
Rather than simply walk away from the treatment, the authors proposed that researchers interrogate the trial with 12 questions that drill into aspects of trial design and results. Depending on the answers, the list might call the study's validity into question.
As I read through this treatise, however, I noticed that it never questioned the merits of the statistical analysis itself.
There seems to be a tacit assumption that if something is statistically significant, it is medically relevant. The corollary implies that something found not to be statistically significant is therefore not medically relevant.
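How fragile that equivalence is can be shown with a minimal sketch (all numbers here are hypothetical illustrations, not drawn from any real trial): the same clinically trivial effect flips from "not significant" to "significant" purely as a function of sample size.

```python
# Sketch: a fixed, clinically trivial effect becomes "significant"
# once N is large enough. Hypothetical numbers for illustration only.
import math

def two_sample_z_pvalue(diff, sd, n_per_arm):
    """Two-sided p-value for a difference in means (z-test, known SD)."""
    se = sd * math.sqrt(2.0 / n_per_arm)
    z = abs(diff) / se
    # two-sided p from the standard normal CDF
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

# Suppose a 0.5-unit improvement against a standard deviation of 10,
# an effect too small to matter to any individual patient:
print(two_sample_z_pvalue(0.5, 10.0, n_per_arm=50))     # p well above 0.05
print(two_sample_z_pvalue(0.5, 10.0, n_per_arm=20000))  # p well below 0.05
```

Nothing about the treatment changes between the two lines; only the headcount does. The p-value measures evidence against chance, not the size or worth of the effect.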
But is this true? And medically relevant for whom?
One of the 12 questions the authors proposed was: Do subgroup findings elicit positive signals?
The traditional knock against subgroup analysis is whether the subgroup was large enough to permit meaningful statistical analysis. At best, the authors suggested, a subgroup analysis points toward future, sufficiently powered trials to explore the question.
Is this approach, however, trying to solve the problem with the thinking that created the problem? What if your subgroup consists of subjects who achieved positive outcomes from treatment?
Except possibly in placebo-comparator studies, can we ignore the results simply because we cannot define the subgroup beyond its positive outcomes?
I cannot recall a single clinical trial where zero subjects demonstrated a positive outcome under the test treatment.
For that subgroup of subjects, those outcomes are potentially life-changing and very real.
For that subgroup of subjects, those outcomes may not be statistically significant, but they are clearly medically relevant.
The pages of DDNews, the show floors and lecture halls of countless conferences, the folios of myriad medical journals, and the websites of companies and governments routinely sound the clarion call of precision medicine, in which patients will be parsed in a thousand molecular ways to tailor treatment the way we tailor suits.
But can precision medicine live in a world that demands statistical significance?
The ultimate precision of N=1 will likely never be reached; it would simply be too expensive to run so many tests for every definitive biomarker. But even at N=100 or N=1,000, can any study be powered sufficiently to demonstrate statistical significance?
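The arithmetic behind that worry can be sketched with a textbook sample-size calculation (two-arm z-test, 80% power, two-sided alpha of 0.05; the effect sizes are hypothetical): as the effect you hope to detect shrinks, the required enrollment per arm balloons well past what a narrowly defined molecular subgroup can supply.

```python
# Sketch: subjects per arm needed to detect a standardized effect size
# at 80% power and two-sided alpha = 0.05 (textbook z-test formula).
import math

Z_ALPHA = 1.959964  # standard normal quantile for two-sided alpha = 0.05
Z_BETA = 0.841621   # standard normal quantile for power = 0.80

def n_per_arm(effect_size):
    """Per-arm sample size: n = 2 * ((z_alpha + z_beta) / d)^2, rounded up."""
    return math.ceil(2.0 * ((Z_ALPHA + Z_BETA) / effect_size) ** 2)

for d in (0.5, 0.2, 0.1):
    print(f"effect size {d}: {n_per_arm(d)} subjects per arm")
```

A "medium" effect (0.5) needs only a few dozen subjects per arm; a "small" one (0.1) needs over 1,500. Parse patients finely enough and the eligible subgroup may never reach that number.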
In this month’s Special Report on Infectious Disease beginning on page 22, Viamet Pharmaceuticals’ chief medical officer, Oren Cohen, raised this challenge in considering why it is harder to do clinical trials of antifungals than antibacterials.
“Part of it has to do with the way drugs are developed,” he said. “Frankly, it’s a lot easier to put your finger on a bunch of cases of community-acquired pneumonia to do a clinical trial than it is to do invasive fungi, so part of it is a feasibility issue.”
Ironically, in better understanding human disease pathology, in better understanding the molecular mechanisms of new therapies, in better understanding the characteristics of responders, we may be completely destabilizing the one basket in which medical science has put all of its eggs.
Pocock and Stone suggested “the best option is to avoid this scenario altogether through rigorous upfront planning.”
Although they meant something completely different, I think their advice is sound.
1. Pocock SJ, Stone GW. “The primary outcome fails – what next?” N Engl J Med. 2016;375:861-870.
Randall C Willis can be reached by email at email@example.com