Artificial intelligence (AI) has become a familiar part of early drug discovery, where it is now commonly used for target identification, virtual screening, and managing large biological datasets. Industry adoption has accelerated: a growing share of large pharma companies now report using AI or machine learning in discovery workflows, and global investment in AI-driven drug discovery has climbed into the tens of billions of dollars over the past decade. Still, as these tools move deeper into development, their limitations become harder to ignore — especially when it comes to predicting how a drug will perform in humans.
That challenge shows up most clearly in the translational gap between preclinical research and clinical outcomes. Despite decades of advances, industry data consistently show that roughly 90 percent of drug candidates that enter clinical trials fail, most often because of lack of efficacy or unexpected safety issues.
According to Jo Varshney, CEO and Founder of VeriSIM Life, this is one of the toughest areas for AI to meaningfully improve outcomes. “One of the most challenging areas to integrate AI into has been the translational phase: bridging the gap between preclinical data and human outcomes,” Varshney told DDN. “The underlying issue is that much of the biological data we collect from animal models or cell-based assays doesn’t directly translate to human physiology.”
Because AI models learn from existing datasets, those biological gaps carry forward into predictions. Species differences, experimental variability, and incomplete data can all limit how well a model generalizes to human biology, reducing confidence in AI-driven decisions before a drug ever reaches the clinic.
From pattern recognition to biological understanding
That translational challenge is closely tied to another issue: understanding why a drug works, not just whether it might.
Many machine learning models are excellent at finding correlations across large datasets, but correlation alone is rarely enough to guide late-stage development decisions. “Traditional machine learning excels at finding correlations, but understanding causation in complex biological systems requires hybrid modeling approaches,” Varshney said.
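The correlation-versus-causation distinction Varshney describes can be seen in a toy simulation (a generic illustration, not any company's model): when a hidden confounder drives both a biomarker and an outcome, a purely correlational model sees a strong association, yet intervening on the biomarker has no effect. All variable names and numbers here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical confounder, e.g., overall cell stress in an assay.
stress = rng.normal(size=n)

# Both the measured biomarker and the outcome are driven by the confounder,
# not by each other.
biomarker = stress + rng.normal(scale=0.5, size=n)
outcome = stress + rng.normal(scale=0.5, size=n)

# A correlational model sees a strong observational association...
corr = np.corrcoef(biomarker, outcome)[0, 1]

# ...but "intervening" to set the biomarker independently of the confounder
# (analogous to dosing a drug against that target) leaves the outcome unchanged.
biomarker_do = rng.normal(size=n)
outcome_do = stress + rng.normal(scale=0.5, size=n)
corr_do = np.corrcoef(biomarker_do, outcome_do)[0, 1]

print(f"observational correlation: {corr:.2f}")       # strong
print(f"post-intervention correlation: {corr_do:.2f}")  # near zero
```

A model trained only on the observational data would confidently, and wrongly, nominate the biomarker as a drug target — which is why causal and mechanistic grounding matters as programs advance.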
Mechanistic insight becomes increasingly important as programs advance, when developers need to understand dose response, off-target effects, and system-level interactions. Approaches that combine AI with computational biology and physics-based modeling aim to move beyond pattern recognition by grounding predictions in biological mechanisms that better reflect human physiology.
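To make "physics-based modeling" concrete, here is about the simplest mechanistic model in pharmacology — a one-compartment pharmacokinetic model after an IV bolus dose. This is a generic textbook sketch with made-up parameter values, not a description of VeriSIM Life's platform; its point is that mechanistic models expose interpretable quantities (half-life, clearance behavior) that a pattern-matching model does not provide directly.

```python
import numpy as np

# Hypothetical parameters for illustration only.
dose_mg = 100.0   # IV bolus dose
volume_l = 40.0   # volume of distribution
k_el = 0.2        # first-order elimination rate constant, 1/h

# Concentration over time: C(t) = (dose / V) * exp(-k_el * t)
t = np.linspace(0, 24, 97)                         # hours
conc = (dose_mg / volume_l) * np.exp(-k_el * t)    # mg/L

# Mechanistic quantities that fall out of the model's structure:
half_life = np.log(2) / k_el   # hours
c0 = conc[0]                   # initial concentration, mg/L
print(f"half-life: {half_life:.2f} h, C0: {c0:.2f} mg/L")
```

Hybrid approaches typically keep a mechanistic backbone like this and let machine learning fill in the parameters or residual effects that the physics alone cannot capture.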
Validation is a critical part of that process, particularly in a regulated environment. “Regulators need confidence that these models are not just statistically robust, but biologically grounded,” Varshney said. That means testing predictions across multiple biological scales, from molecular and cellular effects to whole-organism pharmacology. “Importantly, these validations should not only test the predictions themselves but also probe the assumptions and constraints of the AI models,” she added.
Cross-validation against experimental data remains essential, including comparisons with historical datasets and prospective in vitro, ex vivo, and in vivo studies. Where available, early clinical results can help assess whether predictions extend beyond their training data. Transparency around how models generate outputs is also becoming increasingly important, both for regulatory review and for internal development teams deciding which programs to advance.
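One common way to test whether predictions "extend beyond their training data" is to hold out whole experimental groups — entire studies, assays, or species — rather than random rows, so the model is scored on data it genuinely has not seen. The sketch below (synthetic data, hypothetical batch effects) shows leave-one-study-out validation with a simple linear fit standing in for the predictive model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Five hypothetical "studies", each a separate experimental batch.
n_studies, per_study = 5, 40
study_id = np.repeat(np.arange(n_studies), per_study)
x = rng.normal(size=n_studies * per_study)

# Shared dose-response slope plus a per-study offset (a batch effect).
offsets = rng.normal(scale=0.5, size=n_studies)
y = 2.0 * x + offsets[study_id] + rng.normal(scale=0.3, size=x.size)

errors = []
for held_out in range(n_studies):
    train = study_id != held_out
    # Fit y ≈ a*x + b on the other studies only.
    a, b = np.polyfit(x[train], y[train], deg=1)
    pred = a * x[~train] + b
    errors.append(float(np.sqrt(np.mean((pred - y[~train]) ** 2))))

print("leave-one-study-out RMSE per study:", np.round(errors, 2))
```

Random-row cross-validation would leak each study's batch effect into training and report optimistically low errors; splitting by study surfaces the generalization gap that matters for prospective and clinical comparisons.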
Even with these constraints, AI is already delivering measurable value in parts of the pipeline. Its strength lies in integrating and analyzing complex, multi-scale datasets that would be difficult for human teams to manage alone. “AI can process these datasets rapidly, identify patterns that may be invisible to human eyes, and prioritize candidate molecules with far greater efficiency than traditional methods,” Varshney said. Simulation-based approaches can also help reduce reliance on animal studies, lowering costs and shortening timelines in early development.
Human expertise, however, remains central. Scientists and clinicians still play a critical role in interpreting predictions, assessing biological plausibility, and weighing ethical and regulatory considerations. “AI can generate predictions, but humans must evaluate whether those predictions make sense in context,” Varshney said.
As regulators signal growing openness to non-animal and computational approaches, AI is likely to play a larger role in development decision-making. Still, expectations need to remain grounded. “AI is a powerful amplifier of human expertise, not a replacement,” Varshney told DDN.
Looking ahead, whether AI can consistently shorten development timelines without compromising safety will depend on data quality, mechanistic transparency, rigorous validation, and regulatory alignment. If those pieces come together, AI may not eliminate risk from drug development — but it could help teams identify and manage it earlier across the pipeline.