For decades, scientists believed that the core of a protein was a fragile structure in which a single mutation could unravel the entire fold, like pulling the wrong block in a game of Jenga. But research recently published in Science suggests that proteins may be far more resilient, and more predictable, than previously thought.
Led by researchers at the Centre for Genomic Regulation (CRG) in Barcelona and the Wellcome Sanger Institute, the study challenges fundamental ideas about how proteins evolve and remain stable. Using high-throughput experiments and machine learning, the team generated and tested hundreds of thousands of protein variants from a human SH3 (SRC Homology 3) domain to determine which combinations of mutations could still fold and function. SH3 domains are small structural domains, commonly found in signaling proteins, that help mediate protein–protein interactions.
“Our data challenges the dogma of proteins being a delicate house of cards,” Albert Escobedo, first author of the study and postdoctoral researcher at CRG, recently told DDN. “The physical rules governing their stability are more like Lego than Jenga, where a change to one brick threatening to bring the entire structure down is a rare, and crucially, predictable phenomenon.”
A new view of sequence space
For drug developers and protein engineers, this study simplifies a long-standing challenge: how to design stable, functional proteins without endless trial and error. The research shows that protein cores are not structurally fragile but are more tolerant of mutation and more predictable than assumed, meaning designers can introduce bolder mutations with greater confidence.
By combining high-throughput mutational data with machine learning, the team offers a practical tool to accelerate therapeutic protein design, potentially cutting down time and cost in preclinical development.
The findings suggest that evolution tolerates more variation in the protein core than previously believed, especially when compensatory mutations are present elsewhere in the protein.
“We found that even mutations that are individually destabilizing can be tolerated when coupled with others that compensate for their effects,” Escobedo explained. “This implies that evolution can integrate seemingly deleterious core mutations by introducing permissive changes elsewhere.”
Training machines to learn evolution’s rules
To make sense of the massive experimental data set, the researchers turned to machine learning. They trained an algorithm on variant data from a single SH3 protein, creating a predictive model that could flag stable sequences — even when they bore little resemblance to the original.
When tested against more than 51,000 SH3 sequences found across bacteria, plants, insects, and humans, the model correctly identified nearly all of them as stable. That suggests the biochemical “rules” that govern protein folding have been preserved across more than a billion years of evolution and can be captured computationally.
“Our work shows that models of protein evolution must account for both energy couplings and allosteric constraints,” Escobedo said. “These principles allow us to distinguish sequences that evolved from those that didn’t, offering a more nuanced view of how proteins explore sequence space.”
Implications for faster drug development
The ability to predict protein stability across a family from measurements on a single domain has major implications for protein engineering, particularly in pharmaceutical contexts where time and precision are critical.
Directed evolution, a standard method in protein engineering, relies on successive rounds of mutation and screening to improve stability or function. But the process is often slow, costly, and limited to small changes.
The CRG team’s approach bypasses some of these bottlenecks. “We measure the energetic effects of mutations, and their interactions, experimentally, rather than relying solely on computational predictions,” said Escobedo. “These experimentally derived energies can be combined to accurately predict the outcomes of multiple mutations introduced simultaneously.”
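To make the idea of combining measured energies concrete, here is a minimal Python sketch. It is not the study's model. It assumes a simple two-state picture in which each mutation contributes a free-energy change, some pairs of mutations contribute an extra coupling term, and the summed energy sets the fraction of molecules that are folded. The mutation names, energy values, and function names are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations

RT = 0.593  # kcal/mol at roughly 298 K

# All numbers below are made up for illustration; they are NOT measurements
# from the study. Positive values destabilize the fold.
WILD_TYPE_DG = -2.0  # hypothetical folding free energy of the starting protein
SINGLE_DDG = {"A12V": 1.5, "L33F": 1.2, "V50I": -0.4}  # hypothetical mutations
PAIR_COUPLING = {frozenset({"A12V", "L33F"}): -1.8}  # hypothetical compensation


def predict_dg(mutations):
    """Sum single-mutation energies and pairwise couplings for a variant."""
    mutations = list(mutations)
    dg = WILD_TYPE_DG + sum(SINGLE_DDG[m] for m in mutations)
    for a, b in combinations(mutations, 2):
        # Pairs without a listed coupling are assumed to be purely additive.
        dg += PAIR_COUPLING.get(frozenset({a, b}), 0.0)
    return dg


def fraction_folded(dg):
    """Two-state model: fraction of molecules folded at folding energy dg."""
    return 1.0 / (1.0 + math.exp(dg / RT))


if __name__ == "__main__":
    for variant in (["A12V"], ["L33F"], ["A12V", "L33F"]):
        dg = predict_dg(variant)
        print(variant, round(dg, 2), round(fraction_folded(dg), 3))
```

With these invented numbers, the two destabilizing mutations would sum to a positive (unfolded-leaning) energy on their own, but the negative coupling term brings the double mutant back to about -1.1 kcal/mol. That mirrors, in toy form, the compensation between mutations that Escobedo describes.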
One major application is protein resurfacing, where proteins are redesigned to reduce immunogenicity. Therapeutic enzymes and antibodies often fail because their surfaces provoke immune responses. Current methods to “silence” immune-reactive regions require extensive screening and often compromise protein stability.
“With our approach, resurfacing could be achieved faster and more cost-effectively, by directly predicting stabilizing and immune-silent variants,” Escobedo said.
What’s next for predictive protein design
Encouraged by the model’s success across the SH3 family, the researchers now plan to extend the framework to other protein domains. Their roadmap: choose representative proteins from diverse families, gather sparse mutational data, and train predictive energy models.
“Data from a single representative of a domain family is sufficient to model the evolution of the entire family,” said Escobedo. “This strategy is both experimentally feasible and scalable, and will allow us to generalize our framework across a broad swath of protein space.”
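As a rough illustration of what "training an energy model" on mutational data can mean, the Python sketch below fits one energy per mutation so that the summed energies match measured values for a handful of variants. It uses plain least squares by gradient descent as a stand-in. The article does not describe the team's actual fitting procedure, and the mutation names, measurements, and function name here are hypothetical.

```python
# Hypothetical measurements: (mutations in the variant, measured energy change
# in kcal/mol). These values are invented for illustration only.
EXAMPLE_VARIANTS = [
    (("A12V",), 1.5),
    (("L33F",), 1.2),
    (("A12V", "V50I"), 1.1),
    (("L33F", "V50I"), 0.8),
]


def fit_additive_energies(variants, n_steps=2000, learning_rate=0.1):
    """Fit one energy per mutation by minimizing mean squared error."""
    mutations = sorted({m for muts, _ in variants for m in muts})
    energy = {m: 0.0 for m in mutations}
    for _ in range(n_steps):
        grad = {m: 0.0 for m in mutations}
        for muts, measured in variants:
            # Prediction is the sum of the energies of the mutations present.
            error = sum(energy[m] for m in muts) - measured
            for m in muts:
                grad[m] += 2.0 * error / len(variants)
        for m in mutations:
            energy[m] -= learning_rate * grad[m]
    return energy


if __name__ == "__main__":
    for name, value in fit_additive_energies(EXAMPLE_VARIANTS).items():
        print(name, round(value, 3))
```

Note that V50I is never measured alone in this toy data set, yet its energy can still be inferred from the variants that contain it. That is the sense in which sparse data can be enough for a model of this kind, though a realistic model would also need coupling terms and far more variants.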
From theory to application
The study redefines how scientists understand protein robustness and opens a new chapter for rational protein design. Rather than inching forward with small, safe mutations, researchers can now consider bolder, combinatorial changes, and still expect stability.
Professor Ben Lehner, senior author and Head of Generative and Synthetic Genomics at the Wellcome Sanger Institute, underscored the broader impact: “The ability to predict and model protein evolution opens the door to designing biology at industrial speed, challenging the conservative pacing of protein engineering.”
With a clearer view of the protein stability “rules,” drug developers and bioengineers may finally have a faster, more reliable route through the vast sea of sequence possibilities — one that evolution itself has been following all along.