The story of DNA is far from finished. From Swiss chemist Friedrich Miescher’s first identification of nucleic acids in 1869 to the pivotal X‑ray crystallography studies by Rosalind Franklin and Maurice Wilkins that enabled Watson and Crick to derive the double‑helical structure in 1953, scientists have steadily unraveled life’s fundamental code. Milestones such as the Human Genome Project’s declared completion in 2003 and the telomere-to-telomere assembly of a gapless human reference genome in 2022 have delivered the most complete sequence of human DNA to date.
But sequencing a genome is only the beginning. To truly understand how our DNA works, researchers must annotate it — identifying genes, mapping their transcripts, and deciphering the regulatory elements that control when, where, and how genes are turned on or off. Protein-coding DNA accounts for only a small fraction of most eukaryotic genomes — about one to two percent in humans — yet the remaining non-coding DNA carries a vast array of regulatory instructions that control gene activity across different cell types and conditions.
On a bold mission to annotate the remaining 98 to 99 percent, seven research groups joined forces in Oncode Institute’s PERICODE project. The first result of that effort, reported in Nature, is the PARM model, which uses millions of precise experimental measurements combined with deep learning to decode promoter activity across cell types. This breakthrough could transform our understanding of disease, guide the design of new therapies, and finally shed light on the vast non-coding regions of DNA that have long puzzled scientists.
The start of PARM
The development of PARM, short for Promoter Activity Regulatory Model, is the product of a highly coordinated effort within the PERICODE project, a collaborative initiative of the Oncode Institute, a virtual institute connecting top cancer researchers in the Netherlands. Seven principal investigators from multiple institutions joined forces, bringing together expertise in genomics, computational biology, oncology, and biochemistry. This interdisciplinary approach was essential for tackling the complex challenge of decoding the non-coding genome.
Jeroen de Ridder, who was a principal investigator at UMC Utrecht at the time of the study and is one of its senior authors, told DDN that the project began as part of Oncode’s mission to fund bold, high-risk research ideas. “We were put in a room and asked to come up with grand ideas,” he recalled. “My expertise in AI and Bas [van Steensel]’s expertise in genomics, combined with the clinical, genetics and proteomics expertise of five more [principal investigators] turned out to be a very good match in attacking the fundamental question: how does the non-coding genome play a role in regulating gene expression.”
Bas van Steensel, principal investigator at the Netherlands Cancer Institute, explained that the long-term goal was clear from the start: “We wanted to annotate non-coding mutations in cancer genomes. Every cancer genome has tens of thousands of mutations, most of which do nothing, but some are critical. Our aim was to identify which ones matter.”
To achieve this, the team needed a way to pinpoint which mutations actually affected gene expression.
Why causality data matters
A key design choice behind PARM was the decision to train the model on massively parallel reporter assay (MPRA) data rather than on large compendia of epigenomic or transcriptomic maps. Although such maps capture genome-wide regulatory patterns, they provide only correlative information and do not reveal the causal contributions of individual DNA sequences to gene expression.
MPRAs take a fundamentally different approach. Millions of short DNA fragments are tested in isolation within a single cell type, each linked to a barcode that reports its regulatory activity. Because each sequence is measured independently, changes in gene expression can be directly attributed to specific DNA elements. This generates causal, high-resolution data for the model, which can then predict how untested sequences or mutations would influence promoter activity.
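To make the barcode readout concrete, here is a minimal sketch of how per-fragment activity is often summarized in MPRA analyses: barcode counts from RNA (expressed reporter) are compared with counts from DNA (input library) as a log2 ratio. This is a common convention, not necessarily the processing used for PARM; the function name, the pseudocount, and the data layout are assumptions made for illustration.

```python
import math
from collections import defaultdict


def mpra_activity(barcode_to_fragment, dna_counts, rna_counts, pseudocount=1.0):
    """Return a log2(RNA/DNA) activity score per fragment.

    Hypothetical helper: assumes each barcode maps to exactly one fragment
    and that both libraries contain at least one read.
    """
    # Library sizes, used to put RNA and DNA counts on the same scale.
    dna_total = sum(dna_counts.values())
    rna_total = sum(rna_counts.values())

    # Pool all barcodes that tag the same fragment.
    dna_sum = defaultdict(float)
    rna_sum = defaultdict(float)
    for barcode, fragment in barcode_to_fragment.items():
        dna_sum[fragment] += dna_counts.get(barcode, 0)
        rna_sum[fragment] += rna_counts.get(barcode, 0)

    activity = {}
    for fragment in dna_sum:
        # Counts per million, with a pseudocount to avoid log(0).
        dna_cpm = (dna_sum[fragment] + pseudocount) / dna_total * 1e6
        rna_cpm = (rna_sum[fragment] + pseudocount) / rna_total * 1e6
        activity[fragment] = math.log2(rna_cpm / dna_cpm)
    return activity


if __name__ == "__main__":
    # Made-up counts for two fragments tagged by three barcodes.
    barcodes = {"AAA": "frag1", "CCC": "frag1", "GGG": "frag2"}
    dna = {"AAA": 50, "CCC": 50, "GGG": 100}
    rna = {"AAA": 150, "CCC": 150, "GGG": 100}
    for fragment, score in mpra_activity(barcodes, dna, rna).items():
        print(fragment, round(score, 3))
```

Because every fragment is scored from its own barcodes, a higher score for one fragment than another reflects a difference in the sequences themselves, which is the sense in which the article calls the data causal.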
“Correlative data can only go so far,” de Ridder noted. “Here, we provided the model with smart, targeted data so it could learn more quickly, rather than having to figure out for itself what is useful and what is not.”
By combining these high-resolution measurements with deep learning, PARM can identify which sequence motifs drive gene activity, how these motifs interact, and how their positions within promoters influence regulation. The model can also predict how previously untested sequences or mutations will affect gene expression, providing a powerful tool to explore the functional consequences of genetic variation.
This approach sets PARM apart from other efforts in the field. “The field seems focused on building ever-larger models that incorporate more and more data — take, for example, the recent AlphaGenome, which came out just a week before our work,” de Ridder said. “That’s a fantastic effort and will advance the field, but it’s a bit like looking at Earth from space. We take a microscope approach, zooming in on what we want to study — in this case, promoters in specific cell types or under specific stimuli.”
Validating the approach
One of the most striking outcomes of PARM is its ability to refine our understanding of promoter architecture. “The model can now predict, for each promoter, which transcription factors regulate it and exactly where their motifs sit,” said van Steensel. “It has incredible resolution, revealing patterns in how transcription factors are positioned: some prefer to be very close to the transcription start site, others slightly upstream, and a few even downstream. We start to see biological patterns and rules that were partially known before, but never with this level of detail.”
In practice, PARM enables experiments that would have been impossible with traditional approaches. Researchers can, for instance, perform in silico mutagenesis, systematically changing one base at a time across a promoter and predicting how each alteration affects gene expression. This provides insights into which nucleotides are critical, how transcription factors coordinate, and which sequences might mediate disease-relevant regulatory changes. De Ridder explained, “We can do this genome-wide, for every promoter in a cell type, and even under different stimuli or drug treatments. And we can do it on standard academic hardware, not massive clusters. That’s how we’ve uncovered some really interesting regulatory patterns that would have taken years to discover experimentally alone.”
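The in silico mutagenesis described above can be sketched in a few lines. The example below substitutes every base of a sequence with each alternative and records the change in predicted activity. The scoring function is a toy stand-in that rewards a TATA-like motif; it is not PARM, and the article does not describe PARM's actual interface, so all names here are hypothetical.

```python
from typing import Callable, Dict, Tuple

BASES = "ACGT"


def toy_promoter_score(sequence: str) -> float:
    """Hypothetical stand-in for a trained model.

    Returns the best fractional match to the motif TATAAA anywhere
    in the sequence (1.0 means a perfect match is present).
    """
    motif = "TATAAA"
    best = 0
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        matches = sum(a == b for a, b in zip(window, motif))
        best = max(best, matches)
    return best / len(motif)


def in_silico_mutagenesis(
    sequence: str, predict: Callable[[str], float]
) -> Dict[Tuple[int, str], float]:
    """Map (position, alternative base) to predicted change in activity."""
    reference = predict(sequence)
    effects = {}
    for pos, ref_base in enumerate(sequence):
        for alt in BASES:
            if alt == ref_base:
                continue  # skip the unmutated base
            mutant = sequence[:pos] + alt + sequence[pos + 1:]
            effects[(pos, alt)] = predict(mutant) - reference
    return effects


if __name__ == "__main__":
    promoter = "GGCTATAAAGGC"  # made-up sequence containing TATAAA
    effects = in_silico_mutagenesis(promoter, toy_promoter_score)
    # List the substitutions that reduce the predicted activity.
    for (pos, alt), delta in sorted(effects.items()):
        if delta < 0:
            print(pos, promoter[pos], "->", alt, round(delta, 3))
```

With a real model in place of the toy scorer, the same loop yields a per-nucleotide map of which positions matter. A sequence of length n needs 3n predictions, which is why a fast model makes genome-wide scans practical.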
While this system was created to annotate non-coding mutations in cancer genomes, it has already proven to be far more versatile. By revealing the rules that govern promoter activity, PARM provides a blueprint for understanding gene regulation in virtually any cell type. Researchers can now explore how transcription factors coordinate, predict the effects of mutations or drug treatments, and even design synthetic regulatory sequences — all without performing exhaustive experiments in the lab.
Looking forward, the team aims to expand the model to incorporate enhancers, epigenetic modifications, and chromatin architecture, moving toward a more complete map of the genome’s regulatory code. By combining high-resolution experimental data with computational modeling, the PARM approach could help decode the rest of the non-coding genome and clarify its role in disease, opening new opportunities for diagnostics, therapies, and precision medicine.