Three billion base pairs make up each human genome, and the Telomere-to-Telomere (T2T) Consortium recently took on the Herculean task of sequencing every single one of them. But DNA isn’t the only thing in their crosshairs.
T2T also offers a first look at the epigenome that accompanies each DNA base — information that was missing in several crucial regions in older maps of the human genome. The T2T team hopes that having a fuller picture will shed light on how the epigenome helps cells work and what happens when things go awry.
“Our goal at the T2T Consortium is a complete base-by-base picture of our genome,” said Karen Miga, a genome biologist at the University of California, Santa Cruz and co-chair of the consortium. “Understanding the DNA modifications is a perfect example of information that can be brought to light by having a more complete genome.”
Lost in the genomic desert
While DNA carries the instructions for all cellular functions, not every cell looks or acts the same. This is because of differences in the epigenome: chemical modifications to the DNA and surrounding proteins. These tweaks influence how easily a given cell can read the instructions encoded in the DNA, determining where and when a cell cracks open this expansive instruction manual.
Sophisticated technologies can spot epigenetic molecules coating pieces of DNA. But to figure out exactly which genes are affected by the modifications, scientists need to find the spot in the human genome that matches the coated DNA sequences. This wasn’t always an easy task. Older versions of the human genome were choppy, with gaps wherever the sequence was too repetitive to assemble, so there was often no way to tell exactly where a fragment belonged.
“You’re basically standing on a dune in the Sahara Desert looking in every direction, and you have no idea where you are; it’s all desert,” said Rachel O’Neill, a genome biologist at the University of Connecticut and senior author of one of the T2T papers. “When you’re in the middle of those repeats, you’re lost.”
However, some of these dunes could still be biologically important regions of DNA. For example, the repetitive midsection of each chromosome, the centromere, plays an important role in ensuring that cells divide properly, and some scientists think that epigenetic errors in this region can increase the risk for cancer. That makes the task of understanding epigenetic modifications in these regions “pretty fundamental,” said Steven Henikoff, a molecular biologist at the Fred Hutchinson Cancer Research Center who was not involved in the T2T projects.
To navigate repeats and other tricky stretches of the genome, the T2T Consortium used relatively new long-read sequencing technologies. These methods read the genome in much longer segments than conventional short-read sequencing: more than 20,000 bases at a time, instead of a few hundred. Rather than glimpsing a small portion of a repeated sequence with no way of knowing which part of the genome it came from, researchers can read long stretches of repeats and pin down their locations. It is like finding an exact match for a whole sentence in a book instead of trying to place a single word.
As an added bonus, long-read sequencing picks up one epigenetic modification straight from the raw data: DNA methylation, a small chemical tag most often found on cytosine bases. “The epigenetic information comes along for free,” said Winston Timp, a biomedical engineer at Johns Hopkins University and senior author of one of the T2T papers. Measuring other types of epigenetic modifications requires some extra experiments, but they can be tacked on with relative ease, he added.
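To make the sentence-versus-word analogy concrete, here is a toy Python sketch. It is an illustration under simplifying assumptions, not the T2T Consortium’s pipeline: the "chromosome" is random made-up DNA with a 40-copy tandem repeat in the middle, the read lengths are arbitrary, and matching is exact string search, whereas real long-read aligners tolerate sequencing errors and use far more sophisticated indexing. The helper names `count_exact_matches` and `build_toy_reference` are hypothetical.

```python
import random

def count_exact_matches(reference: str, read: str) -> int:
    """Count every position (overlaps included) where `read` occurs exactly in `reference`."""
    count = 0
    start = reference.find(read)
    while start != -1:
        count += 1
        start = reference.find(read, start + 1)
    return count

def build_toy_reference(flank_len: int = 200, unit_len: int = 50,
                        copies: int = 40, seed: int = 0) -> str:
    """Hypothetical toy chromosome: unique random DNA on both sides of a tandem repeat."""
    rng = random.Random(seed)

    def random_dna(n: int) -> str:
        return "".join(rng.choice("ACGT") for _ in range(n))

    left, unit, right = random_dna(flank_len), random_dna(unit_len), random_dna(flank_len)
    return left + unit * copies + right

reference = build_toy_reference()
repeat_start, repeat_end = 200, 200 + 50 * 40

# A "short read" of 30 bases that sits entirely inside the repeat.
short_read = reference[repeat_start + 10 : repeat_start + 40]
# A "long read" of 600 bases that starts inside the repeat and runs into the unique flank.
long_read = reference[repeat_end - 500 : repeat_end + 100]

print("short read matches:", count_exact_matches(reference, short_read))
print("long read matches:", count_exact_matches(reference, long_read))
```

The short read lines up once for every copy of the repeat unit, so its true origin is ambiguous. The long read reaches into unique sequence, which anchors it to a single position.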
“I think of the epigenome as the fifth base,” Timp said. “We shouldn’t be ignoring it.”
Creating a blueprint
With the help of long-read sequencing, the T2T Consortium produced a complete sequence of the 22 non-sex chromosomes and the X chromosome, simultaneously creating a complete base-by-base map of methylation across them (1). Using Oxford Nanopore’s and PacBio’s long-read technologies, they slogged through the roughly eight percent of the genome that had been missing from earlier reference genomes.
For Ting Wang, a geneticist at Washington University in St. Louis who was not directly involved in the T2T projects, the new research is the beginning of the next phase of epigenome mapping efforts. He worked on the Roadmap Epigenomics Project, an earlier effort supported by the National Institutes of Health to conduct large-scale epigenomic profiling (2). That project, which concluded in 2018, built an epigenome reference on top of an earlier version of the genome, but it was limited by the gaps. “Before we had a complete reference genome, it was impossible to have a complete reference epigenome,” Wang said.
Now, with the new T2T reference, researchers can better interpret results from older epigenomic datasets, such as Roadmap and the Encyclopedia of DNA Elements (ENCODE). The T2T researchers looked for matches between the older sequencing data from these projects and the new complete genome and linked epigenetic modifications to more genes, including those involved in diseases.
For example, epigenetic modifications in the neuroblastoma breakpoint family (NBPF) genes were previously thought to trigger brain tumors, but the genes were so similar to each other that it was difficult to pinpoint the exact culprits. With the T2T reference, the researchers linked a constellation of tumor-specific epigenetic marks to specific NBPF genes (3). Wang expects that with more long-read epigenetic data in the future, the T2T reference will prove even more useful.
T2T’s data analysis software is a key resource for other researchers who want to take a base-by-base walk through a gene or genomic region of interest, Henikoff said. Mapping repetitive long-read sequencing data is still a relatively new challenge, and many of the tools didn’t exist prior to this project (4).
“We could only look at where we could shine the light before, but now we have a map,” Timp said. “We provided a blueprint for how to do this.”
The epigenetic piece of many puzzles
The Consortium has published many papers using the new data, each focusing on a different previously obscured aspect of the genome, and the power of the epigenetic data has earned it a place in almost every story, Miga said. Her lab, for example, studies centromeres, the regions near the middle of chromosomes that hold the two duplicated copies of each chromosome together and anchor the machinery that pulls them apart when a cell divides.
The T2T team noticed that there was less methylation at specific spots on the centromere. The location of these “centromeric dip regions” varied between chromosomes, and when they looked at the X chromosome in people from around the world, they also varied between people. A closer look showed that these epigenetic changes corresponded to the location where key proteins bind during cell division.
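As a rough illustration of what a "dip" means in the data, the Python sketch below slides a window along per-CpG methylation levels and flags stretches whose average falls well below the regional average. This is not the method the T2T team used; the window size, the 50 percent threshold, and the methylation values are all made-up assumptions, and `find_methylation_dips` is a hypothetical helper.

```python
from statistics import mean

def find_methylation_dips(positions, methyl_fractions, window=5, drop=0.5):
    """Flag stretches whose average methylation falls well below the regional average.

    positions: sorted CpG coordinates in the region.
    methyl_fractions: fraction of reads (0 to 1) methylated at each CpG.
    window, drop: hypothetical tuning parameters (CpGs per window, and the
    fraction of the regional mean below which a window counts as a dip).
    Returns merged (start, end) coordinate ranges.
    """
    baseline = mean(methyl_fractions)
    dips = []
    for i in range(len(methyl_fractions) - window + 1):
        if mean(methyl_fractions[i : i + window]) < drop * baseline:
            start, end = positions[i], positions[i + window - 1]
            if dips and start <= dips[-1][1]:
                dips[-1] = (dips[-1][0], end)  # overlapping window: extend the previous dip
            else:
                dips.append((start, end))
    return dips

# Made-up region: 20 CpGs spaced 100 bp apart, highly methylated except four in the middle.
positions = list(range(0, 2000, 100))
fractions = [0.9] * 8 + [0.1] * 4 + [0.9] * 8
print(find_methylation_dips(positions, fractions))  # [(700, 1200)]
```

The reported range is slightly wider than the four low CpGs (800 to 1,100) because each flagged window also includes a neighboring, highly methylated CpG; real analyses work with far more CpGs and more careful statistics.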
This was just one of many surprises hidden in the methylation data. Repetitive regions in the centromere and elsewhere in the genome had diverse methylation profiles, even in regions thought to share a critical function (5). “It sounds like finishing a project,” said Henikoff, who also studies centromeres. “But in a way, it's kind of a beginning for those of us who are interested in centromeres.”
Timp was also interested to learn that genes that had duplicated over the course of human evolution didn’t always have the same epigenetic marks. In fact, in some cases, one copy had marks suggesting that it was silenced while the other was active. Similar patterns emerged in repeating sequences of DNA.
O’Neill, who studies the genome from an evolutionary perspective, sees this as an example of how the genome might use epigenetic modifications to defend itself against DNA fragments that could wreak havoc if transcribed. Like many of the other investigators, she didn’t know what to expect when delving into unexplored parts of the genome and was surprised to find a transposable element that was highly transcribed, unlike other repeats (6). Transposable elements, mobile pieces of DNA, could induce epigenetic changes in their surrounding regions and form boundaries between different regions of chromosome structure.
“This technology has reinvigorated a field that's actually several decades old,” said O’Neill, who is planning to build off this work to continue studying how repeat regions influence chromosome structure.
Bringing epigenetics to the clinic
As a cancer researcher, Henikoff is particularly interested in how cell division goes wrong in cancer cells. The epigenetic modifications that the T2T team found around centromeric repeat elements might hold the answer. O’Neill also wonders whether the variability of the epigenome between people can explain variability in cancer risks and outcomes. For example, epigenetic modifications that make repeat elements less stable might be dire handicaps for patients who already have common cancer mutations that hamper their cells’ ability to fix errors. Understanding these effects will require analyzing epigenetic variability in both healthy and sick people, Timp said.
“Just like how genomics is going to give us better personalized therapies, I think the epigenome can also play a role there as well,” Timp said.
Epigenetic marks might also make good targets for therapeutics. Some cancer treatments, such as histone deacetylase inhibitors, already target the epigenome.
Bringing that kind of knowledge into clinical practice might still be many years away. Technologies need to improve in accuracy, and being able to sequence genomes from hundreds or thousands of people will require improvements in efficiency and cost.
“It is early days, but this will eventually become the standard for research,” said Christopher Mason, a genome biologist at Weill Cornell Medical Center who was not involved in the T2T projects, in an email. “It is just a matter of time before it is the standard in clinical care as well.”
First of many
Miga is careful to note that this is not “the human genome,” but rather “one human genome.” Not only do DNA sequences vary between people, but the DNA used in this study actually came from a cell line that looks quite different from the average human cell.
For starters, this cell line, dubbed CHM13, comes from a complete hydatidiform mole, a type of molar pregnancy, so it resembles an early embryonic state rather than the cells with fully developed functions in an adult human. Notably, its two sets of chromosomes are essentially identical, making it effectively haploid. This made the researchers’ jobs a bit easier because they didn’t have to figure out which copy of the genome each sequence came from. But these same quirks mean that the genome and epigenome could look different in adult human cells.
“It’s not your cell, my cell, or anybody’s cell,” Wang said. “It's a valuable tool for building this complete reference, but in terms of the biology, it whets your appetite but is far away from being what we want.”
Miga worried about this too, especially when she saw the intriguing methylation patterns in the centromere. Without prior data on this region, she couldn’t tell whether the patterns were unique to CHM13 cells or to early developmental stages. To help answer this, she and her team sequenced an X chromosome from a more differentiated diploid human cell line. The methylation patterns were generally the same, which was an encouraging sign for Timp and Miga. But elsewhere in CHM13’s genome, methylation was absent from regions that are almost always methylated in differentiated cells.
Beyond the 3 billion bases that T2T sequenced, Miga thinks that even more value will come from understanding how genomes vary from person to person. Epigenetic studies have long shown that differences in DNA modifications are part of what gives different cell types their specialized functions, and the T2T data suggest that epigenetic patterns in the centromere can even vary between people from different parts of the world.
Wang, who is collaborating with Miga on the Human Pangenome Reference Consortium (HPRC), hopes to do just that. The HPRC aims to sequence 350 whole genomes from people of diverse ancestries, and Wang’s goal is to make a “human pan-epigenome” alongside it.
“The symbolic meaning of [the T2T] epigenome is just enormous,” Wang said. “In my own mind, it’s really much bigger than the actual biology.”
- Nurk, S.*, Koren, S.*, Rhie, A.* et al. The complete sequence of a human genome. Science 376 (6588), eabj6987 (2022). *authors contributed equally
- Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518 (7539), 317–330 (2015).
- Altemose, N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376 (6588), eabl4178 (2022).
- Jain, C. et al. Weighted minimizer sampling improves long read mapping. Bioinformatics 36 (Suppl. 1), i111–i118 (2020).
- Gershman, A. et al. Epigenetic patterns in a complete human genome. Science 376 (6588), eabj5089 (2022).
- Hoyt, S. et al. From telomere to telomere: The transcriptional and epigenetic state of human repeat elements. Science 376 (6588), eabk3112 (2022).