For many years, a curious and seemingly invisible mystery existed within a Dutch family. Several members of the family developed choroideremia, a rare form of progressive retinal degeneration that ultimately causes severe vision loss or blindness. Doctors knew that family members passed down choroideremia to their offspring, but couldn’t figure out what exactly caused the disease. Where did the answer lie within the epically complex tangle of the human genome?
Under the guidance of geneticists Susanne Roosing and Alexander Hoischen at Radboud University Medical Center, scientists leveraged multiple long-read methods to finally find the causative variant for choroideremia in this family (1). Rather than depending on the more commonly used methods of whole exome or whole genome sequencing, the researchers combined optical genome mapping with long-read whole genome sequencing to discover what went wrong in the DNA.
Choroideremia is usually linked to pathogenic variants in the CHM gene (named for choroideremia); more than 90% of clinically diagnosed cases can be traced back to mutations in this gene (2). In the Dutch family, however, there was no traceable variant in the CHM gene’s DNA.
“It’s not really clear what the etiology of the disease is. We understand what the pathology looks like, and we understand that it’s an X-linked disease,” said Maureen McCall, an ophthalmology researcher at the University of Louisville who was unaffiliated with the study. “We don’t always understand where the defect arises and how that defect translates into vision loss.”
For this particular Dutch family, previous work had revealed a loss of CHM’s exon 12 in the RNA transcripts from affected males and carrier females. According to Zeinab Fadaie, one of the first authors of the study and a researcher at the Princess Maxima Center for Pediatric Oncology, this exon skipping hinted at an intronic variant.
Scientists focus much of their attention on exons, but introns and other non-coding regions quietly regulate the genome. Changes in these regions can directly affect gene expression, transcription, and translation.
Fadaie, whose doctoral work focused on identifying non-coding mutations, saw this as an opportunity to investigate what other scientists might have missed. She and her colleagues began by conducting Sanger sequencing around the family members’ exon 12 in the CHM gene to exclude variants that might be more easily detected. Sanger sequencing allows for fast, targeted sequencing of small DNA fragments, but it did not turn up any pathogenic variants around the exon.
To take a broader look at this potentially problematic area of the genome, the researchers next used optical genome mapping, which is “a bit like musical notes,” Fadaie said. With this approach, scientists choose a set of six nucleotides as a “musical note” that repeats itself numerous times in the genome (the “sheet music”). By labeling this nucleotide set with a fluorophore, scientists can map where the set appears along the genome and look for patterns that differ between the reference genome and a patient’s genome.
“This pattern is important to read the correct music,” Fadaie explained. “If the pattern is altered in the patient compared to the reference, then you can better understand what is happening. If, for instance, in the middle, two labels are missing in the patient, it means that there was a deletion.”
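Fadaie's "missing labels" example can be mimicked in a few lines of Python. This is only a toy sketch, not the analysis used in the study: the six-letter motif, the two short sequences, and the function name label_positions are all made up for illustration, and real optical genome mapping compares fluorescent label patterns on very long DNA molecules rather than searching text.

```python
def label_positions(sequence, motif):
    """Return the 0-based start position of every occurrence of motif."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


# Hypothetical six-nucleotide "musical note" chosen for this example.
MOTIF = "CTTAAG"

# Toy reference with four labeled sites.
reference = "AAAA" + MOTIF + "GGGGG" + MOTIF + "TTT" + MOTIF + "CCCCCCC" + MOTIF + "AA"

# Toy patient sequence in which a deletion removed the two middle sites.
patient = "AAAA" + MOTIF + "GG" + "CCCC" + MOTIF + "AA"

ref_labels = label_positions(reference, MOTIF)
pat_labels = label_positions(patient, MOTIF)

# Compare how many labels are present and how far apart the outer ones sit.
missing_labels = len(ref_labels) - len(pat_labels)
ref_span = ref_labels[-1] - ref_labels[0]
pat_span = pat_labels[-1] - pat_labels[0]

print("reference labels:", ref_labels)
print("patient labels:  ", pat_labels)
print("labels missing in patient:", missing_labels)
print("distance between outer labels:", ref_span, "vs", pat_span)
```

In this toy case the patient has two fewer labels and a shorter distance between the outer labels, which is the signature of a deletion. An insertion would show the opposite: a longer distance, possibly with extra labels.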
With this optical genome mapping approach, the researchers uncovered a large structural variant in the form of an insertion in the CHM gene. However, the details of the insertion were elusive. When the researchers attempted to isolate and sequence that section of the genome, target amplification through polymerase chain reaction (PCR) did not work, suggesting that something complex was at play.
To overcome this, the researchers turned to sequencing the entire genome using long-read sequencing, which generates reads that span much longer stretches of DNA (kilobases rather than a few hundred bases). According to Stephen Daiger, a researcher at the University of Texas Health Science Center’s School of Public Health who was unaffiliated with the study, long-read sequencing generates a “bigger picture overview” of the genome that includes any potential structural variation, although it may not always be as accurate at identifying single point mutations (3).
Based on the long-read sequencing, and later validated via Sanger sequencing, Fadaie and her team finally determined that this mysterious insertion was an inverted duplication. “If you have a sequence in your genome, for example ABCDE, it looks like this: A to E region duplicated but inverted. So, the inverted duplication will read ABCDE-EDCBA,” said Fadaie. Within the Dutch family, this structural variant manifested within intron 12 as an inverted duplication containing an additional copy of exon 12 and a portion of its surrounding introns.
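The ABCDE-EDCBA picture can be written out as a short Python sketch. It rests on assumptions that are not from the study: the example sequence and coordinates are invented, the copy is placed directly after the original segment for simplicity, and the function names are hypothetical. In real double-stranded DNA, an "inverted" copy reads as the reverse complement of the original rather than a plain reversal, which is what the second function models.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence as read from the opposite DNA strand."""
    return seq.translate(COMPLEMENT)[::-1]


def insert_inverted_duplication(seq, start, end):
    """Insert an inverted copy of seq[start:end] right after that segment."""
    segment = seq[start:end]
    return seq[:end] + reverse_complement(segment) + seq[end:]


# Letter analogy from the quote: a plain reversal of ABCDE.
analogy = "ABCDE" + "ABCDE"[::-1]
print(analogy)

# Toy DNA example (made-up sequence and coordinates).
original = "TTGACCAGTT"
variant = insert_inverted_duplication(original, 2, 7)
print(original)
print(variant)
```

Here the segment GACCA gains an inverted copy, TGGTC, immediately downstream. The two copies are complementary to each other when read in opposite directions, which is what sets up the hairpin idea described next.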
The researchers’ next step was to understand how this inverted duplication arose. By looking at the ends of the mutation, they found two microhomology regions (short DNA sequences that are complementary to each other) at the 5′ and 3′ breakpoints surrounding the variant. They hypothesized that after a double-stranded break in the DNA, these microhomology regions facilitated DNA polymerase leaping from strand to strand while replicating the gene, ultimately generating an extra copy of exon 12 that was glued backwards into the intron via mismatch repair mechanisms.
The researchers theorized that this inverted duplication created a hairpin structure in the resulting RNA transcript, where the inverted exon 12 and the normal exon 12 act “like the north and south pole of a magnet; they stick together,” said Fadaie.
This hairpin is likely to interfere with normal splicing mechanisms in that region. Ordinarily, protein complexes bind to specific splice sites in or near the exons and excise the intronic regions from the RNA transcript. Hairpin formation could block the protein complex from binding to the relevant sites in exon 12, causing the complex to skip to the next exon’s binding site and excise exon 12 along with its surrounding introns. This would create the truncated RNA transcript that was originally seen in the Dutch family.
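Why an inverted copy would "stick" to the original can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the computational modeling the team performed: the RNA sequences are invented, only standard A-U and G-C pairs are counted, and the function names are hypothetical. Real RNA folding tools also weigh wobble pairs, loop sizes, and thermodynamic stability.

```python
# Standard Watson-Crick base pairs in RNA (G-U wobble pairs are ignored here).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# Translation table mapping each RNA base to its complement.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(seq):
    """Return the reverse complement of an RNA sequence."""
    return seq.translate(RNA_COMPLEMENT)[::-1]


def stem_length(left_arm, right_arm):
    """Count consecutive base pairs when right_arm folds back onto left_arm.

    The first base of left_arm is tested against the last base of right_arm,
    the second against the second-to-last, and so on, stopping at the first
    position that cannot pair.
    """
    length = 0
    for a, b in zip(left_arm, reversed(right_arm)):
        if (a, b) not in PAIRS:
            break
        length += 1
    return length


# Made-up stand-in for a stretch of exon sequence in the transcript.
exon_like = "GACCAUG"

# An inverted copy of that stretch is its reverse complement.
inverted_copy = rna_reverse_complement(exon_like)

# An unrelated stretch of the same length, for comparison.
unrelated = "AAAAAAA"

print("inverted copy:", inverted_copy)
print("stem with inverted copy:", stem_length(exon_like, inverted_copy))
print("stem with unrelated sequence:", stem_length(exon_like, unrelated))
```

The inverted copy pairs with the original along its full length, while the unrelated sequence does not pair at all. That full-length pairing is the "north and south pole" attraction in Fadaie's magnet analogy.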
Fadaie and her colleagues have not yet functionally proven their hairpin idea, but “I’ll be happy if somebody does,” she mused. They have, however, validated its potential through computational modeling.
The combination of non-traditional methods, such as optical genome mapping and long-read whole genome sequencing, ultimately allowed the team to identify the disease-causing genomic variant. In the future, the researchers hope to apply these methods to identify other previously invisible variants. They also hope to understand more fully the downstream effects of those variants on RNA, protein, and the human body as a whole.
“This is a very elegant study,” McCall said. “The fact that they have pioneered this strategy is huge because there will now be a way for other families to be diagnosed more clearly. And with that, it also may help to understand the pathophysiology better.”
For Fadaie, the final goal is to bring this newfound knowledge to the patients. “I really like translational research because whatever is done, whatever we are doing, it has to come to the patient at some point. Otherwise, there is no point,” she said. “When we know what is wrong, we can start thinking about how to solve it.”
References
1. Fadaie, Z. et al. Long-read technologies identify a hidden inverted duplication in a family with choroideremia. Human Genetics and Genomics Advances 2, 100046 (2021).
2. Simunovic, M. P. et al. The spectrum of CHM gene mutations in choroideremia and their relationship to clinical phenotype. Investigative Ophthalmology & Visual Science 57, 6033–6039 (2016).
3. Logsdon, G. A., Vollger, M. R. & Eichler, E. E. Long-read human genome sequencing and its applications. Nature Reviews Genetics 21, 597–614 (2020).