At the height of the Second World War, US military commanders knew they had a major problem in the air. Many of the pilots who left for battle never returned, and the ones who did landed in planes perforated with bullet holes.
The military needed to reinforce the planes with more armor to prevent the enemy forces from shooting them down so easily, but a plane armored from nose to tail would be too heavy to take off. They had to prioritize the most critical parts.

Abraham Wald, a mathematician at the wartime Statistical Research Group, handed pilots an outline of a plane and asked them to mark where bullets had struck their aircraft. Based on the pilots’ notes, the military commanders decided to add armor to the locations on the planes that received the most damage: the wings, the tail, and the fuselage (1).
But Wald said no, they should reinforce the areas with no damage. Assuming that bullets strike every location on a plane, he explained, the planes hit in the most vulnerable spots never made it home (2). They were missing from the data set. Even though the returning planes had been hit by enemy fire, they still arrived home in one piece, suggesting that the spots where they’d been hit were not critical.
Unlike the military commanders, Wald took survivorship bias into account when identifying the most vulnerable spots on a plane. Of course, if Wald could have found the missing planes, their damaged sections would have shown him exactly where to put the armor.
About 80 years later, a team of genomics researchers led by Kári Stefánsson and Patrick Sulem at Iceland’s deCODE Genetics searched for the human genetics equivalent of Wald’s missing planes: mutations missing from the population. By combing through the rich sequencing and archival data from the Icelandic population, the researchers identified these missing variants and proved that three of them cause rare genetic diseases, giving patients and their families a long-sought diagnosis or hope for potential therapeutics (3).
Missing missense mutations
With its remote location and small population, Iceland is uniquely suited to the study of human genetics. Most of the country’s 366,425 inhabitants are descended from a small number of ancestors, which led to a higher prevalence of rare genetic variants in the Icelandic population than would occur in larger and more diverse populations.
Because of this, researchers have already performed whole-genome sequencing on a large portion of the Icelandic population (4-5). To identify rare genetic variants that cause disease, the team at deCODE Genetics worked with the National Hospital of Iceland to sequence the whole genomes of 764 patients with rare diseases and their families.
“In only about 35% of the cases, we have a solution to the problem,” said Stefánsson. “Then you begin to look at the remaining 65% and begin to ask the question, is it possible among the 65% of cases there are pathogenic variants that have yet to be discovered?”
The easiest disease-causing genetic variants to find are the dominant ones. A child only needs to inherit one disease allele from either their mother or father to present with the disease. Recessive genetic disorders are rarer and more difficult to identify because they require that both the mother and the father give the child the same rare variant.
Stefánsson and Sulem reasoned that if inheriting two copies of a disease allele was so deleterious that the child died during embryonic development or soon after birth, the population would contain fewer people with two copies of that allele than the allele’s frequency predicts.
Using this approach in a 2015 study, the deCODE Genetics team identified multiple new recessive disease alleles that caused previously undiagnosed rare genetic diseases (6). In this case, they specifically looked for recessive mutations that resulted in the loss of a particular protein.
But not all mutations lead to the complete loss of protein. In the case of missense mutations — where a single amino acid in a protein gets swapped for a different one — the cell still makes a protein, but it makes the wrong protein. Missense mutations don’t always lead to disease, but in rare cases they can.
“For missense mutations, the impact would really depend on the specific genetic change, so that makes everything much, much harder to detect and prove that these missense mutations are really causing a novel genetic recessive disorder,” said Siddharth Banka, a clinical geneticist at the University of Manchester who was not involved in the new study.
With their extensively sequenced and isolated Icelandic population, the deCODE researchers were up for the challenge.
From Iceland to California, the mystery of CPSF3
Armed with sequencing data from 153,054 Icelanders, Sulem and Stefánsson defined a potential missense disease variant as one where they expected at least three people to carry two copies of the variant but instead found none. They identified 114 missense variants, 34 of which corresponded to known recessive genetic diseases.
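The arithmetic behind "expected at least three, found none" can be sketched in a few lines. This is a simplified illustration, not the study's actual calculation: it assumes random mating (Hardy-Weinberg proportions), under which the expected number of people with two copies of a variant is the cohort size multiplied by the square of the allele frequency. The function names and the example allele frequency are hypothetical.

```python
import math

N_SEQUENCED = 153_054   # Icelanders with sequencing data (from the article)
MIN_EXPECTED = 3        # screening threshold described in the article


def expected_homozygotes(n_people: int, allele_freq: float) -> float:
    """Expected count of people with two copies, assuming random mating."""
    return n_people * allele_freq ** 2


def min_allele_freq(n_people: int, min_expected: float) -> float:
    """Smallest allele frequency at which min_expected homozygotes are expected."""
    return math.sqrt(min_expected / n_people)


def is_missing_variant(n_people: int, allele_freq: float, observed: int) -> bool:
    """Flag a variant when enough homozygotes are expected but none are seen."""
    expected = expected_homozygotes(n_people, allele_freq)
    return expected >= MIN_EXPECTED and observed == 0


threshold = min_allele_freq(N_SEQUENCED, MIN_EXPECTED)
print(f"Minimum allele frequency: {threshold:.4f}")  # roughly 0.0044, about 0.44%

# Hypothetical variant at 0.6% allele frequency with no homozygotes observed.
print(is_missing_variant(N_SEQUENCED, 0.006, observed=0))
```

Under these assumptions, only variants with an allele frequency of roughly 0.44% or higher are common enough for their absence in two copies to register at all, which is one reason a small, densely sequenced founder population helps.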
“If we had not found something, we would have been really surprised because logic tells us that we have to find variants in this way, and we did,” said Stefánsson.
When Stefánsson and Sulem compared their list of missing missense variants against the National Hospital of Iceland’s rare disease genomic database, they found multiple patients who had inherited two copies of their newly identified missense variants.
The researchers were particularly surprised to find one of the missense disease variants in the gene CPSF3, which is involved in processing mRNA and transporting it out of the nucleus (7). Scientists had never identified genetic variants of CPSF3 that caused a disease before this study.

Sulem and Stefánsson found two distantly related patients in their clinical database with the same missense mutation in CPSF3 who both had an intellectual disability and microcephaly, among other similar features. When they compared the whole genome sequences of both patients, they could find no other genetic explanation for the patients’ disease symptoms.
To find additional patients with missense alleles in CPSF3, they searched their deCODE Genetics genealogical database, which includes relatedness information for almost all Icelanders from the last 100 years (4). They identified three couples where both members carried one missense allele of CPSF3. Of the ten children born to these three couples, four died before age eight and had features similar to the first two patients that Sulem and Stefánsson identified.
The researchers obtained tissue samples from two of the children who had died because Icelandic hospitals have kept a tissue archive of all autopsies and biopsies since 1950. Sulem and Stefánsson found that both children, who happened to be related to one of the patients identified in deCODE Genetics’ rare disease database, carried two missense copies of the CPSF3 gene.
To definitively prove that CPSF3 caused this rare genetic disorder, Sulem and Stefánsson needed to identify the mutation in a genetically unrelated patient. They uploaded the CPSF3 genetic variant information to the online service GeneMatcher, which helps clinicians and researchers working on the same gene find each other and share information. Very quickly, they found two patients at Children’s Hospital of Orange County (CHOC) in Southern California who had two copies of a different missense variant in CPSF3 but very similar disease features.
“We were very fortunate that the initial contact was very quick after we found this match, and the communication has been very easy within the collaboration,” said Rebekah Barrick, a clinical genetic counselor at CHOC and an author of the study. The deCODE Genetics and CHOC researchers found no other genetic explanation for these two patients’ disease features, which led them to conclude that this missense mutation in CPSF3 caused their disease. Barrick and her team were excited to tell the patients’ family that they had at last found the gene responsible for the disease.
“We've followed them for many years, always looking for answers using available testing and technology. But finally, coming to what we think is the answer for them — they were excited. They have a name or at least a gene to understand what's been going on,” Barrick said.
Identical GNE variants are worse together
Researchers have known since 2001 that recessive missense mutations in the gene GNE cause the rare muscle-wasting disease GNE myopathy, which typically manifests between ages 20 and 40 (8). While people with GNE myopathy have two missense alleles that cause their disease, these alleles always have mutations in different places in the GNE gene.
When Stefánsson and Sulem discovered that there was no one in the Icelandic population with two copies of the same specific missense mutation in the GNE gene, they were puzzled. They had been working with a couple whose daughter had died soon after birth. Both parents carried one copy of the same GNE missense allele, and through post-mortem sequencing of tissue from their daughter, Sulem and Stefánsson confirmed that she carried two copies of the GNE missense allele.
“We went back to the clinician, and they said, ‘that doesn't fit because we're expecting something happening in the second decade of life.’ They were not necessarily expecting something that early and that drastic, so we had to do more, get confidence, and convince them that there was this deficit in the population,” said Sulem.
Using their missing mutations approach, Sulem and Stefánsson calculated that if the missense mutation did not cause disease, at least six people in the Icelandic population should carry two copies of it, but there were none. At the time they investigated this, the daughter’s mother was pregnant with another child. During the mother’s 12-week ultrasound, doctors noticed thickening around the neck of the fetus, potentially suggesting GNE myopathy. Sulem and Stefánsson sequenced a sample from the developing fetus and confirmed that it also carried two copies of the same GNE missense mutation as its sister.
“Sometimes this mutation was seen, but together with probably a milder version. It's a combination now of the two” identical missense alleles that cause this more severe version of the disease, Sulem said. “If you think of a continuum, we're probably reaching one end.”
Missing births with GLE1 mutations
Encouraged by their identification of missense disease alleles in CPSF3 and GNE, Stefánsson and Sulem searched for other missing missense disease alleles. They noticed that no one harbored two copies of a specific mutation in the gene GLE1 when at least ten were expected in the population.
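One rough way to gauge how surprising such an absence is: if the number of people with two copies is treated as approximately Poisson-distributed (an assumption made here for illustration, not a method stated in the study), the chance of seeing none when a given number is expected is e raised to the negative of that expectation. The sketch below applies this to the expected counts quoted in the article; because those counts are described as "at least" three, six, and ten, the probabilities are upper bounds under this model. The function name is hypothetical.

```python
import math


def prob_zero_observed(expected: float) -> float:
    """Poisson probability of observing zero events when `expected` are expected."""
    return math.exp(-expected)


# Expected homozygote counts quoted in the article (each is a lower bound).
cases = [
    ("screening threshold", 3),
    ("GNE variant", 6),
    ("GLE1 variant", 10),
]

for label, expected in cases:
    p = prob_zero_observed(expected)
    print(f"{label}: expected >= {expected}, P(none observed) <= {p:.2g}")
```

At the screening threshold of three, chance alone would produce an empty result about 5% of the time under this model, which helps explain why the researchers sought supporting evidence from families and tissue archives. At ten expected, the probability falls below one in 20,000.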
As with GNE, researchers had shown that when the GLE1 missense allele that Sulem and Stefánsson found was inherited together with a different GLE1 missense mutation, it caused death right before or soon after birth (9).
To determine why there were no people with two copies of the same GLE1 missense allele, Sulem and Stefánsson identified 17 couples who each carried one copy of the GLE1 allele. To their surprise, Sulem and Stefánsson found that none of the couples had lost a child soon after birth.
Stefánsson cautioned, “When you're working with extremely rare phenomena, you have to live with the fact that absence of evidence is not evidence of absence.”
They hypothesized that having two copies of this missense mutation caused the fetus to die early during development, likely during the first trimester of pregnancy. In fact, in interviews with the couples and in reviewing their medical records specifically for a note of early miscarriage, Sulem and Stefánsson learned that more than 60% of the women in these couples reported having a miscarriage between 5 and 8 weeks of pregnancy, compared to the 12-24% rate of miscarriage at this timepoint in the general population (10).
While they did not have fetal samples to sequence, Sulem and Stefánsson hypothesized that carrying two copies of this missense mutation leads to very early spontaneous abortion.
“It really shows the power of what can be achieved if [there is] very good coverage of genotyping across a given population,” said Banka. “This approach not only identifies new genetic disorders, but also enables [them] to identify genetic disorders, which in any other way would be probably very challenging to identify.”
Banka was impressed with the thoroughness of the deCODE Genetics team’s study and the number of steps the researchers took to prove the causality of each missense variant they found. As a next step, he is interested in learning more about the function of the missense mutations they found, for example, by assessing the severity of the mutations in patient cells or animal models.
With their team at deCODE Genetics, Sulem and Stefánsson are working on a follow-up study to identify more missing mutations in an even larger population.
“This is a method that can be applied to figuring out the causes of diseases of early childhood, for example, [and] understanding why spontaneous abortions occur,” said Stefánsson.
With a better understanding of the genetic mechanisms that drive some of these diseases, researchers can develop new treatments for them, especially for the diseases that manifest in early childhood. Depending on the underlying disease mechanism, medication may already exist that can be repurposed for a particular genetic disease.
While many genetic diseases have no cure yet, a diagnosis allows patients and their families relief from going through more diagnostic testing. A diagnosis may also help parents with family planning for future pregnancies and may allow for additional social support for children living with rare genetic disorders.
Banka added, “Even if none of this was possible, that it doesn't alter your reproductive choices, it doesn't alter your management, it doesn't alter the ease of accessibility to help — even if all of that was not possible, having an understanding as to what is the reason for someone's medical problems can be quite therapeutic in itself.”
References
1. Mangel, M. & Samaniego, F.J. Abraham Wald's Work on Aircraft Survivability. Journal of the American Statistical Association 79, 259-267 (1984).
2. Wallis, W.A. The Statistical Research Group, 1942-1945: Rejoinder. Journal of the American Statistical Association 75, 334-335 (1980).
3. Arnadottir, G.A. et al. Population-level deficit of homozygosity unveils CPSF3 as an intellectual disability syndrome gene. Nat Commun 13, 705 (2022).
4. Gudbjartsson, D. et al. Large-scale whole-genome sequencing of the Icelandic population. Nat Genet 47, 435-444 (2015).
5. Jónsson, H. et al. Whole genome characterization of sequence diversity of 15,220 Icelanders. Sci Data 4, 170115 (2017).
6. Sulem, P. et al. Identification of a large set of rare complete human knockouts. Nat Genet 47, 448-452 (2015).
7. Dominski, Z. et al. The Polyadenylation Factor CPSF-73 Is Involved in Histone-Pre-mRNA Processing. Cell 123, 37-48 (2005).
8. Eisenberg, I. et al. The UDP-N-acetylglucosamine 2-epimerase/N-acetylmannosamine kinase gene is mutated in recessive hereditary inclusion body myopathy. Nat Genet 29, 83-87 (2001).
9. Said, E. et al. Survival beyond the perinatal period expands the phenotypes caused by mutations in GLE1. Am J Med Genet Part A 173A, 3098-3103 (2017).
10. Jurkovic, D., Overton, C., Bender-Atik, R. Diagnosis and management of first trimester miscarriage. BMJ 346, f3676 (2013).