For billions of years, life on Earth has lived by a code of three.
With its four nucleotides, DNA provides life’s blueprint, and by reading triplet nucleotide codons, ribosomes assemble life’s building blocks: proteins. While there are 64 codons in our triplet-based genetic code, nature only matches 20 amino acids to these triplet sets to create all of life.
“The core of biology is the genetic code, 20 amino acids. And the question is, why 20? Why not 21? Why these 20? And if you have more, could you do better?” asked Peter Schultz, a chemist and CEO of the Scripps Research Institute.
These questions have fascinated scientists for years and prompted many of them to investigate the limits of our genetic code. Some developed ways to make the code less redundant and integrate new amino acids not found in nature into proteins. Others integrated some of these new amino acids into multicellular organisms like mice and fish. But these methods still relied on a triplet genetic code, with only 64 codons available to encode the amino acids that create life.
“For 4 billion years, life on Earth has been writing a manual of useful stuff for being alive,” said Erika DeBenedictis, a postdoctoral scholar at the University of Washington and a genetic code expansion researcher. “The triplet code was big enough to work, and so it took off. Once it took off, there was not an incentive to switch, unless you're a human engineer.”
Over the past few decades, scientists have discovered that they can expand the genetic code beyond the triplet code to a quadruplet code. With a quadruplet code, there are 256 codons that can specify new amino acids. While an entire organism with a quadruplet codon-based genetic code is likely many years in the making, scientists are optimizing quadruplet codon translation systems such that the creation of proteins with new properties for therapeutics and manufacturing based on a quadruplet code is within reach.
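The jump from 64 to 256 codons follows directly from counting the possible combinations of four nucleotides. The short Python sketch below enumerates them; the function name `all_codons` is an illustrative choice, not something from the research described here.

```python
from itertools import product

NUCLEOTIDES = "ACGU"  # the four RNA bases

def all_codons(length):
    """Return every possible codon of the given length."""
    return ["".join(bases) for bases in product(NUCLEOTIDES, repeat=length)]

triplets = all_codons(3)
quadruplets = all_codons(4)

print(len(triplets))     # 4**3 = 64
print(len(quadruplets))  # 4**4 = 256
```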
Genetic code expansion started with a stop
To build a protein, the cell requires a few key ingredients: a strand of mRNA, transfer RNAs (tRNAs), aminoacyl tRNA synthetases, amino acids, a ribosome, and a few other factors to help all of the components bind to the right place at the right time.
First, aminoacyl tRNA synthetases place the correct amino acid onto the corresponding tRNA molecule. To start protein translation, a tRNA bearing the amino acid methionine binds to the AUG start codon on an mRNA molecule attached to the small ribosomal subunit. The large ribosomal subunit then connects with the small subunit, sandwiching the tRNA inside. The ribosome then moves along the mRNA, reading nucleotides in sets of three-base codons and allowing each corresponding tRNA with its specific amino acid to bind to the mRNA and add its amino acid to the growing polypeptide chain.
By reading the mRNA transcript three nucleotides at a time, the ribosome ensures that the correct amino acid is inserted in the right place. If there is an insertion or deletion of a nucleotide, the codons will be out of frame, resulting in a protein that has the wrong amino acid added in the wrong place, leading to a malformed or truncated protein.
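A toy illustration of a reading-frame error, written as a minimal Python sketch: the mRNA sequence and the helper `split_into_codons` are made up for this example. Inserting a single nucleotide changes every codon downstream of the insertion, and here it happens to create a premature UGA stop codon.

```python
def split_into_codons(mrna, size=3):
    """Split an mRNA string into consecutive codons, dropping any incomplete tail."""
    return [mrna[i:i + size] for i in range(0, len(mrna) - size + 1, size)]

# Hypothetical example sequence: start codon, three sense codons, stop codon.
original = "AUGGCUGAAUUCUAA"

# Insert a single C right after the start codon to knock the message out of frame.
shifted = original[:3] + "C" + original[3:]

print(split_into_codons(original))  # ['AUG', 'GCU', 'GAA', 'UUC', 'UAA']
print(split_into_codons(shifted))   # ['AUG', 'CGC', 'UGA', 'AUU', 'CUA']
```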
When the ribosome reaches one of the three stop codons in the mRNA (UAA, UAG, or UGA), it won’t add another amino acid. Instead, a release factor binds to the ribosome and frees the amino acid chain, and the ribosome comes apart. The amino acid chain can then finish folding into its final protein structure.
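The start-to-stop logic of translation can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it uses the standard genetic code with one-letter amino acid symbols, begins at the first AUG it finds, and ignores everything else a real ribosome needs. The example mRNA and the names `CODON_TABLE` and `translate` are invented for illustration.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order.
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the message."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: the chain is released
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGGCUGAAUUCUAAGG"))  # MAEF
```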
As scientists dissected the fundamental components of protein translation, they realized that if they wanted to create proteins with new properties for therapeutics and research tools, a logical place to begin expanding the capabilities of the genetic code was at these stop codons.
“There are three stop codons, but you only stop once,” said Schultz. By recoding one of the two extra stop codons to instead code for a new amino acid, scientists could add new amino acids with novel properties to proteins.
To do this, scientists designed tRNA and aminoacyl tRNA synthetase pairs that were different enough from a cell’s endogenous tRNA and aminoacyl tRNA synthetase machinery that they wouldn’t interfere with each other. By engineering the new tRNA to recognize the stop codon UAG, also called the amber stop codon, scientists could incorporate a noncanonical amino acid into a protein (1).
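In code terms, amber suppression amounts to overriding one entry in the codon table. The Python sketch below is a deliberately simplified model: the codon table is a small subset of the standard code, the mRNA is invented, and "X" is a placeholder symbol for a noncanonical amino acid. The `suppressor_trnas` parameter is a hypothetical name for this example.

```python
# Small illustrative subset of the standard genetic code; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M", "GCU": "A", "GAA": "E",
    "UAA": "*", "UAG": "*", "UGA": "*",
}

def translate(mrna, suppressor_trnas=None):
    """Translate triplet by triplet; suppressor tRNAs override table entries."""
    table = {**CODON_TABLE, **(suppressor_trnas or {})}
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = table[mrna[i:i + 3]]
        if residue == "*":
            break  # an unsuppressed stop codon ends translation
        protein.append(residue)
    return "".join(protein)

# Hypothetical message with an amber codon (UAG) in the middle.
mrna = "AUGGCUUAGGAAUAA"

print(translate(mrna))                # MA   (translation stops at UAG)
print(translate(mrna, {"UAG": "X"}))  # MAXE (UAG now encodes X; UAA still stops)
```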
Researchers also took advantage of the fact that some methanogenic archaea naturally have a tRNA-aminoacyl tRNA synthetase pair that recognizes UAG, at which it incorporates the amino acid pyrrolysine, a derivative of lysine that is not naturally found in other organisms (2).
The process of recoding the extra stop codons in the genetic code has worked so well that it is already being used to develop antibody-drug therapeutics. The biotech company Ambrx, for example, incorporates novel, synthetic amino acids into proteins to make antibody-drug conjugates for metastatic breast cancer, gastric cancer, and other solid tumors.
“Most biologics are manufactured on a scale of many grams per liter, and suppression of three base nonsense codons has now been done in bacteria and mammalian cells at the five to ten gram per liter level,” said Schultz. “To add an amino acid to the genetic code and have either a mammalian cell or a bacterial cell live long enough to produce ten grams of protein per liter is pretty impressive.”
The success of substituting stop codons with new amino acids is undeniable. But still, scientists wondered if they could add more than two different amino acids. To do that, many decided to explore the possibilities of quadruplet codons.
For more amino acids, four is better than three
Switching between a triplet and quadruplet genetic code in the same cell seems like it would create enough incorrectly formed or truncated proteins from reading frame errors to kill the cell. But nature has evolved mechanisms to prevent proteins from falling out of frame during translation in the form of tRNAs that induce +1 or -1 nucleotide frameshifts. Viruses often use +1 frameshifting to produce more proteins from their small genomes than they could otherwise.
Early experiments exploring the biology of +1 and -1 frameshifting involved the bacterium Salmonella typhimurium, which expresses a tRNA that reads quadruplet codons to suppress an insertion in an enzyme required for histidine biosynthesis (3).
“There are, for whatever reason, tRNAs that can accommodate this four base anticodon and not kill the cell and frameshift all of the other host genes. Somehow [they] only affect the gene of interest,” said Ahmed Badran, a bioengineer at Scripps Research Institute. “There are probably engineering principles that we could elucidate from this, but I think so far, that really hasn't been condensed into a set of rules that one can use to really investigate this.”
Unlike recoding a stop codon to recognize a new tRNA bound to a noncanonical amino acid, translating a quadruplet codon requires more effort from the cell.
“Quadruplet codons are not really compatible with the current ribosome,” said Jiantao Guo, a synthetic biologist at the University of Nebraska-Lincoln. “The ribosome tries to avoid quadruplet codon decoding to avoid that frame shift, so efficiency will be low. You need to further optimize the entire translation system so that the quadruplet codon can be decoded.”
The low efficiency of quadruplet codon translation has led many scientists working in the genetic code expansion field to doubt the utility of the system. For example, compared to the grams of protein per liter that can be made by replacing a stop codon with a new amino acid, quadruplet codons can’t produce proteins near that scale yet.
“You can talk with people now, and they'll say if you work with quadruplet codons, this is really not very useful because you're only going to do what nature discovered long ago,” said Dieter Söll, a synthetic biologist and expert in genetic code expansion at Yale University. But, he added, “it is clear that in vitro and in limited in vivo experiments, quadruplet codons work.”
A quadruplet codon-based HIV vaccine
One exciting way that quadruplet codons may influence human health is by controlling the replication of the human immunodeficiency virus (HIV) through a safe and effective live virus-based vaccine.
“Live, attenuated HIV can provide fairly good protection in animal studies,” said Guo. “Although it's attenuated, you still have live virus, so there are lots of safety concerns.” While a live, weakened virus vaccine promoted a strong immune response in animals, researchers noticed that sometimes the attenuated virus reverted to wildtype, leading the animals to develop AIDS (4).
Guo and his team tried inserting amber stop codons (UAG) into essential genes in the HIV genome, along with corresponding tRNAs to insert noncanonical amino acids. When the researchers provided the virus with the noncanonical amino acid, it replicated well in host cells, but once the supply of the foreign amino acid ran out, HIV could no longer replicate (5).
“The hope is that with a few cycles of infection and regeneration, your immune system can generate a response to this HIV. But after that, HIV cannot replicate anymore,” Guo explained.
HIV, however, has a high mutation rate, and Guo’s team worried that the virus might mutate the amber stop codon back into a codon that encodes a canonical amino acid, leading to infection of the host with HIV. Instead, they used quadruplet codons to encode the noncanonical amino acid, reducing the risk of the virus infecting its host even if it did mutate the codon (6).
“For quadruplet codons, even if there are mutations in those single four nucleotides, it's still a quadruplet codon. That means HIV still cannot replicate even with any mutations. That actually gives better control and lowers the escape rate of HIV from nonfunctional to functional virus,” Guo explained.
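Guo's argument can be checked by enumeration in a toy model. The Python sketch below considers only single-base substitutions (not insertions or deletions) and an invented downstream sequence, so it illustrates the reasoning rather than modeling the actual HIV constructs. Most point mutations turn the amber codon into an ordinary sense codon, whereas no point mutation in a four-base codon puts the downstream sequence back into the triplet reading frame.

```python
NUCLEOTIDES = "ACGU"
STOP_CODONS = {"UAA", "UAG", "UGA"}

def point_mutants(codon):
    """Every sequence that differs from `codon` by exactly one base substitution."""
    return [
        codon[:i] + base + codon[i + 1:]
        for i in range(len(codon))
        for base in NUCLEOTIDES
        if base != codon[i]
    ]

def triplet_frame(seq):
    """Read a sequence as consecutive triplets, dropping any incomplete tail."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

# Amber codon: how many single substitutions yield a sense (non-stop) codon?
amber_mutants = point_mutants("UAG")
sense_escapes = [m for m in amber_mutants if m not in STOP_CODONS]
print(len(amber_mutants), len(sense_escapes))  # 9 8

# Quadruplet codon: a substitution never changes its length, so the
# downstream codons stay out of frame when read three bases at a time.
downstream = "GAAUUCUAA"  # hypothetical in-frame codons after the insertion site
in_frame = triplet_frame(downstream)
restored = [
    m for m in point_mutants("UAGA")
    if triplet_frame(m + downstream)[1:] == in_frame
]
print(len(point_mutants("UAGA")), len(restored))  # 12 0
```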
Guo and his team are engineering the HIV genome for better control over its replication and efficient replication of the virus with the specific quadruplet codons. They hope to get the virus to “replicate as easy as wildtype, so that you can generate enough immune response,” Guo said.
Worms with quadruplet codons glow with potential
Scientists are also using quadruplet codons to investigate basic biological processes in complex multicellular organisms. In a new study, Sebastian Greiss, a neurobiologist at the University of Edinburgh, and his team demonstrated that quadruplet codons could be translated in a multicellular organism, C. elegans, for the first time (7).
“The genetic code is kind of like the most stable thing in the solar system,” Greiss said. “You can actually go in and introduce a quadruplet in an animal, and it still kind of works.”
Greiss had previously used amber stop codons to introduce noncanonical amino acids into C. elegans (8), but he wondered if he could also use quadruplet codons, which would give him the opportunity to introduce more than just one or two noncanonical amino acids into the same protein.
Basing their system on the amber codon pyrrolysine system from methanogenic archaea, Greiss and his team added a nucleotide to the anticodon of the tRNA so that it would recognize the quadruplet codon UAGA instead of UAG.
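The relationship between a codon and the anticodon that reads it is a reverse complement, which a short Python sketch makes concrete. This assumes strict Watson-Crick pairing and ignores wobble pairing and modified bases; the function name `anticodon_for` is an illustrative choice.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_for(codon):
    """Return the Watson-Crick anticodon (written 5'->3') for a codon written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))

print(anticodon_for("UAG"))   # CUA
print(anticodon_for("UAGA"))  # UCUA
```

Under this simplified pairing model, extending the anticodon from CUA to UCUA is the one-nucleotide change that matches the four-base codon.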
The result, though, was less than optimal. So, the researchers looked through the literature and optimized the anticodon loops with nucleotides that should be more favorable for quadruplet codon translation. It helped a little, but not much.
Then, using a tRNA scaffold optimized for eukaryotic cells paired with the improved anticodon loop, “all of a sudden, it just worked amazingly well,” Greiss said.
Their colorless worms suddenly glowed bright red due to the fluorescent reporter, mCherry, which was only expressed upon successful translation of the quadruplet codon UAGA. “They didn't look like the quadruplet worms that we had worked with before. It was really at a level that we only saw with the old triplet system,” he added.
The team expressed a light-activated Cre recombinase in specific cell types, namely the C. elegans glutamatergic neurons, via the quadruplet codon system. They also used their quadruplet codon system to add a light-activated cysteine residue into the sequence of a caspase protein. This allowed the researchers to activate the cysteine with light, switching on caspase activity in a specific cell. With this system, cell biologists can selectively kill whichever cell expresses the caspase.
“It shows that quadruplet codons are definitely feasible to actually use for genetic code expansions,” Greiss said. “We're not just playing around sort of doing party tricks for our own benefit, but it's actually developing into tools that people really are using to do research.”
But can we really make a quadruplet genetic code?
Even with the exciting potential quadruplet codons hold for therapeutics and new research tools, quadruplet codon translation is still a pretty inefficient process. If scientists are serious about creating proteins and even whole organisms that use a quadruplet code, they’re going to need to make some improvements. In a recent Nature Communications paper, DeBenedictis and Badran did exactly that (9).
“The question was more about why does this work so poorly, and how can we improve it?” said Badran, the senior author on the paper.
Rather than trying to engineer a quadruplet codon to encode an unnatural amino acid, Badran’s team stuck with the canonical 20 amino acids. Because the aminoacyl tRNA synthetases for canonical amino acids are much more efficient at putting the appropriate amino acid on the right tRNA molecule, the researchers figured that using canonical amino acids would give them the highest likelihood of increasing translation efficiency.
To find tRNAs that would more efficiently decode quadruplet codons, the researchers took an unbiased, directed evolution approach. “Historically, tRNA engineering efforts focused on dedicated segments within the tRNA that are potentially known to have a certain function,” Badran said. But, DeBenedictis added, “no one had ever tried just mutagenizing the entire tRNA to see what helps.”
Using a technique called phage-assisted continuous evolution (PACE), the researchers evolved tRNAs that increased quadruplet codon translation efficiency by 80-fold. The team’s new tRNAs led to similar levels of translation as the commonly used triplet-decoding, tRNA-aminoacyl tRNA synthetase pair from the methanogenic archaeon, Methanocaldococcus jannaschii (10).
“The mutations that arose tended to be in particular spots, not at the anticodon, but right next to it at the sides of the anticodon loop, which was really interesting. People who had done this before by hand had included that area, but also included other stuff,” DeBenedictis explained. “It was really cool to have this unbiased approach for being like, actually, we only need to worry about this little area of the tRNA in general.”
The team then found that E. coli could translate four of their quadruplet codons in addition to translating regular triplet codons in a single protein. This experiment was the first time four different quadruplet codons had been translated together in a single cell.
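Reading triplet and quadruplet codons in the same message can be pictured with a small decoder. The Python sketch below rests on several assumptions: it applies a greedy rule (use a quadruplet tRNA whenever the next four bases match one), whereas real ribosomes face competition between triplet and quadruplet decoding. The quadruplet assignments and the mRNA are hypothetical and are not the codons used in the paper.

```python
from itertools import product

# Standard triplet code in U, C, A, G order; "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TRIPLET_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical quadruplet assignments, chosen only for illustration.
QUADRUPLET_TABLE = {"UAGA": "S", "AGGA": "R"}

def decode(mrna, quadruplets):
    """Decode from position 0, preferring a quadruplet tRNA when one matches."""
    protein = []
    i = 0
    while i + 3 <= len(mrna):
        four = mrna[i:i + 4]
        if four in quadruplets:
            protein.append(quadruplets[four])
            i += 4  # the ribosome advances four bases
            continue
        amino_acid = TRIPLET_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # unsuppressed stop codon
        protein.append(amino_acid)
        i += 3
    return "".join(protein)

# Hypothetical message mixing triplet and quadruplet codons.
mrna = "AUG" + "UAGA" + "GCU" + "AGGA" + "GAA" + "UAA"

print(decode(mrna, QUADRUPLET_TABLE))  # MSARE
print(decode(mrna, {}))                # M (without quadruplet tRNAs, UAG stops translation)
```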
“Now we have the ability to explore other quadruplet decoding tRNAs and how we can improve them,” Badran said. “A lot of different elements and applications start to open themselves up where we can improve on potential therapeutics that might already exist by incorporating noncanonical amino acids that we have chosen for a specific function.”
While these results are exciting, DeBenedictis wondered how hard it would be to actually make a fully functioning quadruplet genetic code in an organism like E. coli. In her recent preprint, she reported that engineering tRNAs to recognize quadruplet codons was surprisingly easy, and nine out of the cell’s 20 aminoacyl tRNA synthetases could still recognize and charge the quadruplet tRNAs with the appropriate amino acids (11).
To get the remaining aminoacyl tRNA synthetases to recognize and charge the rest of the quadruplet tRNAs with amino acids, “you would have to do some reasonably involved protein engineering,” she said, but “it's on a finite number of proteins. It's an engineering task that is ten times bigger than a normal protein engineering task, but only ten times.”
Both DeBenedictis’s preprint and her Nature Communications paper lay out the basic science for implementing a quadruplet-based genetic code.
“It's the sort of thing that we could implement in ten years. All of these technologies are in progress and eminently doable if this is a thing we want,” she said. “It's taking something that people have been dreaming about for a long time and actually taking it seriously. I mean, we could do this.”
Quadruplet codon-based therapeutics
A quadruplet-based genetic code may not be ready for therapeutic applications quite yet, but the future looks promising. “One of the tricky things about genetic code expansion is it's one of these sky's the limit areas,” said DeBenedictis. “I think synthetic biology is correctly criticized sometimes as being a hammer looking for a nail.”
DeBenedictis expects quadruplet-decoding tRNAs to be useful as therapeutics for treating diseases caused by premature stop codons. A quadruplet tRNA could suppress that stop codon, and the extra base in its anticodon could make it even more site-specific than a triplet-decoding tRNA would be.
Guo and other researchers studying genetic code expansion systems are interested in using a quadruplet-based genetic code to create completely unnatural polymer-like proteins or polypeptides with functions that don’t exist in nature.
“Can we generate an enzyme with very high catalytic efficiency, or maybe resistant to degradation?” he asked. Maybe, he added, “you can catalyze a reaction that cannot be catalyzed by any naturally occurring enzyme.”
The genetic code expansion field, and the use of a quadruplet codon-based genetic code in particular, will likely see many exciting advances in the next decade or so. What those are, though, remains to be determined.
“Genetic code expansion today, we do it for the same reason we do space travel, which is not because it's immediately useful, but rather, because maybe it will be useful in 50 years,” DeBenedictis said. “It's inspirational.”
References
1. Noren, C.J. et al. A General Method for Site-Specific Incorporation of Unnatural Amino Acids into Proteins. Science 244, 182-188 (1989).
2. Hao, B. et al. A New UAG-Encoded Residue in the Structure of a Methanogen Methyltransferase. Science 296, 1462-1466 (2002).
3. Bossi, L. & Roth, J.R. Four-base codons ACCA, ACCU and ACCC are recognized by frameshift suppressor sufJ. Cell 25, 489-496 (1981).
4. Baba, T. et al. Live attenuated, multiply deleted simian immunodeficiency virus causes AIDS in infant and adult macaques. Nat Med 5, 194-203 (1999).
5. Yuan, Z. et al. Controlling Multicycle Replication of Live-Attenuated HIV-1 Using an Unnatural Genetic Switch. ACS Synth Biol 6, 721-731 (2017).
6. Chen, Y. et al. Controlling the Replication of a Genomically Recoded HIV-1 with a Functional Quadruplet Codon in Mammalian Cells. ACS Synth Biol 7, 1612-1617 (2018).
7. Xi, Z. et al. Using a Quadruplet Codon to Expand the Genetic Code of an Animal. Nucleic Acids Res gkab1168 (2021).
8. Greiss, S. & Chin, J.W. Expanding the Genetic Code of an Animal. J Am Chem Soc 133, 14196-14199 (2011).
9. DeBenedictis, E.A. et al. Multiplex suppression of four quadruplet codons via tRNA directed evolution. Nat Commun 12, 5706 (2021).
10. Wang, L. et al. A New Functional Suppressor tRNA/Aminoacyl-tRNA Synthetase Pair for the in Vivo Incorporation of Unnatural Amino Acids into Proteins. J Am Chem Soc 122, 5010-5011 (2000).
11. DeBenedictis, E. et al. Measuring the tolerance of the genetic code to altered codon size. Preprint at: https://www.biorxiv.org/content/10.1101/2021.04.26.441066v1.full