Clark A. Jones (a, b),
Chelsea A. Makovsky (a, b),
Aidan K. Haney (a),
Alba C. Dutra (a, b),
Clinton A. L. McFeely (a, b),
T. Ashton Cropp (a) and
Matthew C. T. Hartman* (a, b)
(a) Department of Chemistry, Virginia Commonwealth University, Box 842006, 1001 W. Main St, Richmond, 23284-2006, VA, USA. E-mail: mchartman@vcu.edu
(b) Massey Comprehensive Cancer Center, Virginia Commonwealth University, Box 980037, 401 College St, Richmond, 23298-0037, VA, USA
First published on 22nd April 2025
Expanding the genetic code affords exciting opportunities for synthetic biology, studies of protein function, and creation of diverse peptide libraries by mRNA display. Maximal expansion with the standard 64-codon code requires breaking the degeneracy of the 61 sense codons, which encode only 20 amino acids. In E. coli these 61 codons are decoded by 46 different tRNAs. Moreover, many codons are decoded by multiple tRNAs, further complicating efforts to break this redundancy. The overlapping decoding patterns of the 11 tRNAs in E. coli that read the 16 codons encoding serine, proline, threonine, and alanine exemplify this difficulty. Here we tackle this challenge by first outlining a general process to evaluate codons for their potential for reassignment. We then use this knowledge to assign these 16 codons to 10 different amino acids, more than doubling their encoding potential. Our work highlights the expanded potential of sense codon reassignment and points the way to a dramatically expanded code containing more than 30 monomers.
The redundancy of the genetic code is controlled by the aminoacyl-tRNA synthetases (AARS) and their specificity for tRNAs as well as the ribosome's ability to select the tRNAs that correspond to their cognate codons. To break this degeneracy first requires loading the tRNAs that read a given codon box with additional ncAAs. Mature technologies for charging ncAAs onto unique tRNAs have been developed in vitro and in vivo,14–23 leading to modest genetic code expansion.2,12,24,25 However, a remaining key roadblock for sense codon reassignment is the overlapping codon readings of tRNA isoacceptors.26–31 This overlap limits discrete reassignment because tRNAs bearing different AAs will compete for the same codon. Attempts to break the degeneracy of the 4-fold degenerate codon boxes epitomize the challenges of overlapping reading. To date, no one has been able to break the degeneracy of these codon boxes more than 2-fold.2
Most of the work in this field has been carried out under the assumption that codons read by more than one tRNA cannot be discretely reassigned. However, overlapping reading does not mean that each tRNA reads the codon with equal efficiency. To understand the competitive nature of this codon reading, we recently developed an isotopic competition assay in which each tRNA isoacceptor decoding codons in a given codon box is charged with an isotopically labeled canonical amino acid of distinct mass. The decoding percentage of each tRNA at each codon is assessed by measuring the isotopic distribution of the resulting peptides via mass spectrometry.32,33
Guided by this assay, we showed that by using in vitro transcribed tRNAs and hyperaccurate ribosomes we were able to split the six leucine codons to encode for five unique amino acids.32–34 It is not, however, clear that such a dramatic genetic code expansion will extend beyond the leucine codons. Here we focus on investigating the fourfold degenerate NCN codon boxes that encode for serine (UCN), proline (CCN), threonine (ACN), and alanine (GCN). Previous attempts at sense codon reassignment have led to the splitting of both the UCN and ACN codon boxes to encode two amino acids each.12 To our knowledge, no one has been able to break the degeneracy of the proline (CCN) or alanine (GCN) codons. By comparison, here we show that these 16 codons can encode for 10 unique amino acids, demonstrating that the genetic code as a whole is much more pliable than previously thought. Our work paves the way for the future creation of peptide libraries and organisms with a dramatically expanded genetic code.
We aminoacylated each of the three serine tRNAs purported to read the UCN codons (Ser2CGA, Ser1UGA, and Ser5GGA) with different isotopically labeled serine analogs of unique mass (serine, serine-d3, and serine-d3-13C3–15N1, respectively). We mixed these charged tRNAs together, and used this mixture in separate in vitro translation assays with four different mRNAs, each containing one of the UCN codons (Fig. 1A and S1†). These translations lacked SerRS, but contained the other enzymes necessary to create the final peptides. The percentages of incorporation of each isotope were determined by comparison of the intensity of the peptides of varying isotopic mass (Fig. 1B). Since the AAs are isotopologues, they should ionize identically in MS and also be recognized equally by the translation machinery. The UCA, UCC, and UCG codons all showed >80% selectivity for a single tRNA. The UCU mRNA was read by two different tRNAs, Ser5GGA and Ser1UGA. These results were converted to the heat map shown in Fig. 1C.
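The conversion from peak intensities to incorporation percentages can be illustrated with a minimal Python sketch. It assumes that each isotopologue peak has already been assigned to its tRNA and that the percentage is simply each peak's share of the summed intensity; the function name and the example intensities are hypothetical and are not data from this work.

```python
def decoding_percentages(intensities):
    """Convert per-tRNA peak intensities into percentages of the summed signal.

    `intensities` maps a tRNA name to the intensity of the peptide peak that
    carries that tRNA's isotopologue (arbitrary units).
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("summed peak intensity must be positive")
    return {trna: 100.0 * value / total for trna, value in intensities.items()}


# Hypothetical intensities for a single codon (illustrative only).
example = {"Ser2CGA": 5.0, "Ser1UGA": 85.0, "Ser5GGA": 10.0}
percentages = decoding_percentages(example)

# The tRNA with the largest share is the predominant reader of the codon.
dominant = max(percentages, key=percentages.get)
```

Because the amino acids are isotopologues, no ionization correction is applied in this sketch; a real analysis might also need to correct for overlapping natural-abundance isotope envelopes, which is not handled here.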
Fig. 1 Breaking the degeneracy of the UCN serine codon box. (A) mRNA templates used for the in vitro translation testing the codon readthrough of each tRNA. Targeted codons highlighted in blue. mRNA 2 included a different histidine codon (CAC) which improved the yield compared to the CAU containing mRNA as predicted by Kim and Jung.38 (B) Representative mass spectra showing different isotopic incorporations on serine codons. Translations were incubated at 37 °C for 30 min with 5 μM serine-Ser2CGA(L-serine), serine-d3-Ser1UGA and serine-d3-13C3–15N1-Ser5GGA. All MS data used for these experiments are shown in Fig. S1.† Calculated masses for the peptides are: Ser (Ser2CGA, 1728.76); d3-Ser (Ser1UGA, 1731.78), and d3-13C3–15N1-Ser (Ser5GGA, 1735.80). (C) Heat map demonstrating the observed incorporation percentages for each tRNA with their respective mRNA codons. The isotopically labeled serines are shown above their respective tRNAs. * indicates isotopically labeled 13C and 15N. Error bars represent the mean of 3 experiments. (D) Representative MALDI spectra showing incorporation of ncAAs/AAs on their respective mRNA codons. Each tRNA was added at 15 μM concentration and the translations were carried out for 30 min. Additional replicates are shown in Fig. S5.†
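The mass differences among the three peptides listed in the Fig. 1 caption can be checked with a short Python sketch. It assumes that each label replaces a light isotope one for one (three 1H by 2H; three 12C by 13C; one 14N by 15N) and uses standard monoisotopic masses; the helper name `label_shift` is hypothetical.

```python
# Monoisotopic masses in Da (standard reference values, rounded to 6 decimals).
MASS = {
    "1H": 1.007825, "2H": 2.014102,
    "12C": 12.000000, "13C": 13.003355,
    "14N": 14.003074, "15N": 15.000109,
}


def label_shift(n_d=0, n_13c=0, n_15n=0):
    """Mass added when light isotopes are swapped one-for-one for heavy ones."""
    return (
        n_d * (MASS["2H"] - MASS["1H"])
        + n_13c * (MASS["13C"] - MASS["12C"])
        + n_15n * (MASS["15N"] - MASS["14N"])
    )


UNLABELED_PEPTIDE = 1728.76  # Ser peptide mass quoted in the Fig. 1 caption

d3_peptide = UNLABELED_PEPTIDE + label_shift(n_d=3)
heavy_peptide = UNLABELED_PEPTIDE + label_shift(n_d=3, n_13c=3, n_15n=1)
```

Under these assumptions the shifts come to roughly +3.02 Da and +7.03 Da, which agrees with the caption values (1731.78 and 1735.80) to within the rounding of the quoted unlabeled mass.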
Using this heatmap as a guideline, we charged each tRNA with a unique amino acid using enzymatic charging or flexizyme (Fig. 1D, S2 and S3†). Ser1UGA was charged with O-methyl serine (Ser(Me)),11 Ser2CGA with serine, and Ser5GGA with allylglycine (AllylG), and these tRNAs were used in in vitro translations with each of the mRNAs. When each was used at 15 μM concentration, the UCA, UCC, and UCG codons were predominantly read by the expected tRNA, in line with our isotopic competition experiment. Also as expected, the UCU mRNA gave a mixture of peaks because both Ser1UGA and Ser5GGA decoded it, with Ser5GGA giving the major peak (Fig. S4†). Increasing the concentration of Ser5GGA from 15 μM to 30 μM significantly decreased the incorporation of the Ser1UGA tRNA on the UCU codon and enabled predominant reading of all four serine codons by a single tRNA species (Fig. 1D and S5†).
Fig. 2 Breaking the degeneracy of the CCN proline codon box. (A) mRNA templates used for the in vitro translation testing the codon readthrough of each tRNA. Targeted codons highlighted in red. mRNA 4 included a different histidine codon (CAC) and phenylalanine codon (UUC) which improved the yield compared to the CAU/UUU containing mRNA as predicted by Kim and Jung.38 (B) Heat map demonstrating the observed incorporation percentages for each tRNA with their respective mRNA codons. Error bars represent the mean of 3 experiments. All MS data used for this experiment are shown in Fig. S6.† (C) Representative MALDI spectra showing incorporation of ncAAs/AAs on their respective mRNA codons. Each tRNA was added at 15 μM concentration and the translations were carried out for 30 min. Additional replicates are shown in Fig. S7.† *An unknown +3 Da peak resulting from a contaminant in the O-Me glutamic acid dinitrobenzyl ester amino acid flexizyme substrate.
The CCA, CCC, and CCG codons all showed >78% selectivity for a single tRNA (Fig. 2B and S6†). The CCU mRNA was read by both Pro2GGG and Pro3UGG. The results were converted into a heatmap (Fig. 2B). We proceeded with sense codon reassignment using the same protocol as for the serine codons. For this experiment we aminoacylated Pro1CGG with 1-aminocyclopropane-1-carboxylic acid (Acp),39 Pro2GGG with pipecolic acid (Pip),40 and Pro3UGG with glutamic acid γ-methyl ester (Glu(Me))25 (Fig. S3†).
Each of the 4 codons was predominantly read by a single tRNA (Fig. 2C and S7†), in line with our isotopic competition experiment. In contrast to our isotopic competition experiment, however, no co-reading of the CCU codon by Pro3UGG was observed, with the CCU codon being primarily read by Pro2GGG alone. To test the impact of the ncAA structure on codon reading, we repeated this experiment with a different set of ncAAs (Fig. S6†). These experiments also showed excellent orthogonality, indicating that the codon reading pattern is not dependent on the ncAA attached.
Fig. 3 Breaking the degeneracy of the ACN threonine codon box. (A) mRNA templates used for the in vitro translation testing the codon readthrough of each tRNA. Targeted codons highlighted in green. (B) Heat map demonstrating the observed incorporation percentages for each tRNA with their respective mRNA codons. * indicates isotopically labeled 13C and 15N. Error bars represent the mean of 3 experiments. All MS data used for this experiment are shown in Fig. S9.† (C) Representative MALDI spectra showing incorporation of ncAAs/AAs on their respective mRNA codons. Each tRNA was added at 15 μM concentration and the translations were carried out for 30 min. Additional replicates are shown in Fig. S10.† *Observed misincorporation of methionine (expected: 1773.68, observed: 1774.08).
The competitive codon-reading data (Fig. S9†) were converted into a heatmap (Fig. 3B). The ACA and ACG codons both showed >90% selectivity for a single tRNA. The ACC codon was read predominantly by Thr1GGU (82%) but was also read at lower percentages by Thr2CGU and Thr4UGU. The ACU codon was decoded by Thr1GGU (71%), as well as Thr2CGU (10%) and Thr4UGU (19%) (Fig. S9†).
We proceeded with sense codon reassignment. For this experiment we aminoacylated Thr1GGU with L-2-aminobutyric acid (Abu), Thr2CGU with cyclopentyl glycine (CPG) and Thr4UGU with propargyl glycine (PropG) (Fig. S3†). Each of the four threonine codons was predominantly read by a single tRNA, showing that the selectivity of codons for their respective tRNAs is retained with non-canonical amino acids (Fig. 3C and S10†).
Fig. 4 Attempted breaking of the degeneracy of the alanine codon box. Heat map demonstrating the observed incorporation percentages for both alanine tRNAs with their respective mRNA codons. * indicates isotopically labeled 13C and 15N. Error bars represent the mean of 3 experiments. All MS data used for this experiment are shown in Fig. S11.†
Fig. 5 Assessing translational fidelity with multiple incorporations of ncAAs in a single mRNA. (A) Serine codons: UCA, UCC, and UCG. (B) Proline codons: CCC, CCG and CCA. (C) Threonine codons: ACA, ACC, and ACG. Representative MALDI spectra are shown for each experiment. Additional replicates are shown in Fig. S12.† tRNA-ncAA pairings are identical to those in Fig. 1–3. The Ser and Thr AA-tRNAs were added at 15 μM concentration, and the Pro AA-tRNAs were added at 30 μM. The translations were carried out for 60 min at 37 °C.
In this work, through the use of flexizyme-charged in vitro transcribed tRNAs in conjunction with hyperaccurate ribosomes, the UCN, CCN, and ACN (serine, proline, and threonine) codon boxes were reassigned to encode for three different ncAAs/AAs each, thus tripling the encoding potential for all three codon boxes (Fig. 6A). The tRNA readings for these three boxes share a common pattern (Fig. 6B). For the third codon position, Watson–Crick base pairs are preferred for G:C (codon:tRNA), A:U, and C:G. The codon ending in U is also paired with a G, forming a wobble pair.45,46 This clean orthogonality contrasts with expectations for these codon boxes, which show a web of overlapping tRNA reading,2,26,27 in particular from the tRNAs that have a U in the anticodon (Fig. 6B). The improved orthogonality likely results from our use of hyperaccurate ribosomes.34
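The shared third-position pattern can be written as a small lookup, sketched here in Python. The tRNA names and anticodons (written 5'→3') are those used in the text; the function `predicted_readers` and the simplification that the first two codon positions pair strictly by Watson–Crick rules are illustrative assumptions, not a model of ribosomal decoding.

```python
# Strict Watson-Crick complements, used for codon positions 1 and 2.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Preferred pairing at codon position 3 as described in the text:
# codon base -> anticodon base at the wobble position (5' base of the anticodon).
THIRD_POSITION = {"G": "C", "A": "U", "C": "G", "U": "G"}

# tRNA names with anticodons (5'->3') as given in the text.
TRNAS = {
    "Ser2CGA": "CGA", "Ser1UGA": "UGA", "Ser5GGA": "GGA",
    "Pro1CGG": "CGG", "Pro2GGG": "GGG", "Pro3UGG": "UGG",
    "Thr1GGU": "GGU", "Thr2CGU": "CGU", "Thr4UGU": "UGU",
}


def predicted_readers(codon):
    """Return the tRNAs whose anticodon matches the codon under the rules above."""
    readers = []
    for name, anticodon in TRNAS.items():
        # Anticodon and codon pair antiparallel: anticodon[0] faces codon[2].
        if (
            anticodon[0] == THIRD_POSITION[codon[2]]
            and anticodon[1] == WATSON_CRICK[codon[1]]
            and anticodon[2] == WATSON_CRICK[codon[0]]
        ):
            readers.append(name)
    return readers
```

Under these rules each UCN, CCN, and ACN codon maps to exactly one tRNA, with the codon ending in U assigned to the G-starting anticodon, matching the predominant readers reported above. The lookup deliberately ignores the weaker cross-reading seen in the competition assays.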
Fig. 6 Re-encoding the NCN codons. (A) Codon chart demonstrating the reassignment of the UCN, CCN, and ACN codon boxes. On the left is the natural codon chart and on the right is our maximized codon chart. (B) Change in tRNA isoacceptor reading. Top is the expected codon reading pattern26 and bottom is our maximized codon readthrough pattern.
In the case of the alanine and proline codons, the presence of two G:C pairs at the first and second codon positions makes these “strong” codons as defined by Grosjean and Westhof.47 It is interesting that it is easier to divide the proline (CCN) codon box (Fig. 2B) than the alanine (GCN) codon box (Fig. 4). The alanine GCN codon box is unique among the NCN codon boxes in E. coli because it is decoded by two, rather than three, tRNAs, although other bacteria do have three tRNAs that read these codons.48 E. coli also has only two tRNAs that read the valine GUN codon box, and it was also difficult to reassign this codon box to encode for three amino acids.33 Conversely, E. coli uses five tRNAs to read the six leucine codons, and it was possible to extensively break the degeneracy of these codons.33 Taken together, these data suggest that the number of tRNAs used by an organism to decode a specific codon box may be a useful feature to predict the ability to break the degeneracy of those codons.
Presumably, the rules for breaking of the NCN sense codons should also be applicable in living E. coli that are engineered to have hyperaccurate ribosomes. To do so will first require recoding the genome of E. coli to eliminate the extra codons and the native tRNAs that read them. Recent progress in genome synthesis49 should enable such a strategy. Indeed, the first examples of sense codon liberation have been described.50,51 These codons can then be re-encoded using an orthogonal synthetase/tRNA pair.51 There are a growing number of such pairs under development.52,53
Although there are many postulates for how the genetic code evolved into its current form, there are strong reasons to believe that alanine and proline were among the original entrants into the code (along with glycine and a positively charged amino acid).54,55 Our isotopic competition assay and in vitro translation system could be useful to researchers testing the potential basis for such a minimal code.
In addition to establishing codon orthogonality, this translation system successfully encoded for multiple ncAA monomers with unique functional groups not present in the canonical amino acids (Fig. S2†). Acp44 is an α,α-disubstituted amino acid that exerts unique conformational constraints when incorporated into a peptide.56 PropG, AllylG, and Ser(Me) introduce alkynes, alkenes, and ethers into the genetic code. PropG's alkyne in particular can permit the introduction of click-chemistry reactions into peptides. Similarly, Glu(Me) provides a side chain ester that could be amenable to a variety of chemical transformations.57 Additionally, to our knowledge, neither AllylG nor CPG has been used in translation previously.
Footnote
† Electronic supplementary information (ESI) available: Supplementary figures (S1–S12) and supplementary tables (S1 and S2) as well as synthetic details and characterization. See DOI: https://doi.org/10.1039/d4sc06740a
This journal is © The Royal Society of Chemistry 2025