Genetically Encoded Fluorophenylalanines Enable Insights into the Recognition of Lysine Trimethylation by an Epigenetic Reader

Abstract
Fluorophenylalanines bearing 2–5 fluorine atoms on the phenyl ring have been genetically encoded using the amber codon. Recombinant replacement of F59 in the Mpp8 chromodomain, a phenylalanine residue directly involved in interactions with trimethylated K9 of histone H3, with fluorophenylalanines significantly impairs binding to a K9-trimethylated H3 peptide.

Because hydrogen and fluorine atoms are similar in size, most fluorinated amino acids closely resemble their canonical counterparts. When provided in the growth medium, they are usually mistaken for canonical amino acids by the cellular translation system and incorporated into proteins at the corresponding amino acid sites, which leads to mild to severe cellular toxicity. 1 Biochemists have long exploited this promiscuity of the cellular translation system to generate fluorinated proteins. 2 Because 19F gives a strong NMR signal whose chemical shift is sensitive to the surrounding environment, fluorine in proteins provides a unique probe for studying protein structure and dynamics. 3 Fluorine is also more hydrophobic than hydrogen, which endows fluorinated proteins with unique features such as high resistance to denaturants. 4 Although convenient in making

at user-defined sites are of high interest. A successful strategy on this front is amber suppression, which in the context of fluorinated amino acids was pioneered by Furter and later expanded by others. 5 Two rationally designed, polyspecific mutants of Methanosarcina mazei pyrrolysyl-tRNA synthetase (PylRS) were previously reported that enable efficient aminoacylation of tRNAPyl with a large variety of phenylalanine-derived ncAAs for their incorporation into proteins by amber suppression. Of these, mutant N346A/C348A/Y306A/Y384F (PylRS-AAAF) accepted phenylalanine derivatives with large substituents at the para position as substrates, 6 whereas N346A/C348A (PylRS-AA) accepted small phenylalanine derivatives, 7 including the NMR probe m-trifluoromethyl-phenylalanine.
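The mutant names above follow the usual point-mutation notation: wild-type residue, position in M. mazei PylRS, replacement residue, with multiple mutations joined by slashes. The sketch below parses such names and applies them to a minimal residue map. The map holds only the four positions named in the text, and `parse_mutations` and `apply_mutations` are hypothetical helpers for illustration, not part of any published tool.

```python
import re

# Point-mutation notation: wild-type residue, position, new residue (e.g. "N346A").
MUTATION_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def parse_mutations(name: str) -> list[tuple[str, int, str]]:
    """Split a slash-separated mutant name such as 'N346A/C348A' into (wt, pos, new) tuples."""
    parsed = []
    for token in name.split("/"):
        match = MUTATION_RE.match(token)
        if match is None:
            raise ValueError(f"not a point mutation: {token!r}")
        wt, pos, new = match.groups()
        parsed.append((wt, int(pos), new))
    return parsed


def apply_mutations(residues: dict[int, str], name: str) -> dict[int, str]:
    """Return a copy of a {position: residue} map with the named mutations applied.

    Raises ValueError if a wild-type residue in the name does not match the map,
    which catches numbering slips.
    """
    mutated = dict(residues)
    for wt, pos, new in parse_mutations(name):
        if mutated.get(pos) != wt:
            raise ValueError(f"position {pos} is {mutated.get(pos)!r}, expected {wt!r}")
        mutated[pos] = new
    return mutated


# Only the wild-type positions named in the text (hypothetical partial map).
PYLRS_WT_SITES = {306: "Y", 346: "N", 348: "C", 384: "Y"}
```

For example, applying "N346A/C348A" to `PYLRS_WT_SITES` yields PylRS-AA, and applying "A348S" on top of PylRS-AA yields the S348 variant (PylRS-AS) introduced later in the text.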
To gain insight into the origin of the polyspecificity of PylRS-AA and into its lack of phenylalanine recognition, we determined the crystal structure of its C-terminal catalytic fragment (amino acids 188-454) 8 in complex with the ATP analog adenosine 5′-(β,γ-imido)triphosphate (AMPPNP) at 1.5 Å resolution (the structure of PylRS-AAAF was also solved; for data collection and refinement statistics, see the SI). 9 In wild-type PylRS (PylRS_wt), N346 of the amino acid binding pocket serves as a gatekeeper residue that is engaged in a variety of direct and water-mediated hydrogen bonds (Figure 1A). These include donation of a hydrogen bond to the Nε-carbonyl group of pyrrolysine adenylate, which is believed to be critical for pyrrolysine recognition. 10 C348 forms part of the pocket bottom (Figure 1A). In the structure of PylRS-AA, mutation of both residues to alanine removes their hydrogen-bonding capability and enlarges the pocket, in particular at position 346 (Figure 1B). A related observation has been made for a different mutant that accepts O-methyl-tyrosine as a substrate. 11 PylRS belongs to aminoacyl-tRNA synthetase subclass IIc, which also includes PheRS, from which PylRS directly evolved; both show similarity in their overall fold and in the organization of the core domain (Figure 1C). 9
Our structure of PylRS-AA reveals similarities with PheRS in front-pocket dimensions and polarity, despite marked differences in the type and orientation of the residues involved (Figure 1D, E). However, the overall pocket of PylRS-AA is larger (in particular in its rear part), and in superimposed structures both the Phe ligand and the pocket surface of PheRS can easily be accommodated within the pocket of PylRS-AA (Figure 1E). Specifically, in PheRS, residues including E210, F248, F250 and A294 form a very compact binding pocket for phenylalanine (Figure 1D). Consequently, although PheRS tolerates o-, m- and p-fluorophenylalanines as substrates, larger derivatives are usually excluded. 12 The residues in PylRS-AA that correspond to E210 and A294 of PheRS are A346 and G419, 13 which bear no side chain or a shorter one. L305 in PylRS-AA is homologous to E174 in PheRS, but its side chain points away from the pocket (not shown). Finally, F248 of PheRS has no homologous residue in PylRS-AA, although Y384 of PylRS-AA partially occupies its space (Figure 1D, E). Taken together, the arrangement of active-site residues in PylRS-AA produces an enlarged amino acid binding pocket, and the differential availability of hydrophobic contacts for larger, substituted Phe derivatives versus unsubstituted Phe may account for the observed selectivity of PylRS-AA (SI Figure 5).
This structural comparison prompted us to test whether PylRS-AA recognizes pentafluorophenylalanine (F5F, Figure 2A). As expected, it does. E. coli BL21 cells coding for PylRS-AA, tRNAPyl and sfGFP2TAG (superfolder green fluorescent protein (sfGFP) with an amber mutation at its S2 position) expressed full-length sfGFP when F5F was provided in GMML medium, albeit at a low level (Supplementary Figure 1). To identify a PylRS mutant that, together with tRNAPyl, shows an enhanced amber suppression rate in E. coli and thus incorporates F5F into proteins more efficiently, we constructed a small PylRS-AA-based mutant library by randomizing A348. A348 lies spatially close to the position of E174 in PheRS, a residue that tightly locks phenylalanine in the PheRS active site, so we reasoned that randomizing A348 could yield a mutant that binds F5F more tightly. Screening all mutants led to the identification of a mutant carrying S348 (termed PylRS-AS) that, together with tRNAPyl, provided a higher amber suppression efficiency in E. coli in the presence of F5F (Supplementary Figures 2 and 3).
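The text does not say which degenerate codon was used to randomize A348. A common choice for single-site saturation libraries is NNK (N = any base, K = G or T); the sketch below assumes NNK purely for illustration and checks what such a library encodes and roughly how many clones must be screened so that any particular codon is likely to be sampled.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC degenerate bases used in this sketch (assumed NNK design).
DEGENERATE = {"N": "ACGT", "K": "GT"}


def expand(degenerate_codon: str) -> list[str]:
    """All concrete codons represented by a degenerate codon such as 'NNK'."""
    pools = [DEGENERATE.get(base, base) for base in degenerate_codon]
    return ["".join(c) for c in product(*pools)]


def library_summary(degenerate_codon: str = "NNK") -> tuple[int, set[str], list[str]]:
    """Return (number of codons, set of encoded residues, list of stop codons)."""
    codons = expand(degenerate_codon)
    encoded = {CODON_TABLE[c] for c in codons}
    stops = [c for c in codons if CODON_TABLE[c] == "*"]
    return len(codons), encoded, stops


def clones_for_coverage(n_codons: int, confidence: float = 0.95) -> int:
    """Clones needed so that a given codon is sampled at least once with this probability."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - 1 / n_codons))
```

Under this assumption, `library_summary()` reports 32 codons covering all 20 amino acids plus a single stop codon, TAG, which is the amber codon itself, and `clones_for_coverage(32)` returns 95.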
To test the fidelity of PylRS-AS for the genetic incorporation of F5F in response to the amber codon, E. coli BL21 cells coding for PylRS-AS, tRNAPyl and sfGFP2TAG were grown in GMML medium with or without F5F. Cells grown in the presence of F5F produced full-length sfGFP (sfGFP-F5F) at 10 mg/L, in marked contrast to the negligible expression of full-length sfGFP in the absence of F5F (Figure 2B). This demonstrates that PylRS-AS accepts F5F as a substrate but discriminates against canonical amino acids, including Phe. Electrospray ionization mass spectrometry (ESI-MS) analysis of purified sfGFP-F5F gave a molecular weight of 27817 Da, in good agreement with the theoretical mass of 27819 Da. The single dominant ESI-MS peak also indicated that F5F was not recognized by E. coli PheRS; otherwise, F5F would have replaced some of the 12 Phe residues in sfGFP during translation. Therefore, the genetic encoding of F5F by the amber codon is orthogonal to the endogenous translation system.
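The mass arithmetic behind these checks is simple: replacing one ring hydrogen by fluorine adds the difference of their atomic weights, about 18 Da. The sketch below, which uses rounded standard atomic weights and hypothetical helper names, predicts the average mass of variants that differ from sfGFP-F5F only in the number of ring fluorines and the mass a single Phe-to-fluorophenylalanine misincorporation would add.

```python
# Standard atomic weights (Da), rounded to three decimals.
MASS_H = 1.008
MASS_F = 18.998
# Mass gained when one aromatic C-H becomes C-F: about 17.99 Da.
DELTA_F = MASS_F - MASS_H


def predicted_mass(reference_mass: float, reference_fluorines: int, fluorines: int) -> float:
    """Average mass of a protein that differs from a reference only in its ring fluorine count."""
    return reference_mass + (fluorines - reference_fluorines) * DELTA_F


def phe_replacement_shift(fluorines: int) -> float:
    """Mass added each time a fluorophenylalanine with this many fluorines replaces a Phe."""
    return fluorines * DELTA_F
```

Starting from the theoretical 27819 Da of sfGFP-F5F, `predicted_mass(27819, 5, 4)` and `predicted_mass(27819, 5, 3)` round to 27801 and 27783 Da, the first theoretical values listed in the Figure 2 caption. Each Phe-to-F5F misincorporation would add about 90 Da (`phe_replacement_shift(5)`), so such events would appear as clearly separated extra peaks.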
We next tested whether PylRS-AS accepts other fluorophenylalanines, including 2,3,4,5-tetrafluorophenylalanine (F4F), 14 3,4,5-trifluorophenylalanine (F3F), 3,5-difluorophenylalanine (F2F) and 3,4-difluorophenylalanine (F2F'). When these fluorophenylalanines were present in the growth medium, E. coli BL21 cells coding for PylRS-AS, tRNAPyl and sfGFP2TAG expressed full-length sfGFP (Figure 2B) at levels similar to those obtained with F5F. The molecular weights of purified full-length sfGFP proteins expressed in the presence of F4F, F3F and F2F (sfGFP-F4F, sfGFP-F3F and sfGFP-F2F, respectively), determined by ESI-MS, agreed well with the theoretical values (Figure 2C and Table 1). All three proteins exhibited a single dominant ESI-MS peak, establishing that ncAA incorporation is orthogonal to the endogenous translation system. In contrast, full-length sfGFP with F2F' incorporated (sfGFP-F2F') displayed multiple peaks in its ESI-MS spectrum. The smallest peak, at 27765 Da, matched the theoretical mass of 27763 Da, but the other peaks corresponded to additions of approximately multiples of 36 Da to the theoretical mass. Because each substitution of two ring hydrogens by fluorine adds about 2 × 18 Da, this clearly indicates that F2F' displaced regular phenylalanine residues in sfGFP. Thus, the genetic encoding of F2F' by the amber codon is not orthogonal to the endogenous translation system, although PylRS-AS does recognize F2F' as a substrate.
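Assigning the extra sfGFP-F2F' peaks amounts to matching each observed mass to a ladder of base + n × 36 Da, with n bounded by the 12 Phe residues in sfGFP. The sketch below uses the theoretical 27763 Da from the text as the base; the 3 Da matching tolerance and the helper names are assumptions for illustration.

```python
# Each Phe -> difluorophenylalanine swap adds two (F - H) increments, about 35.98 Da.
DELTA_F2 = 2 * (18.998 - 1.008)
# Number of Phe residues in sfGFP, as stated in the text.
SFGFP_PHE = 12


def misincorporation_ladder(base_mass: float, delta: float, max_sites: int) -> list[float]:
    """Expected peak masses for 0..max_sites extra substitutions."""
    return [base_mass + n * delta for n in range(max_sites + 1)]


def assign_peak(peak_mass: float, base_mass: float, delta: float,
                max_sites: int, tolerance: float = 3.0):
    """Number of extra substitutions that explains a peak, or None if it fits no rung."""
    n = round((peak_mass - base_mass) / delta)
    if not 0 <= n <= max_sites:
        return None
    if abs(peak_mass - (base_mass + n * delta)) > tolerance:
        return None
    return n
```

With these settings, the observed 27765 Da peak is assigned to n = 0, that is, sfGFP-F2F' without additional misincorporation, while a peak lying between rungs returns None and would need another explanation.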
In addition to serving as NMR probes and improving protein folding, genetically encoded fluorophenylalanines could potentially be used to investigate cation-π interactions, such as those involved in the recognition of lysine methylation in histones by epigenetic readers. As part of the epigenetic regulation of chromatin function, histone lysine methylation induces interactions with effector proteins and thereby regulates DNA replication, repair and transcription. 15 Recognition of methylated lysine typically involves an aromatic cage, which has been found in the chromodomain (Figure 3A), the PHD finger and the Tudor domain, and appears to be mediated by cation-π interactions between the methylammonium moiety and aromatic residues in the cage. 16 The cation-π interaction is predominantly electrostatic, occurring between a cation and the quadrupole moment of an aromatic π system (Figure 3B). 17 Because the quadrupole moment places partial negative charge above each face of the aromatic ring, favorable interactions with a cation occur perpendicular to the aromatic plane at a typical van der Waals distance. Although a number of theoretical and experimental studies have addressed the importance of the cation-π interaction in the recognition of lysine methylation, 18 it remains unclear to what degree this interaction contributes to recognition specificity. A particularly interesting target protein for addressing this question is the Mpp8 chromodomain (Mpp8C). Mpp8 is a heterochromatin component that specifically recognizes and binds trimethylated K9 of histone H3 and promotes the recruitment of proteins that mediate epigenetic repression. 19
In Mpp8C, F59 is part of the aromatic cage that directly binds trimethylated K9 of H3. If the cation-π interaction plays a dominant role, replacing this residue with fluorophenylalanines (in particular with F5F, which has a strongly reduced partial negative charge above each face of the aromatic ring) should significantly reduce the binding of Mpp8C to trimethylated K9 of H3. Otherwise, binding should not be strongly affected, or might even increase, because fluorophenylalanines are more hydrophobic than phenylalanine.
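The electrostatic argument can be made concrete with the energy of a point charge q near an axial point quadrupole Θzz (Buckingham convention), U = q Θzz (3cos²α − 1) / (8πε₀ r³), where α is measured from the ring normal. The sketch below is a crude gas-phase illustration that ignores polarization, dispersion, solvent and the finite size of the ring. It assumes approximate literature quadrupole moments of about −8.5 D·Å for benzene and +9.5 D·Å for hexafluorobenzene as stand-ins for Phe and a perfluorinated ring; pentafluorophenylalanine is not identical to hexafluorobenzene.

```python
import math

E_CHARGE = 1.602176634e-19     # elementary charge, C
COULOMB_K = 8.9875517923e9     # 1 / (4 pi eps0), N m^2 C^-2
DEBYE_ANGSTROM = 3.33564e-40   # 1 D·Å expressed in C m^2
KCAL_PER_MOL_PER_J = 6.02214076e23 / 4184.0

# Approximate gas-phase quadrupole moments (D·Å); assumed stand-in values.
THETA_BENZENE = -8.5
THETA_HEXAFLUOROBENZENE = 9.5


def cation_quadrupole_energy(theta_zz_da: float, r_angstrom: float,
                             angle_deg: float = 0.0, charge: float = 1.0) -> float:
    """Electrostatic energy (kcal/mol) of a point charge near an axial point quadrupole.

    U = k q Theta_zz (3 cos^2(angle) - 1) / (2 r^3); angle 0 means on the ring normal.
    """
    theta = theta_zz_da * DEBYE_ANGSTROM
    r = r_angstrom * 1e-10
    q = charge * E_CHARGE
    cos_a = math.cos(math.radians(angle_deg))
    energy_joule = COULOMB_K * q * theta * (3 * cos_a ** 2 - 1) / (2 * r ** 3)
    return energy_joule * KCAL_PER_MOL_PER_J
```

For a cation 3.5 Å above the ring center, the energy is negative (attractive) with the benzene value and positive (repulsive) with the hexafluorobenzene value, and for either ring it changes sign in the ring plane (α = 90°). This is the qualitative trend the F59 substitutions were designed to probe.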
Using the approach developed here, Mpp8C variants with F59 replaced by F5F, F3F or F2F were expressed. Incorporation of F5F into Mpp8C was independently confirmed by the detection of three 19F NMR signals in the purified protein (SI Figure 6). The interactions of these proteins, and of wild-type Mpp8C, with a fluorescein-conjugated N-terminal histone H3 peptide trimethylated at K9 (FAM-H3(1-15)K9me3) were studied by fluorescence polarization. As shown in Figure 3C and Table 2, wild-type Mpp8C binds FAM-H3(1-15)K9me3 strongly, with a Kd of about 0.8 µM, in agreement with previously reported values. 20 Binding decreased 15-fold when F59 was replaced with F2F and decreased further when F59 was replaced with F3F and F5F (Figures 3D-F and Table 2). Because binding of FAM-H3(1-15)K9me3 to the F59F3F and F59F5F mutants of Mpp8C was weak, insufficient data could be collected to determine accurate Kd values for these two proteins. This progressive loss of binding of Mpp8C to FAM-H3(1-15)K9me3 as more fluorine substituents are added to F59 strongly suggests that the cation-π interaction plays a dominant role in the binding of trimethylated K9 of H3 by Mpp8C. Although hydrophobic interactions may contribute to binding, their contribution appears to be minor, since increasing the hydrophobicity of F59 does not improve binding.
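The fitting model behind the Kd values is not stated here. A minimal sketch, assuming a one-site model with the labelled peptide well below Kd (so no ligand depletion), fits the polarization signal as free + span × [P]/(Kd + [P]). The sketch grid-searches Kd, solves the two baselines by linear least squares at each trial Kd, and converts a fold change in Kd into a free-energy penalty ΔΔG = RT ln(Kd,mut/Kd,wt). All function names are hypothetical, and the usage data below are synthetic, not the paper's measurements.

```python
import math


def fraction_bound(protein_um: float, kd_um: float) -> float:
    """One-site binding with protein in large excess over the labelled peptide."""
    return protein_um / (kd_um + protein_um)


def fit_kd(protein_um, signal, kd_grid):
    """Grid-search Kd; at each trial Kd, solve free/bound baselines by linear least squares."""
    best = None
    mean_y = sum(signal) / len(signal)
    for kd in kd_grid:
        x = [fraction_bound(p, kd) for p in protein_um]
        mean_x = sum(x) / len(x)
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, signal))
        span = sxy / sxx                    # bound-minus-free signal
        baseline = mean_y - span * mean_x   # free-peptide signal
        sse = sum((baseline + span * xi - yi) ** 2 for xi, yi in zip(x, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, baseline, span)
    return best[1], best[2], best[3]


def ddg_kcal(kd_mut_um: float, kd_wt_um: float, temperature_k: float = 298.15) -> float:
    """Binding free-energy penalty RT ln(Kd_mut / Kd_wt) in kcal/mol."""
    gas_constant = 1.987204e-3  # kcal mol^-1 K^-1
    return gas_constant * temperature_k * math.log(kd_mut_um / kd_wt_um)


if __name__ == "__main__":
    # Synthetic, noise-free titration generated with Kd = 0.8 uM (illustration only).
    conc = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
    mp = [50 + 150 * fraction_bound(p, 0.8) for p in conc]
    grid = [10 ** (k / 100) for k in range(-200, 201)]  # 0.01 to 100 uM, log-spaced
    print(fit_kd(conc, mp, grid))
    print(ddg_kcal(15 * 0.8, 0.8))
```

If the 15-fold loss of binding for F59F2F is read as a 15-fold increase in Kd, it corresponds to roughly 1.6 kcal/mol at 25 °C, which gives a sense of the energetic contribution of the fluorine substitutions.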
In summary, we have developed a method for the genetic incorporation of fluorophenylalanines bearing 2 to 5 fluorine substituents on the side-chain phenyl ring. The method is based on a polyspecific PylRS mutant, crystal structure analysis of this mutant and its further re-engineering. The engineered PylRS mutants recognize fluorophenylalanines and discriminate against canonical amino acids, including phenylalanine, ensuring their specific incorporation in response to the amber codon. Using this method, we synthesized variants of Mpp8C in which fluorophenylalanines replace F59, a critical binding-site residue that directly interacts with trimethylated K9 of H3. We showed that replacing F59 with fluorophenylalanines significantly weakens the binding of Mpp8C to trimethylated K9 of H3. This result strongly supports a critical involvement of the cation-π interaction in the recognition of lysine trimethylation by a chromodomain.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Figure 1. Crystallographic analysis of PylRS-AA. (A) Amino acid binding pocket of PylRS-wt in complex with pyrrolysyl-adenylate in a previously reported crystal structure (PDB entry: 2Q7H). Hydrogen bonds are shown as dotted yellow lines and the water molecule as a red sphere. The pocket surface is drawn transparent and colored according to the atoms that form it. (B) Amino acid binding pocket of PylRS-AA in complex with AMPPNP (PDB entry: 5KIP). Surface color code as in Figure 1A. (C) Overview of superimposed crystal structures of E. coli PheRS (light blue) in complex with phenylalanine (Phe) and AMP (PDB entry: 3PCO) and PylRS-AA (grey) in complex with

Figure 2. The genetic incorporation of fluorophenylalanines. (A) Structures of the five fluorophenylalanines. (B) Expression of sfGFP with fluorophenylalanines incorporated at its S2 position. To express full-length sfGFP, E. coli BL21 cells were transformed with two plasmids coding for PylRS-AS, tRNAPyl and sfGFP2TAG, and the transformed cells were grown in GMML medium with or without a fluorophenylalanine at 3 mM. (C) Deconvoluted ESI-MS spectra of purified full-length sfGFP proteins. Theoretical molecular weights are 27819, 27801, 27783,

Table 1
Theoretical and detected molecular weights of different full-length sfGFP proteins.