Comparative effects of trifluoromethyl- and methyl-group substitutions in proline

Proline is one of a kind. This amino acid exhibits a variety of unique functions in biological contexts, which continue to be discovered and developed. In addition to the reactivity of the primary functional groups, the trans–cis isomerization of the peptidyl–prolyl amide bond and its impact on the protein structure and function are of major interest. A variety of proline ring substitutions occur in nature, and more substitutions have been generated via chemical synthesis. Particularly promising is the F-labelling of proline, which offers a relatively new research application area. For example, it circumvents the lack of common NH-NMR reporters in peptidyl–prolyl fragments. Obtaining structural information from selectively fluorine-labelled peptides, proteins, and non-peptidic structures requires the analysis of the physicochemical features of the F-carrying proline analogues. To better understand and ultimately predict the potential perturbations (e.g., in protein stability and dynamics) introduced by fluorine labels, we conducted a comprehensive survey of the physicochemical effects of CF3 substitutions at each ring position by comparing the behavior of CF3-substituted residues with the CH3-substituted analogues. The parameters analyzed include the acid–base properties of the main chain functional groups, carbonyl-group interaction around the residue, and the thermodynamics and kinetics of trans–cis isomerization. The results reveal significant factors to consider with the use of CF3-substituted prolines in NMR labeling and other applications. Furthermore, lipophilicity measurements demonstrate that CF3-substituted proline shows comparable hydrophobicity to valine, suggesting the potential application of these residues for enhancing interactions at nonpolar interfaces.


Introduction
Complex biological processes such as protein translation and folding are fundamentally influenced by the unique features of the amino acid proline (Pro, Fig. 1). [1][2][3][4][5][6][7] The secondary amino group of Pro creates a tertiary amide-based backbone structure that is prone to cis-trans isomerization, which is sometimes responsible for the special role of Pro in protein folding. [4][5][6][7][8] Additionally, the cyclic nature of the Pro residue restricts the molecular conformation to certain envelope-type states of the pyrrolidine ring. [9][10][11][12] As the only coded amino acid with a restricted φ torsion, Pro is typically positioned in specific structural contexts in biological systems relative to other amino acid residues.
Several unique biological phenomena associated with the presence of Pro in polypeptide structures have been investigated through substitution with structural analogues of Pro (ProAs). ProAs were shown to impact translation yields 13,14 and velocities, 15 folding kinetics, [16][17][18] protein structural stability, [19][20][21][22][23][24][25] aggregation properties, 17,26 biological potency of peptides 27,28 and more. In each individual case, changes resulting from the substitution of Pro with an analogue were attributed to structural differences or rationalized using data obtained from experimental molecular models. [29][30][31][32] Parallel to these efforts, significant computational data have been generated for select ProAs in recent literature. [33][34][35][36][37][38] Curiously, the majority of the characterized substitution effects have involved very closely related ProAs, and these substitutions typically occupied position 4 of the ring. A more general summary characterizing the effect of substitutions at each ring position is lacking. Therefore, we decided to summarize the physicochemical data of molecular models based on ProAs bearing methyl (CH3) and trifluoromethyl (CF3) substitutions (Fig. 1). We started from the basic assumption that a methyl group would mimic the presence of an aliphatic substituent, whereas a trifluoromethyl group would exhibit an electron-withdrawing effect in addition to larger steric demands. The resulting physicochemical outcomes are presented in this work. While we sought to locate general trends related to the movement of substituents along the heterocyclic ring, our primary goal was to discern effects resulting from the presence of the CF3 group due to its utility in 19F labelling of polypeptides for NMR studies. The latter application is useful in peptide and protein labelling studies, 39,40 although other applications are also possible.
This study is limited to a select set of regioisomeric ProAs collected from synthetic and commercial sources.

Available regioisomers
All isomeric methylprolines have been well described in the literature. In contrast, all trifluoromethylated prolines are relatively recent constructs, and all are products of 21st-century chemistry. In 2002, the synthesis of the first trifluoromethylated proline, 4CF3Pro, was reported independently by two groups. [41][42][43] The synthesis of this regioisomer was later addressed by others. 44,45 In 2006, the first synthesis of 2CF3Pro was reported, 46 and additional syntheses of this amino acid and its analogues were established later. [47][48][49][50] The synthesis of 5CF3Pro was first reported in 2012, 51 and since then additional synthetic approaches to this compound and its diastereomers have been reported. 52,53 A synthesis of 3CF3Pro has not yet been reported in the literature; nonetheless, this regioisomer is available from commercial sources.
Thus, we located the set of regioisomeric trifluoromethylated prolines shown in Fig. 2, while the methylproline analogues were collected from commercial and synthetic sources (see ESI †). For example, a very convenient synthesis of 5CH3Pro from a glutamic acid derivative and Meldrum's acid was recently reported by Mohite and Bhat. 54

Amino group
Physicochemical examinations were then performed on the identified set of ProAs and their derivatives. For example, for α-trifluoromethyl amino acids, it has been established that the amino group is severely deactivated in nucleophilic reactions due to the electron-withdrawing effect of the fluorinated moiety. As a result, special synthetic strategies towards the incorporation of these amino acids into peptides have been developed. [55][56][57][58][59] In order to quantify the amino group deactivation, we examined the acidity of the ammonium group in the set of selected amino acids (Table 1 and Fig. 3).
While the pKa of the ammonium group in methylproline remains nearly identical to that of Pro, the trifluoromethyl group imposes a severe reduction of the value. These reductions are 4.6-4.9 pKa units at positions 2 and 5 and 2.2 units at positions 3 and 4. The magnitudes of the effects are similar to those previously described for trifluoromethylated pyrrolidines 60 and morpholines. 61 Notably, for the free amino acid, the C-terminal carboxyl group remains ionized over the entire pH range of the ammonium protonation transition. In this case, the presence of the C-terminal charge stabilizes the ammonium group in the protonated state. 62 We mimicked the removal of the compensatory charge by esterification into a methyl ester. This reduces the pKa by

Fig. 2 Regioisomers of trifluoromethyl- and methyl-proline selected for this study. Indices 't' and 'c' indicate the configuration of the substituent relative to the carboxyl group.
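The practical meaning of the ammonium pKa shifts quoted above can be illustrated with the Henderson-Hasselbalch relation. The following is a minimal sketch rather than part of the reported analysis: the reference pKa of 10.6 for Pro is an assumed illustrative value (the measured values belong to Table 1), pH 7.4 is an arbitrary choice, and the function and dictionary names are hypothetical. Only the shifts (2.2 and 4.6-4.9 units) are taken from the text.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of the amine present as ammonium at a given pH
    (Henderson-Hasselbalch relation for a single ionizable group)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumption: illustrative reference pKa for the Pro ammonium group,
# not a value taken from Table 1 of this study.
PRO_PKA_ASSUMED = 10.6

# pKa reductions quoted in the text for CF3 substitution.
PKA_SHIFTS = {
    "Pro": 0.0,
    "3CF3/4CF3": 2.2,
    "2CF3/5CF3 (lower bound)": 4.6,
    "2CF3/5CF3 (upper bound)": 4.9,
}


def protonation_table(ph: float = 7.4) -> dict:
    """Protonated fraction for each entry at the chosen pH."""
    return {
        name: fraction_protonated(PRO_PKA_ASSUMED - shift, ph)
        for name, shift in PKA_SHIFTS.items()
    }


if __name__ == "__main__":
    for name, fraction in protonation_table().items():
        print(f"{name}: {fraction:.3f} protonated")
```

Under these assumptions, a shift of 2.2 units leaves the amine mostly protonated near neutral pH, whereas a shift of 4.6-4.9 units leaves it mostly deprotonated.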

Carboxyl-group
We then examined the acidity of the carboxyl group in the N-acetyl derivatives, which are comparable to the C-terminal residue in a polypeptide. The N-acetylation has a dual effect by both removing the N-terminal charge and simultaneously generating two new conformational states of the formally single N-C(=O) bond: s-trans and s-cis amides, where the prefix 's-' refers to the single bond (Scheme 1). The transition between the two conformations is slow on the NMR time scale, thereby allowing the determination of the pKa for each form separately. 64 The acidities determined for the N-acetyl derivatives are shown in Fig. 4 (see also Table 1). The analyses revealed similar acidities among the 5CF3Pro, methyl-bearing derivatives, and Pro, while re-positioning of the CF3 group (from 4 to 3 to 2) gradually increased the acidity over a range of 1.6 pKa units.

Inter-carbonyl alignment
The difference between the pKa values of the s-trans and s-cis amides is always positive (eqn (1)-(3)). We have previously demonstrated that this difference is indicative of the inter-carbonyl interaction, which plays an important role in the folding propensities of the amino acids. 64 The ΔpKa is between 0.67 and 0.70 for N-acetyl proline, and this positive value indicates an attraction between the carbonyl groups. In ProAs, this effect can be related to the stabilization of the C4-exo side chain envelope conformation. 65 The inter-carbonyl alignment is an important contributor to the stability of polypeptide structures containing ProAs. This phenomenon has been attributed previously to the n→π* orbital interaction between the adjacent carbonyl groups. 66 Previous studies enable us to speculate that the examined diastereomers of 3-, 4- and 5-methylprolines should exhibit elevated ΔpKa values due to the larger contribution of the side chain C4-exo conformation and more favourable angles between the interacting carbonyl groups. [67][68][69][70] Indeed, we observed that the experimental ΔpKa values are higher for methylprolines compared to Pro (Fig. 5 and Table 1). Interestingly, for the corresponding trifluoromethylprolines, the inter-carbonyl alignment is systematically lowered. Supporting this observation, we previously reported that the trans-amide stability in a model 4CF3Pro derivative was not enhanced despite the C4-exo conformation being dominant in solution and in the crystal structure. 45 Thus, fluorination of the molecules slightly reduces the energy of the inter-carbonyl interaction, as can be deduced from the presented data.

$$\Delta pK_a = pK_a(\mathrm{trans}) - pK_a(\mathrm{cis}) \qquad (1)$$

In contrast to these tendencies, the 2-substituted Pro derivatives exhibited a remarkably stronger interaction of the carbonyl groups relative to the other cases. A previously reported crystal structure of a 2CH3Pro derivative illustrates that the elevated alignment energy may result from the increased proximity of the interacting groups due to steric effects of the α-substituent. 71 The larger steric effect of the CF3 group can explain why this amino acid features the best alignment of the carbonyl groups. These results highlight the potency of α-substituents to stabilize the trans-amide bond through a stereoelectronic effect in addition to the steric one.
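The ΔpKa of eqn (1) can be connected to the trans/cis equilibrium constants through a simple thermodynamic cycle: because the s-trans and s-cis forms each have their own acid dissociation equilibrium, ΔpKa equals log10 of the ratio between the trans/cis constant of the acid and that of the carboxylate. The sketch below illustrates this identity and converts a ΔpKa into a free-energy difference at 298 K. The function names are hypothetical, and reading the resulting energy as a measure of the inter-carbonyl interaction follows the interpretation given in the text rather than anything the code establishes.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def delta_pka_from_ratios(k_acid: float, k_salt: float) -> float:
    """Thermodynamic cycle: pKa(trans) - pKa(cis) = log10(K_acid / K_salt),
    where K = [trans]/[cis] for the carboxylic acid and the carboxylate."""
    return math.log10(k_acid / k_salt)


def delta_pka_to_free_energy(delta_pka: float, temperature: float = 298.0) -> float:
    """Free-energy equivalent of a pKa difference, in kJ/mol."""
    return math.log(10.0) * R * temperature * delta_pka / 1000.0


if __name__ == "__main__":
    # Range quoted in the text for N-acetyl proline.
    for dpka in (0.67, 0.70):
        print(f"dpKa = {dpka:.2f} -> {delta_pka_to_free_energy(dpka):.2f} kJ/mol")
```

For the ΔpKa range of 0.67-0.70 quoted for N-acetyl proline, this corresponds to roughly 3.8-4.0 kJ/mol at 298 K.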

Amide rotameric preference
We then determined the trans/cis amide rotameric preference in N-acetyl derivatives bearing carboxylate, carboxylic acid and methyl ester groups at the C-terminus. Increasing the polarity of the carbonyl group leads to an increase in the inter-carbonyl alignment and increases the trans/cis amide ratio in the order ester ≥ acid > salt (Scheme 1). The amide isomerism is impacted by steric factors due to the presence of bulky substituents around the amide bond for substitutions at the 2- and 5-positions of the ring. Fluorination evidently increases the original steric effect in both 2- and 5-substituted structures (Table 2). However, the contribution of the inter-carbonyl alignment further stabilizes the trans-amide but not the cis-amide structure. Thus, the substituent position confers an asymmetric effect on the trans/cis amide ratio (Fig. 6). Another reason for the asymmetric shape of the curves may be the non-additivity of the steric sizes of the substituents.
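A trans/cis ratio such as those discussed above can be restated as a trans population and a free-energy difference between the two rotamers. The following minimal sketch uses the standard relation ΔG = -RT ln K; the input ratio of 4.0 is a hypothetical value chosen for illustration and is not taken from Table 2, and the function names are likewise illustrative.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def trans_fraction(k_trans_cis: float) -> float:
    """Population of the trans rotamer for K = [trans]/[cis]."""
    return k_trans_cis / (1.0 + k_trans_cis)


def free_energy_trans_minus_cis(k_trans_cis: float, temperature: float = 298.0) -> float:
    """G(trans) - G(cis) in kJ/mol; negative when trans is favoured."""
    return -R * temperature * math.log(k_trans_cis) / 1000.0


if __name__ == "__main__":
    k_example = 4.0  # hypothetical ratio, for illustration only
    print(f"trans fraction: {trans_fraction(k_example):.2f}")
    print(f"dG(trans - cis): {free_energy_trans_minus_cis(k_example):.2f} kJ/mol")
```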

Amide rotation kinetics
The rotational velocity of the amide bond around Pro is an important contributor to protein folding kinetics. Peptidyl-prolyl cis-trans isomerases are a class of enzymes that accelerate the amide rotation around prolyl residues, and the catalytic centres of these enzymes are attractive targets for the development of pharmaceutical peptide-based inhibitors. [72][73][74][75][76] ProAs often alter the amide rotational velocities in peptidyl-prolyl fragments, although the overall effect is complex. In order to clarify the effects of positional substitutions in the Pro ring, we summarized the potential contributing forces in Scheme 2. Effects A, B and C occur in the ground state. Effect A is invariant in the ground state and correlates with the basicity of the nitrogen atom, while effects B and C vary for the trans- and cis-amides and are reflected in amide thermodynamics. Since the rotation proceeds via the syn/exo transition state, 69,77 a substituent may sterically interfere with the oxygen atom, which shifts below the ring and opposite to the carboxyl group of Pro, thus creating effect D.
For example, an electron-withdrawing substituent usually reduces the barrier by destabilizing the ground state resonance (effect A, Scheme 2). However, the same substituent can have an opposite and compensatory effect on the trans→cis barrier by increasing the energy of the inter-carbonyl alignment. For example, this occurs in the case of 4-hydroxy 78 and 4-fluoroprolines. 79 Experimentally determined velocities of the cis→trans and trans→cis amide rotations in methyl esters of N-acetyl amino acids (in water) are shown in Fig. 7 (see also Table S1 in the ESI † for the data on C-terminal carboxylates). These values can be rationalized based on the considerations outlined in Scheme 2. For example, compared to Pro, the 2CH3Pro derivative exhibits a decreased cis→trans barrier due to effect B, whereas the trans→cis barrier increases due to the additional contribution from effects C and D. In the derivatives of 2CF3Pro, both rotational barriers are decreased, primarily due to the additional contribution of effect A, which is not present in methylprolines. Overall, the rotational barrier is the most complex of the parameters presented so far.
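Rotation rate constants and activation barriers of the kind discussed above are commonly interconverted with the Eyring equation. The sketch below assumes a transmission coefficient of 1 and uses hypothetical rate constants; it is not necessarily the exact procedure used to produce Fig. 7, and the function names are illustrative. It also shows that the difference between the trans→cis and cis→trans barriers equals RT ln K for K = [trans]/[cis], which is why the amide thermodynamics feed directly into the kinetics.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # molar gas constant, J/(mol*K)


def eyring_barrier(rate_constant: float, temperature: float = 298.0) -> float:
    """Activation free energy in kJ/mol from a first-order rate constant (1/s),
    assuming a transmission coefficient of 1."""
    return -R * temperature * math.log(rate_constant * H / (K_B * temperature)) / 1000.0


def trans_cis_ratio(k_cis_to_trans: float, k_trans_to_cis: float) -> float:
    """Equilibrium constant K = [trans]/[cis] from the two rate constants."""
    return k_cis_to_trans / k_trans_to_cis


if __name__ == "__main__":
    # Hypothetical rate constants in 1/s, for illustration only.
    k_ct, k_tc = 0.04, 0.01
    print(f"cis->trans barrier: {eyring_barrier(k_ct):.1f} kJ/mol")
    print(f"trans->cis barrier: {eyring_barrier(k_tc):.1f} kJ/mol")
    print(f"K(trans/cis): {trans_cis_ratio(k_ct, k_tc):.1f}")
```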

Lipophilicity
Attempts to classify Pro within the paradigm of hydrophobic/hydrophilic dualism have created ambiguity in the literature, as evidence exists supporting both classifications of Pro in polypeptides. In natural proteins, Pro can be present in both hydrophobic interiors and exposed, highly solvated stretches. 80 For example, Pro can accommodate very lipophilic prosthetic groups such as retinal in the hydrophobic core of proteins. 81 Pro interactions with aromatic residues are also thought to be guided by the hydrophobic contacts between the aromatic ring and the ring structure of proline. 82 In many other instances, Pro occupies solvated and polar stretches primarily due to the extended nature of the secondary structures it forms. 80 Notably, we recently demonstrated that the addition of Pro systematically decreases the lipophilicity in an oligoproline peptide series. 83 Both trifluoromethyl and methyl groups are expected to increase the lipophilicity of an organic molecule, 84,85 and the same effect should potentially be seen in the amino acid derivatives. A recent study of aliphatic fluorine-containing amino acids demonstrated that the main effect of fluorination on the overall polarity occurs due to the alteration of the backbone contacts with water, 86 although we presume that for ProAs, the backbone-solvation considerations are less relevant because the backbone-forming tertiary amide is less accessible to the solvent.
In order to characterize the outcome of CF3/CH3-group substitutions, we examined experimental octan-1-ol/water partitioning (log P) values for the methyl esters of the N-acetyl amino acids (Fig. 8 and Table S2 in the ESI †). Our results demonstrate that the presence of a methyl group increases the lipophilicity by approximately 0.4 log P units, whereas the trifluoromethyl moiety exhibits a larger effect of approximately 0.7 log P units. The lipophilic contribution of the CF3 group is slightly higher in the 2CF3Pro derivative, presumably due to the partial intermolecular compensation of dipoles from the trifluoromethyl and carboxymethyl moieties. Overall, these data suggest that trifluoromethylprolines can be considered as hydrophobic amino acids, with log P values close to that of valine (Fig. 9). The hydrophobicity of amino acid analogues is an important consideration when designing peptide sequences, and some trifluoromethylated ProAs have recently been applied in attempts to improve the membrane permeability of therapeutic peptide constructs. 87 We have demonstrated that the local polarity differences introduced by the incorporation of 4-methyl- and 4-fluoroprolines impact structural stability at a level nearly equal to the preorganization from the inter-carbonyl alignment. 88 Notably, 4-methylprolines are abundant in natural products, 89,90 and these are the simplest alkylproline derivatives occurring in nature. 91 Another common modification is hydroxylation at position 4, and extensive experimental data are available for 4-fluoroprolines. 19,20 For this reason, we compared the lipophilicity of the 4-substituted proline derivatives in Fig. 10. The data illustrate that the 4-hydroxyl and 4-methyl moieties have opposite effects, with the former enhancing the molecular hydrophilicity and the latter enhancing the molecular lipophilicity. Furthermore, 4-fluorination increases polarity, though less so than hydroxylation.
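Because log P is the base-10 logarithm of the octan-1-ol/water concentration ratio, the increments quoted above translate directly into fold changes of that ratio. The short sketch below performs this conversion for the approximate increments given in the text (0.4 for CH3 and 0.7 for CF3); the function name is hypothetical.

```python
def fold_change(delta_log_p: float) -> float:
    """Fold change in the octan-1-ol/water concentration ratio
    corresponding to a given increment in log P."""
    return 10.0 ** delta_log_p


if __name__ == "__main__":
    # Approximate increments quoted in the text.
    for label, increment in (("CH3", 0.4), ("CF3", 0.7)):
        print(f"{label}: +{increment} log P -> x{fold_change(increment):.1f}")
```

On this scale, the methyl increment corresponds to roughly a 2.5-fold and the trifluoromethyl increment to roughly a 5-fold increase in the partitioning ratio.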

Discussion
A number of chemical routes have been developed for the convenient synthesis of amino acids bearing CF3 groups over the last two decades. Further application of these compounds in engineering and/or in peptides and proteins requires a comprehensive understanding of the physicochemical effects of trifluoromethylation. Interestingly, since the synthesis of the first CF3-bearing proline in 2002, [41][42][43] only one of these compounds has undergone physicochemical characterization through investigation of trans/cis amide properties and pKa (in 2015). 45 In an attempt to provide the missing data, we have systematically characterized all of the trifluoromethylproline regioisomers and compared the obtained values with those of the analogous methylproline counterparts (Fig. 2). The results for the N-terminal basicity and C-terminal acidity provide a numerical description of the electron-withdrawing effect of the CF3 group (Fig. 3 and 4). Subsequently, we noticed that the inter-carbonyl alignment (represented by the ΔpKa and ΔpKa* values and trans/cis equilibrium constants) is reduced in the trifluoromethyl derivatives compared to the methyl derivatives in the 3-, 4-, and 5-substituted amino acids (Fig. 5 and 6). For substitutions at the 2-position, however, the quaternization of the residue significantly enhances the original interaction between the carbonyl groups for steric reasons.

Scheme 2 Potential effects contributing to the amide rotation barrier in N-acetyl prolyl derivatives. EWG = electron-withdrawing group.

Fig. 7 Amide rotation barriers determined for methyl esters of N-acetyl trifluoromethyl- (A) and methyl- (B) prolines: trans→cis (full purple circles) and cis→trans (empty green circles). Measured in aqueous medium at 298 or 310 K (see Table 2 for details).

The kinetics of the amide bond rotation are reflected in the trans→cis and cis→trans activation barriers (Fig. 7). The latter parameters are quite complex, as they are impacted by the amide thermodynamics as well as the N-terminal basicity. For instance, the curves in Fig. 7A (trifluoromethyl derivatives) can be considered as composite results of the barriers for the corresponding methylated amino acids (Fig. 7B) and the basicity trends (Fig. 3A). Transition state effects should be considered when the substituents interfere with the upstream carbonyl oxygen atom moving below the pyrrolidine ring. Another important observation is that trifluoromethyl substituents cause the amino acid to exhibit lipophilicity levels equivalent to those of valine, as demonstrated by the experimental examination of the octan-1-ol/water partitioning constants (Fig. 8 and 9).
Our data will enable researchers to predict and plan experiments that involve Pro substitutions more rationally and accurately, in applications such as the design of peptidyl-prolyl cis-trans isomerase inhibitors, 72-76 expansion of the genetic amino acid repertoire [13][14][15] as well as protein engineering [21][22][23][24][25][26] and the development of other peptide therapeutics. 87 However, the most outstanding application is the use of fluorine-labelled ProAs in 19F NMR studies of polypeptides. A few works have suggested the use of fluoroprolines for this purpose (Fig. 11). 92,93 Recently, vicinal difluoroprolines were proposed for use in 19F NMR labelling, as these derivatives have a reduced conformational bias in the side chain and the amide bond. 94 Unfortunately, studies of the polarity changes introduced by ProAs are underrepresented in the literature. Even for fluoroprolines, which are the best studied, there is very little awareness of the polarity changes introduced by these residues in polypeptide structures. In this context, we recently demonstrated that local polarity changes introduced by ProAs may have quite a significant effect when placed on the periphery of a 9 kDa foldon trimeric propeller structure. 88 As seen from further analysis of the log P values, 4-fluorination increases the polarity (Fig. 10), which may reduce the folding stability when the Pro residue is located at a buried, interior position.

Fig. 9 The position of trifluoromethyl- (CF3Pros) and methyl- (CH3Pros) prolines on the lipophilicity scale based on log P octan-1-ol/water values previously reported for methyl esters of N-acetyl amino acids. 80

Fig. 10 Comparison of the experimental log P octan-1-ol/water values for 4-substituted prolines. Molecular conformation is sketched taking into account the preferred side-chain conformation. 29

Geminal difluoroprolines 95 may potentially reduce the polarity of the proline residue following partial C-F dipole compensation 84,85 while keeping the conformational bias at a minimum. 96 However, these would also have poor NMR properties due to a large coupling between the fluorine atoms. 94 Alternatively, Tressler and Zondlo adapted the use of perfluoro-tert-butoxy 19F probes 97,98 and proposed the use of O-perfluoro-tert-butyl-4-hydroxyprolines in NMR applications, 99 although these bulky lipophilic compounds also demonstrated a significant conformational bias in the trans/cis amide equilibrium. Other proposed Pro substitutes are trifluoromethyl-4,5-methanoprolines, which were originally introduced as 19F NMR labels for solid-state NMR studies in lipid membranes. 100 Later studies demonstrated that 4,5-methanoprolines also impose a notable conformational alteration depending on the stereochemistry of the cyclopropane unit. 101 A recent 19F NMR study fully supported this conclusion. 102 However, their use in NMR labelling can be justified provided that the conformational bias is taken into account when interpreting the data.
A single trifluoromethyl group represents another alternative to the above-mentioned 19F NMR labelling strategies. The utility of this labelling scheme follows in particular from the good relaxation profile of the axially rotating trifluoromethyl group. 103 Our study together with our previous report 45 shows that the trans/cis ratios are minimally perturbed for the 3- and 4-position substitutions, as these positions are distant from the backbone. Thus, the hydrophobic residues of CF3-proline are good candidates for the design and engineering of non-polar interaction interfaces. These insights have enabled the recent application of 4CF3Pro for the labelling and detection of the first transmembrane polyproline helix by means of solid-state 19F NMR. 104 Some other ProAs, such as 4-trifluoromethyl-3,4-dehydroproline 105 and difluoro-4,5-methanoprolines, 106 can also be considered promising fluorine-bearing proline substitutes. However, these compounds exhibit severely compromised stability, in particular in basic media.
Finally, we believe that the physicochemical data reported in this study will further support the use of CF3-containing prolines for purposes ranging from customizable design of small molecules to the reprogramming of complex biological processes and structures.

Conflicts of interest
The authors declare the following competing financial interest. SP is affiliated with a commercial company, which sells some of the discussed amino acids. VK and NB declare no conflicts of interest.