Unexpected Trends in the Hydrophobicity of Fluorinated Amino Acids Reflect Competing Changes in Polarity and Conformation

Citation R. Robalo, João; Vila Verde, Ana (2018): Unexpected Trends in the Hydrophobicity of Fluorinated Amino Acids Reflect Competing Changes in Polarity and Conformation. ChemRxiv. Preprint. Fluorination can dramatically improve the thermal and proteolytic stability of proteins and their enzymatic activity. Key to the impact of fluorination on protein properties is the hydrophobicity of fluorinated amino acids. We use molecular dynamics simulations, together with a new fixed-charge, atomistic force field, to quantify the changes in hydration free energy for amino acids with alkyl side chains and with 1 to 6 –CH to –CF side chain substitutions. Fluorination changes the hydration free energy by 1.5 to +2 kcal mol - 1, but the number of fluorines is a poor predictor of hydrophobicity. Changes in hydration free energy reflect two main contributions: i) fluorination alters side chain-water interactions; we identify a crossover point from hydrophilic to hydrophobic fluoromethyl groups which may be used to estimate the hydrophobicity of fluorinated alkyl side-chains; ii) fluorination alters the number of backbone-water hydrogen bonds via changes in the relative side chain-backbone conformation. Our results offer a road map to mechanistically understand how fluorination alters hydrophobicity of (bio)polymers. ⇤ Fluorination can dramatically improve the thermal and proteolytic stability of proteins and their enzymatic activity. Key to the impact of ﬂuorination on protein properties is the hydrophobicity of ﬂuorinated amino acids. We use molecular dynamics simulations, together with a new ﬁxed-charge, atomistic force ﬁeld, to quantify the changes in hydration free energy, D G Hyd , for amino acids with alkyl side chains and with 1 to 6 –CH ! –CF side chain substitutions. Fluorination changes D G Hyd by  1 . 5 to + 2 kcal mol  1 , but the number of ﬂuorines is a poor predictor of hydrophobicity. Changes in D G Hyd reﬂect two main contributions: i) ﬂuorination alters side chain-water interactions; we identify a crossover point from hydrophilic to hydrophobic ﬂuoromethyl groups which may be used to estimate the hydrophobicity of ﬂuorinated alkyl side-chains; ii) ﬂuorination alters the number of backbone-water hydrogen bonds via changes in the relative side chain-backbone conformation. Our results offer a road map to mechanistically understand how ﬂuorination alters hydrophobicity of (bio)polymers.


Introduction
The preferential interaction between apolar solutes in waterthe hydrophobic effect -is a key factor driving protein folding 1,2 , structural stability with respect to changes in temperature 3,4 (thermal stability) and interactions with other proteins and ligands 5,6 . The hydrophobic effect reflects the balance between solute-water interactions and direct, predominantly dispersive, solute-solute interactions 7,8 . Understanding how to use amino acid mutations to control the hydrophobic effect is critical to develop new protein-based drugs, biodevices and materials [9][10][11][12][13] . Simultaneously, minimizing changes in solute-solute packing upon mutations is desirable to ensure that protein structure -and thus function -is preserved 14,15 . This is, however, difficult with the limited pool of canonical hydrophobic amino acids because their side chains differ in structure and volume. Fluorinated versions of those amino acids, i.e. those where hydrogen atoms in side chain groups are substituted by fluorine (see Figure 1), can solve this problem while simultaneously enhancing other properties of interest 3,[16][17][18][19] . Even fluorinating only a few residues may enhance the hydrophobicity and passive diffusion of peptides through membranes 20 , the proteolytic resistance 21 and anti-microbial activity of proteins 22 , in addition to tuning their thermal stability 23 , making this synthetic approach of wide interest [24][25][26][27][28] .
Still, a caveat of using fluorination to control protein properties remains: do we understand the factors influencing the hydropho-Max Planck Institute for Colloids and Interfaces, Department of Theory & Bio-systems, Science Park, Potsdam 14424 Germany. Fax: +49 (0)331 5679602; Tel: +49 (0)331 5679608; E-mail: ana.vilaverde@mpikg.mpg.de † Electronic Supplementary Information (ESI) available: Description of force fields and simulation details, hydration free energies, electrostatic potential in the vicinity of amino acid side chains, linear models to reproduce the calculated hydration free energies, radial distribution functions, conformational changes in fluorinated side chains. AMBER-format force fields for mono-and di-fluorinated amino acids available upon request. bic effect involving fluorinated amino acids? The answer is, simply, that we do not. Here we focus on one factor contributing to the hydrophobic effect: solute-water interactions, for simplicity referred to as solute hydrophobicity. The hydrophobicity of solutes is usually characterized by their hydration free energies 8 . The hydrophobicity of fluorinated amino acids has been qualitatively estimated by considering the surface area of its side chain (the larger the surface area, the larger the hydrophobicity) 29,30 and its side chain polarity (the larger the polarity, the smaller the hydrophobicity) 23 , but detailed mechanistic understanding is still lacking 16,19 . Understanding the origins of fluorinationinduced changes in hydrophobicity depends critically on our ability to accurately quantify interactions between amino acids and their environment. We demonstrate that this quantification is now possible using molecular dynamics simulations and fixedcharge, all-atom models. The approach presented here is general, and may be used to investigate the hydrophobicity of any fluorinated (bio)polymer or small molecule.

Computational Methods
We used the TIP4P-Ew (ref. 31) water model, the AMBER14 (ff14sb; ref. 32) force field for the canonical amino acids and the GAFF force field for methane, ethane and propane. For the remaining amino acids and for the fluorinated small molecules, we used a force field developed by us (previous own work 33 and SI sections 1 and 2) based on AMBER14 (amino acids) or GAFF (small molecules). The main difference between our force field for fluorinated molecules and GAFF/AMBER14 lies in the Lennard-Jones (LJ) parameters of fluorine, and of hydrogen (H F ) bonded to a fluorinated carbon. The LJ parameters of fluorine were optimized to reproduce the hydration free energies of CF 4 and the molar volume of a 50% mix of CF 4 and CH 4 ; subsequently, those of H F were optimized to reproduce the hydration free energy and the molar volume of CHF 3 . Atomic partial charges were obtained following the GAFF (small molecules) Fig. 1 Molecular structures, commonly used names and abbreviations for the amino acids under study. Each amino acid residue is capped at the N-terminus with an acetate group (ACE, -COCH 3 ) and at the C-terminus with an N-methyl group (NME, -NHCH 3 ). Abbreviations for fluorinated amino acids follow a three-character nomenclature: initial character of parent (non-fluorinated) amino acid name (E, P, V, I, L); number of fluorine atoms (1,2,3,4,6); fluorination site (d carbon as D, g carbon as G or, in the case of chiral center formation following fluorination, R or S).
or AMBER (amino acids) protocols. We note that the requirement of compatibility with the protein force field implies that we retain the point charge representation. This representation has known shortcomings when modeling carbon-bound chlorine, bromine or iodine because these halogens exhibit s -holes (a positive area in the electrodensity distribution of the halogen atom, surrounded by a negative belt) which this charge model cannot represent 34 . Carbon-bound fluorine, however, retains a negative potential in its entire surface and the level of anisotropy in its electronic charge distribution is small [35][36][37] . For that reason, single point charge models have been used to model the interactions of fluorinated sites with water and with other organic molecules: e.g., Schyman & Jorgensen 34 , Ibrahim 38 and Ho 39 apply their models of s -holes to carbon-bound chlorine, bromine and iodine, but retain a point charge representation for fluorinated sites. Point charge models perform less well in the case of fluorinated aryl groups because the s -hole perturbation extends into the b -carbons; when modeling fluorinated aryl molecules, other promising models such as those based on permanent atomic multipole charges 37,40,41 should be considered. This extended perturbation occurs via the p electrons 37 , and consequently is not expected to be nearly as significant for fluorinated alkyl groups, which are the sole focus of the present work.
Free energy calculations were performed with Gromacs 5.0 [42][43][44][45][46][47][48] and molecular dynamics simulations to calculate other observables were performed with AMBER 14 49 . All systems were assembled using the built-in tools of the software package used to perform the simulations. A summary of the most relevant parameters used during the production runs is given in SI Table 2. Simulations used a time-step of 2 fs and constraints (LINCS 50 in Gromacs, SHAKE 51 in Amber) were applied to all bonds involving hydrogen atoms. Integration of the equations of motion was done using a leap-frog Langevin algorithm. Van der Waals interactions were shifted to zero between 1.0 and 1.2 nm, and long-range dispersion corrections were applied to both pressure and energy. Long-range electrostatics were treated with the PME scheme with a 1.2 nm cutoff, a grid spacing of 0.1 nm (AMBER) or 0.12 nm (Gromacs) and a 4th (AMBER) or 6th (Gromacs) order interpolation. Production runs were done in the NpT ensemble. The Monte Carlo 49,52 (AMBER) or Berendsen 53 (Gromacs) barostats were used with a relaxation time of 1 ps for an isotropic coupling of system pressure to 1 bar; temperature coupling was handled by the leap-frog Langevin integrator with a collision frequency of 1 ps 1 and a target temperature of 298 K.
Hydration free energies were calculated using Free Energy Perturbation (FEP) and Bennett Acceptance Ratio 54,55 (BAR), following the protocol we have previously adopted 33 and described in SI section 2.1. Briefly, in each case, simulations consisting of a single solute molecule in water were conducted, first decoupling Coulombic interactions and then LJ interactions. The coupling parameter l C for the Coulombic interactions was scaled linearly and assumed 21 equally-spaced values between 0 and 1; decoupling the LJ interactions was done over 59 states, with unevenlyspaced l LJ values between 0 and 1. At each state, we performed a steepest-descent minimization, BFGS minimization, 100 ps of NVT equilibration and 100 ps of NpT equilibration before collect-ing statistics for each state over 2 ns. Five independent production runs were performed for each amino acid.
Molecular dynamics simulations consisted of a steepest descent minimization, a conjugated gradient minimization, a 200 ps NVT heating from 0 K to 298 K, a 1 ns NpT equilibration and a 25 ns NpT production run; in the minimization, heating and equilibration steps the coordinates of the backbone atoms were restrained with a 20 kcal mol 1 potential. These trajectories were used to extract the data used as input for Equation 1, and were also used as input for the APBS 56 software to estimate the electrostatic hydration free energy of the amino acids using the linearized Poisson-Boltzmann equation, as described in more detail in SI section 3.1.

Results & Discussion
Change in DG Hyd with fluorination depends on the chirality and location of the fluorinated site and on amino acid identity. We calculated hydration free energies as a measure of the hydrophobicity of 16 fluorinated amino acids and their 5 nonfluorinated counterparts, totaling 21 aliphatic amino acids (see Figure 1). These free energies are shown in SI Table 4 and Figure 2; we reported some of these values in a prior publication 33 . The DDG Hyd values have an associated standard deviation of 0.1 to 0.3 kcal mol 1 (see SI Table 4), enabling the precise detection of differences between amino acids.
The amino acids with one or more -CH 3 ! -CF 3 substitutions (here termed fully fluorinated) are, as expected, always more hydrophobic than their non-fluorinated counterparts (positive DDG Hyd ) 3,19,57 , but the change in free energy is not constant per fluorinated group. Even more surprisingly, amino acids with -CH 3 ! -CH 2 F/-CHF 2 and -CH 2 -! -CF 2 -substitutions (here termed partially fluorinated) display a range of DDG Hyd values from 1.5 kcal mol 1 to +1 kcal mol 1 , regardless of the number of fluorine atoms. DDG Hyd depends strongly and non-intuitively on the chirality (R/S), location (g vs. d ) of the fluorinated sites, and on the identity of the amino acid.
Experimental hydration free energies for amino acids are not available, so we cannot directly assess the accuracy of our predictions. To test the accuracy of our force field, we applied it to all fluorinated variants of methane, ethane and propane for which experimental hydration free energies could be found 58 . These small molecules are the closest analogues to the side chains of the amino acids investigated here. The free energies of hydration for the fluorinated small molecules are shown in Figure 3 and SI Table 4. We show also the free energies of hydration of methane, ethane and propane, to illustrate the accuracy of the AMBER force field for the alkyl side chains of amino acids. The force field for fluorinated molecules reproduces the experimental hydration free energy very well in most cases. The largest deviation is seen for CH 3 F, whose DG Hyd is 0.8 kcal mol 1 too negative. The predicted hydration free energies for methane, ethane and propane, in contrast, are too positive by 0.5 to 0.8 kcal mol 1 . These results suggest that the predicted DDG Hyd for amino acids (Figure 2) may be systematically too negative by 0.5 to 0.8 kcal mol 1 . Differences between di-and trifluorinated amino acids should be well captured, but monofluorinated alkyl groups are likely excessively  hydrophilic in our model. Our main result -the large dependence of DDG Hyd,FEP on the identity of the amino acid and characteristics of the fluorinated site -is not affected by these force field shortcomings. Below we show that this dependence is in fact largely due to fluorination-induced conformational changes that alter the number of backbone-water hydrogen bonds. The AM-BER force field for proteins, upon which we build the force field for fluorinated amino acids, has been extensively improved for decades and is able to reproduce experimentally-measured structure and dynamics of folded small proteins [59][60][61][62][63] . Fluorinationinduced changes in conformation in our simulations result from steric hindrance and favorable electrostatic carbonyl-CF interactions, both of which can be captured, at least qualitatively, by all-atom, fixed charge force fields. The surprising variation in DDG Hyd is largely Coulombic in origin. Decomposing the hydration free energy into Lennard-Jones and Coulombic contributions can be naturally done in sim-ulations, and gives valuable insight into the origin of the observed trends. The Lennard-Jones contribution to DDG Hyd (Figure 2) is constant and positive per fluorinated group, positive but not constant for difluorinated amino acids and negative and constant for monofluorinated amino acids. Despite this variation in the LJ contribution to the free energy of hydration, the wide variation in DDG Hyd , particularly in the case of the partially fluorinated amino acids, is actually dominated by the Coulombic contribution to the free energy. This contribution varies seemingly unpredictably ( Figure 2), with each type of fluorination leading to either positive or negative DDG Coul Hyd : e.g., compare E2G with P2G, E1G with L1S, L1R and I1G, V3S with V3R. Previous reports on the dependence of lipophilicity on fluorine-induced polarity changes support the idea that the contribution of electrostatics to hydrophobicity is far from intuitive 64,65 . Neither contribution shows visible correlation with local hydration around the fluorinated site, as measured by the radial distribution function of water around the fluorinated sites (SI section 5.2.4 and Figure 8 A,B).
Developing an analytical solvation model to understand how fluorination affects DG Hyd . Can we understand the origin of these hydration free energies? To answer this question we model the fluorination-induced change in hydration free energy in terms of linear, multivariate models of the form DDG Hyd = Â i k i · DY i , where k i are fitting parameters and Y i are observables that should affect the hydration free energy and can be easily calculated in short molecular dynamics simulations. The LJ contribution to DDG Hyd is dominated by the energy (work) required to form a solute-sized cavity in water to accommodate the larger fluorine atoms. This contribution ( Table 1) is proportional to the change in solvent accessible surface area (DSASA) of the amino acid 66,67 , and can be easily quantified by measuring DSASA in molecular dynamics simulations, and multiplying it by the LJ component of DG Hyd per surface area unit of methane, which is essentially identical to that of CF 4 (SI Table 9).
The polar contribution is dominated by the interaction of a distribution of atom-centered point charges with water. This contribution can be estimated in multiple manners 33,64,65,68 . We first attempted to model the polar contribution as the sum of three terms, one proportional to a global quantity, the dipole moment µ, representing the molecular charge distribution, and the other two proportional to local quantities, the number of hydrogen bonds between water and amines (h NH ) or carbonyls (h CO ) in the backbone ⇤ . Fitting the DDG Hyd values calculated with FEP using a linear multivariate model (SI Equation 5) consisting of the sum of the contributions arising from changes in SASA, µ, h NH , and h CO due to fluorination was unsuccessful: the resulting fitting parameters had unphysical values, e.g., a positive energetic contribution of the dipole moment, and overly large errors (SI Table 6). We also attempted to model our data using an analogous version of this model, but where the area term reflects the difference in hydration free energy between CH 4 and CF 4 . This second model (SI Equation 6) has proven successful to understand hydrophobicity of tri-and hexa-fluorinated amino acids 33 , but it fails (SI Figure 4) when applied to the partially fluorinated amino acids. Given that the contributions of solute-water hydrogen bonds and the Lennard-Jones interactions to the hydration free energies are well-known 33,66,68 , we interpret these results as an inability of the molecular dipole moment to describe electrostatic solute-water interactions in a quantitative manner, at least for solutes as diverse as the current set of amino acids. We next attempted to characterize how fluorination alters the solute-water electrostatic interactions with another commonly used global descriptor: solving the linearized Poisson-Boltzmann (PB) equation, where the solvent is modeled as a continuum, as described in SI Section 3.1. Given the apparently simple problem we were trying to model, we were surprised to find that the electrostatic component of the hydration free energy calculated using PB hardly correlates with the reference FEP values (SI sence of correlation suggests that it is imperative to model the solvent as discrete water molecules. Thus, when aiming for a correct characterization of how aqueous solvation of biopolymers changes with mutations, not only the solute but also the solvent must be modeled without the use of global or mean-field descriptors. Backbone-water hydrogen bonds dominate the polar interactions between the backbone and water 68 ; the unresolved issue is how to describe polar interactions between the side chain and water. We find that these can be characterized by the electrostatic potential, F, at the position of the water oxygen atoms in the hydration shell of the side chains, as described in SI Section 4. The corresponding probability distributions of F for the fluorinated amino acids, shown in SI Figure 2 B-F, show large negative potential regions together with, in some cases, regions of more positive potentials than observed for the parent amino acid. The more positive potential, almost exclusive to mono-and difluorinated species, arises from a larger exposure of the positively charged carbon skeleton of the side chain, left partially unshielded by hydrogen in mono-and difluorinated groups (see SI Section 4 and SI Figure 3). The negative potential region, observed for all amino acids, can be attributed to the fluorine atoms. Water molecules near fluorine atoms often assume configurations that meet the geometric criteria for HOH···FC hydrogen bonds, as discussed in SI section 5.2, so we consider that these weak hydrogen bonds exist. This interpretation is consistent with ab initio calculations, and spectroscopic measurements [69][70][71][72] ; the strong correlation between 19 F NMR isotropic chemical shifts and the type of fluorineprotein interactions observed in the Protein Data Bank also suggest that hydrogen bonds to fluorinated alkyl groups exist, and are strongest for groups with low degrees of fluorination 73 , as we also observe (SI Table 7).
Our final model (Equation 1) reflects the above results. We capture the impact of the fluorination-induced differences on side chain-water interactions in two ways, via the average number of water-fluorine hydrogen bonds established by each amino acid (the h CH n F m terms, where n = 0, 1, 2 and m = 1, 2, 3, in Equation 1) and via the fluorination-induced change in the number of water molecules experiencing the positive potential region of the side chain (the DF + term in Equation 1). Fitting Equation 1 to the DDG Hyd data shown in Figure 2 yields the parameters in Table 1; see SI section 5.2 for details of the fit.
(1) Figure 4 shows the correlation between the FEP-calculated hydration free energies and the ones calculated using Equation 1. The model describes the changes in DG Hyd following fluorination for most cases, with an average deviation between model and FEP results of only 0.26 kcal mol 1 . Poorer agreement occurs for E1G, for which it yields a value of DDG Hyd which deviates 0.5 kcal mol 1 from the value given by the FEP simulations and actually has the wrong sign, and for E2G, L4D and L3S, for which deviations are 0.7, 0.5 and 0.5 kcal mol 1 each. Our prior work suggests that the source of these large deviations might be Fitting Parameter

Value
Error P-value Table 1 Values of the fitting parameters from Equation 1, and associated standard errors and P-values. NA: not applicable; ⇤ calculated via error propagation. entropic 33 , and we speculate it might be related to changes in solute conformational entropy, which are not included in Equation 1. Clarifying this point is outside of the scope of the present work.
Decomposing the contributions to DDG Hyd . The performance of the multivariate model is sufficiently good to enable insight into the mechanisms of fluorination-induced changes in solvation, by decomposing the individual contributions to DDG Hyd as seen in Figure 5. Surface area. Within derivatives of the same amino acid, the surface area increases DDG Hyd proportionally to the number of fluorine atoms (SI Figure 5). The magnitude of the increase per fluorine atom depends on the amino acid identity, which implies that the hydration shells around each side chain are disturbed to a different degree, when accommodating the hydrogen ! fluorine substitution -compare, e.g. E2G with P2G. Backbone-water hydrogen bonds. The contribution of hydrogen bonds between water and carbonyl groups is always positive because fluorination reduces the number of these hydrogen bonds by steric blockage. The magnitude of this contribution, for fluorinated variants of a given amino acid, is again proportional to the number of fluorine atoms in the side chain. In contrast, the contribution of amine-water hydrogen bonds varies between positive and negative because fluorination may increase or decrease the number of these hydrogen bonds. Steric blockage occurs because of fluorine's large size and, for some amino acids, because fluorination changes the preferential conformation of the side chain as discussed in SI section 6. These results are consistent with previous reports indicating that CF and carbonyl groups interact favorably 74,75 , and that changes in the conformational preference of fluorinated alkyl groups affect a molecule's lipophilicity, membrane permeability and inhibitory activity 25,76,77 .
Side-chain polarity. The most interesting contributions arise from the polarity of the fluorinated side chain. The large, positive electrostatic potential affecting hydration waters adds an average +0.4 kcal mol 1 to the DDG Hyd of all partially fluorinated amino acids, and a near-zero, positive, contribution to the trifluorinated amino acids; the largest deviations come from I1G and I3D, for which this contribution is +0.8 kcal mol 1 . As indicated above, this contribution stems from the partial shielding of the positively charged carbons occurring in partially fluorinated groups. Regarding the water-fluorine hydrogen bonds, they are much weaker than those with the carbonyl or amine groups, and decrease in stability in the order -CH 2 F > -CHF 2 /-CF 2 -> -CF 3 , as indicated by the relative magnitudes of the relevant parameters in Table 1. These trends are expected: our simulations show that the distance between the water oxygen and the fluorine is smaller in groups with fewer fluorines (SI Table 7) indicating an increase in hydrogen bond strength, and other experiments and ab initio calculations have shown the same trends 73,78 . Despite the weakness of the hydrogen bonds between water and the diand tri-fluorinated groups, they nevertheless play an important role: e.g., they are present ⇡15% of the time per fluorine in CF 3 groups (SI Table 8); for comparison, the number of water-methyl configurations per side chain CH group meeting the hydrogen bond criteria in the non-fluorinated amino acids is almost negligible (⇡3%). The magnitude of the hydrogen bond contributions to DDG Hyd per fluorine atom also follows the order -CH 2 F > -CHF 2 /-CF 2 -> -CF 3 , with the average values being 1.79, 0.60 and 0.13 kcal mol 1 per fluorine, respectively † ; the weak water-fluorine hydrogen bonds to -CF 3 contribute on average a non-negligible -0.39 kcal mol 1 -CF 1 3 . Contributions of the side chain to DDG Hyd are largely constant per CF x substituent. The values of the free energy contribution per water-fluorine hydrogen bond yielded by the multivariate model anticorrelate surprisingly well with the LJ contribution of the area per fluorine atom (SI Figure 6), suggesting that the energy cost associated with (overall repulsive) waterfluoromethyl LJ interactions is partially offset by the energy gain from the formation of water-fluorine hydrogen bonds. This point is illustrated in Figure 6, where the ratio of the area and waterfluorine hydrogen bond contributions is plotted against the number of fluorine atoms. It is clear that the contribution of each Contribution of the changes in surface area (DA), hydrogen bonds between water and carbonyls (Dh CO ) or amines (Dh NH ), water molecules exposed to a large positive electrostatic potential (DF + ) and hydrogen bonds between water and fluorine in mono-(h CH 2 F ), di-(h CF 2 ) or trifluorinated (h CF 3 ) amino acids to the total change in hydration free energy (DDG Hyd ), between fluorinated and non-fluorinated amino acids, calculated using Equation 1. For one hydrogen ! fluorine substitution, the hydrogen bonding over-compensates the cavity-formation cost. As the number of substitutions increases, the penalty for cavity formation per fluorine is reduced, showing that the perturbation due to the insertion of the first fluorine atom is large but further insertions perturb the water network to a smaller extent. Simultaneously, the energy gain per fluorine from water-fluorine hydrogen bonds is decreased for multiple fluorine insertions, likely due to the less negative partial charges on fluorine and the fewer H F atoms, which interact favorably with water (see SI section 2.3). Interestingly, there is a crossover point, associated with the number of fluorine substitutions, after which the energy cost surpasses the energy gain; in other words, there is a transition between a locally hydrophilic moiety to a locally hydrophobic moiety. Inspecting Figure 6 we find that the number of fluorine atoms required to perform the transition is 2.8. This crossover point has associated uncertainty arising from the systematic deviations in the hydration free energies observed for small alkanes with the GAFF/Amber force field (see Figure 3 and discussion above). Even considering this uncertainty, it appears that trifluoromethyl groups, by themselves, impart only a small increase in amino acid hydrophobicity, because the positive Lennard-Jones component of the hydration free energy is partially offset by the weak but still favorable waterfluorine hydrogen bonds.
RDFs of water around the fluorinated site do not give insight into local contributions to the DG Hyd . Given that the impact of fluorination on local solvation is fairly constant for groups with the same number of fluorine atoms (Figure 6), we investigated whether these local changes in solvation correlate well with the radial distribution function (RDF) of water around methyl or fluoromethyl groups. Specifically, we calculated the excess free energy, DDG Shell,PMF , necessary to populate the first hydration layer of methyl or fluoromethyl groups from the potential of mean force associated with each radial distribution function. Our results (SI section 5.2.4, Figure 8 C) show that such a correlation is at best very weak. We could also not find correlations between DDG Shell,PMF and the Coulomb and Lennard-Jones components of the hydration free energy (SI Figure 8A,B). Radial distribution functions describe solvation along a single reaction coordinate, which may be insufficient to give quantitative or semi-quantitative insight into local hydrophobicity. In contrast, proximal radial distribution functions -i.e., those calculated perpendicular to the solute atoms -appear near universal for proteins 79 and have proven useful to estimate hydration free energies of small molecules and peptides 80,81 . Exploring their usefulness for fluorinated molecules is outside the scope of the present work.

Concluding Remarks
We present an all-atom, fixed-charge force field for amino acids with fluorinated alkyl side chains that is compatible with the AM-BER force field for proteins. With it we investigate how mono-, di-and trifluorination alters amino acid solvation. Our predictions indicate that side chain fluorination alters the hydration free energy of amino acids in surprising ways: DDG Hyd strongly depends on the chirality and location of the fluorinated site and on the identity of the amino acid. Using a simple, analytical solvation model (Equation 1), we trace back these dependencies to the multiple mechanisms by which fluorination alters solvation of amino acids: there is a cost of introducing larger fluorine atoms, gains and costs associated with the higher polarity of fluorinated alkyl groups, and gains or costs from altering the number of backbone-water hydrogen bonds as a result of changed conformational preferences. For small molecules, it is often possible to predict the sign and even estimate the magnitude of the change in hydrophobicity upon fluorination. In contrast, for complex molecules 'the devil is in the details': the contribution of each mechanism to the overall hydrophobicity depends on conformational preferences and interactions between different parts of the molecule, making rules-of-thumb insufficient. For example, monofluorination does not always make amino acids more hydrophilic; similar increases in the solvent-exposed surface area of different molecules do not imply that the molecules will experience similar increases in hydrophobicity. Solvent accessible surface area descriptors of hydration free energies remain useful for complex molecules, but only when other contributions are properly accounted for.
The solvation model given by Equation 1 and applied here to amino acids can also be used to interpret molecular dynamics results of other small molecules containing the same functional groups, and extended for other functional groups. The model is also directly relevant for proteins: together with short molecular dynamics simulations of proteins in the folded and unfolded ensembles, it can be used to gain insight into how fluorination-induced changes in protein-water interactions contribute to changes in the free energy of folding. Good sampling of the folded protein ensemble can easily be achieved in many cases with molecular dynamics simulations; to sample the un-folded protein ensemble, one can take advantage of a number of algorithms [82][83][84][85][86] . Future work by our group will attempt to extend the solvation model to include mechanisms by which fluorination alters intra-protein non-bonded interactions. This extension is necessary to obtain a complete picture of the mechanisms by which fluorination alters the thermal stability of proteins.
Molecular dynamics studies with custom-tailored force fields and phenomenological models based on discrete rather than mean-field descriptors are key to gain mechanistic insight on solvation, as exemplified in this work. The force field and model we present lay the foundation to interpret how fluorination alters the hydrophobicity of other (bio)polymers. Further improving these force fields and our understanding of solvation will require the experimental measurement of free energies of hydration for amino acids or molecules of similar complexity. If the vapor pressure of the pure compound in liquid form is known together with its aqueous solubility, the free energy of hydration can be immediately calculated assuming ideality 58 . We deliberately restricted our study to amino acids for which synthesis protocols exist 19 , and we hope that a direct comparison between experiment and simulation will be possible in the future.    Table 1 LJ parameters e and s for the carbon, fluorine and hydrogen atoms in methane/methyl/methylene and fluorinated methane/methyl/methylene groups used in this work. All final results with partially fluorinated alkyl groups use the H F parameters optimized in this work.

1-20 | 1
Non-bonded Van der Waals interactions between any two particles, i and j, over an inter-particle distance r i, j are calculated via the LJ potential as with e i j = p e ii e j j and s i j = s ii + s j j 2 following the Lorentz-Berthelot combination rules [5][6][7][8][9][10][11] .
Atomic partial charges for the fluorinated methane derivatives were obtained with the Merz-Singh-Kollman Restrained Electrostatic Potential (RESP) methodology on a single conformation for each molecule, following the standard GAFF protocol. The charges were calculated at the RHF/6-31G* level of theory from a two-step geometry optimization (MP2/6-31G* optimization followed by RHF/6-31G* optimization). Quantum mechanical calculations were performed with the Gaussian 03 software 12 , and RESP fitting of the partial charges was performed with the Antechamber package 13 .
The atomic partial charges for all non-canonical amino acids are obtained via a multi-configuration RESP fitting procedure, described in greater detail in ref. 3, performed over 200 conformations (100 a-helical and 100 b -strand) per amino acid.

SIMULATION DETAILS
The results discussed in the main text are obtained from Free Energy Perturbation (FEP) simulations -to calculate the hydration free energies of the amino acids -and molecular dynamics (MD) simulations -all other observables, e.g. solvent accessible surface area, dipole moment, numbers of hydrogen bonds -of single copies of the amino acids in water. In addition, FEP simulations of a single CHF 3 in water were used to calculate hydration free energies in the parameterization of fluorocarbon-bound hydrogen (H F ). Also used in the parameterization were simulations of liquid CHF 3 , to calculate its molar volume, and additionally simulations of gas phase CHF 3 , to calculate its vaporization enthalpy.
Obtaining the partial charges for the partially fluorinated amino acids required initial simulations of single copies of the amino acids (with non-optimized charges) in water, to obtain multiple configurations that were then used in the multi-configuration RESP fit, as described in ref. 3. The simulation boxes used for each type of simulation are illustrated in SI Figure 1; periodic boundary conditions in all directions were used in all cases. All systems were assembled using the built-in tools of the software package used to perform the simulations. A summary of the most relevant parameters used during the production runs is given in SI Table 2. Simulations used a time-step of 2 fs and constraints (LINCS 15 in Gromacs, SHAKE 16 in Amber) were applied to all bonds involving hydrogen atoms. Integration of the equations of motion was done using a leap-frog Langevin algorithm. Van der Waals interactions were shifted to zero between 1.0 and 1.2 nm, and long-range dispersion corrections were applied to both pressure and energy.   Table 2 Simulation details for the production runs of the systems: C n H m (aq) = one C n H m molecule in water; CHF 3 (l) = pure CHF 3 ; AA (aq) = one capped amino acid in water. Obs = calculated observable; Box = box type (corresponding panel in Figure 1); # mol = number of molecules; p = pressure; T = temperature; t = production run length; # rep = number of independent runs (with different velocity assignments in the NVT equilibration step); Package = software package used in the simulations. Simulations for the second iteration of the RESP fit (extracting multiple configurations of the amino acids) consisted of a steepest descent minimization, a conjugated gradient minimization, a 200 ps NVT heating from 0 K to 298 K, and a 1 ns NpT equilibration; in the minimization, heating and equilibration steps the coordinates of the backbone atoms were restrained with a 20 kcal mol 1 potential. The production step was run in the NpT ensemble, keeping the same restraints on the backbone atoms.
Long-range electrostatics were treated with the PME scheme with a 1.2 nm cutoff, a grid spacing of 0.1 nm and a 4th order interpolation. The Monte Carlo barostat 14,19 was used with a relaxation time of 1 ps for an isotropic coupling of system pressure to 1 bar; temperature coupling was handled by the leap-frog Langevin integrator with a collision frequency of 1 ps 1 . Simulations for the calculation of the input parameters in Equation 1 in the main text followed the same protocol except that no 1-20 | 3 restraints were imposed on the backbone atoms; these trajectories were used as input for the APBS 20 software to estimate the electrostatic hydration free energy of the amino acids using the linearized Poisson-Boltzmann equation, as described in more detail below.

FREE ENERGY PERTURBATION
The potential form of the scaled intermolecular non-bonded interactions follows SI Equation 3, with the soft-core parameter a LJ set to 0.5 for any l 6 = {0, 1} and zero otherwise.
Hydration free energies were calculated using Free Energy Perturbation (FEP) and Bennett Acceptance Ratio 21,22 (BAR), following the protocol we have previously adopted 3

ENTHALPY OF VAPORIZATION AND MOLAR VOLUME
Following the protocol of Gough and co-workers 23 , the enthalpy of vaporization per mole, DH, of pure CHF 3 was calculated as assuming that the gas phase systems behave ideally. In this expression, DE = E g E l , p is the pressure, V is the volume and R is the ideal gas constant. E l is the potential energy per mole calculated from a simulation of a cubic box of pure CHF 3 in the liquid phase, and E g is the equivalent quantity calculated using a simulation box containing only one molecule of CHF 3 (gas phase) and otherwise the same simulation parameters. For each system, the molar volume was calculated by dividing the average volume of the liquid phase simulation box by the total number of molecules, then multiplying by Avogadro's constant.   Table 3 Calculated hydration free energy (DG Hyd ), enthalpy of vaporization (DH Vap ) and molar volume (V Mol ) for CHF 3 and hydration free energy for CH 2 F 2 and CH 3 F, using our optimized parameters; target values are shown within parenthesis. For the hydration free energy, solubilities are converted to the Ostwald coefficient 24 , from which the standard free energy of solvation can be calculated 25 ; for the vaporization enthalpy and molar volume of CHF 3 , experimental values are taken from Gough and colleagues 23 . Results are shown as mean ± standard deviation of five independent simulations.

PARAMETERIZING THE H
It would also be of interest to have an estimate of the optimal parameters for hydrogen in monoand di-fluorinated carbons, which would be more relevant considering the fluorination patterns in the amino acids we are studying. Unfortunately, relevant experimental data on these species is sparse: for CHF 3 , used in our parameterization, there are experimentally obtained values of its hydration free energy, molar volume and enthalpy of vaporization; for CH 3 F there is only the hydration free energy, ⇤ Original value in the GAFF 4 force field for carbon-bound hydrogen.
† Original value in the GAFF 4 force field for carbon-bound hydrogen.

1-20 | 5
for CH 2 F 2 there is no data. Hydration free energies, enthalpies of vaporization and molar volumes, calculated with our parameters for CHF 3 , CH 2 F 2 and CH 3 F, are given in SI Table 3. Our parameters follow the expected trend of increasingly positive hydration free energy with increasing degree of fluorination, though the hydration free energy for CH 3 F is off by 0.7 kcal mol 1 . Because of the lack of experimental data, we cannot comment on the quality of our parameters for CH 2 F 2 .
It is worthwhile noting that the polar hydrogen atom we have parameterized is a main contributor to the more hydrophilic character of partially fluorinated methane relative to either CF 4 or CH 4 , as becomes obvious by inspecting SI Tables 3 and 4. This effect of the polar hydrogen partially explains why the proportionality between surface area and hydration free energy no longer stands for partially fluorinated methane derivatives: each of the methane derivatives in SI Table 3 has a higher surface area than CH 4 while having a much lower hydration free energy (the hydration free energy of methane is 2.549 kcal mol 1 ; see SI Table 4). In addition, the non-uniform charge distribution in the partially fluorinated molecules also increases the hydrophilicity of these molecules relative to CH 4 or CF 4 .

HYDRATION FREE ENERGIES OF FLUORINATED AND NON-FLUORINATED AMINO ACIDS
The hydration free energy and differences in hydration free energy between fluorinated and nonfluorinated amino acids are presented in SI Table 4; also presented are the Coulombic and LJ com-  Table 5 holds the results of the PB calculations. SI Figure 2 A makes the comparison between PB and FEP hydration free energies. This figure clearly demonstrates that the PB approach does not capture the relative differences in the electrostatic hydration free energy  Table 4 Free energy of hydration (DG Hyd ), Coulombic (DG Hyd,Coul ) and Lennard-Jones (DG Hyd,LJ ) components of the free energy of hydration for small molecules and amino acids; differences in the free energy of hydration (DDG Hyd ), Coulombic (DDG Hyd,Coul ) and Lennard-Jones components of the free energy of hydration (DDG Hyd,LJ ) between fluorinated and nonfluorinated molecules. All quantities were calculated using free energy perturbation and are shown as mean ± standard deviation, of five independent simulations, in kcal mol 1 . * retrieved from ref. 3. Related to Figure 2 in the main text.
between the amino acids, and does not give insight into these systems.

CALCULATING THE ELECTROSTATIC POTENTIAL IN THE VICINITY OF THE SIDE CHAINS
The electrostatic potential was calculated on a grid (0.1 Å of spacing) around the side chain of each amino acid, with conformations extracted from a 25 ns simulation of each amino acid in water, using only the partial charges of the side chain atoms. Then, the electrostatic potential was interpolated to the coordinates of oxygen atoms belonging to water molecules within a radius of 5.5 Å of the side chain carbon atoms; this radius corresponds to the first minimum of the radial distribution function of water oxygen atoms near methane's carbon atom, that is, it corresponds to the farther limit of the first hydration shell of methane. The probability density distributions of the electrostatic potential, acid. SI Figure 3 shows the electrostatic potential F around the side chain of ETG, E1G, E2G and E3G.
Compared to ETG, the negative potential on the fluorinated derivatives arises from the fluorine atoms, the positive potential from the exposed carbon skeleton, especially from the fluorinated carbon (the     Molecular structure (upper row) and electrostatic potential (lower row) for a single conformation of the side chain of ETG, E1G, E2G and E3G. The surfaces where the electrostatic potential is mapped were calculated using a probe of radius 5.5 Å. The color code for the electrostatic potential map is the same for all molecules, scaling from red (negative) to blue (positive).

MODELING THE HYDRATION FREE ENERGY OF FLUORINATED AMINO ACIDS
1-20 | 9 5.1 Two linear, multivariate models that fail to explain how fluorination alters hydration free energies of amino acids We attempted to fit the DDG Hyd for all amino acids using SI Equation 5: In this equation, the coefficient k 1 of the DA term is the LJ component of the hydration free energy per surface area unit of methane and the coefficients k 2 4 for the water-carbonyl hydrogen bond (Dh CO ), the water-amine hydrogen bond (Dh NH ) and the dipole moment (Dµ) terms are freely optimized in the fitting procedure. SI Table 6 holds the values of the fitting parameters, errors and P-values resulting from this attempt. As described in the main text, this model proved unsuccessful: the contributions of the water-backbone hydrogen bonds are unphysical, with hydrogen bonds with amines being much stronger than with carbonyls, and increases in dipole moment increasing amino acid hydrophobicity. In a second attempt to understand our data, we turn to a model (SI Equation 6) we have previously successfully fitted to the changes in hydration free energy associated with tri-and hexa-fluorinated amino acids only 3 . This model has similar terms to those in SI Equation 5; the main difference is that the coefficient of DA is estimated from the difference in areas and in hydration free energies between CF 4 and CH 4 .
SI Figure 4 illustrates the correlation between the model predictions and the hydration free energies calculated using FEP for all the amino acids under consideration here. The model largely fails for the mono-and difluorinated amino acids.  simulations show that the donor-acceptor distance for these hydrogen bonds (SI  Table 7 Average distance between the water oxygen atom and the fluorine atom when a water-fluorine hydrogen bond is formed in a monofluoromethyl group (-CH 2 F), a difluoromethyl/difluoromethylene group (-CHF 2 /-CF 2 -) or a trifluoromethyl group (-CF 3 ). Related to Equation 1 in the main text.

Details of the fitting procedure
The quantities used as input for fitting Equation 1 in the main text can be found in SI Table 8. The parameters k 1 to k 7 were fit in stages. The value of k 1 is simply the Lennard-Jones component of the hydration free energy of methane per unit area, for the reasons indicated in the main text (also, see SI Table 9). The values of k 2 and k 3 , corresponding to the contributions of the hydration free energy per water-carbonyl and water-amine hydrogen bonds, respectively, are from ref. 3. The remaining parameters were optimized by fitting Equation 1 in the main text to the hydration free energies of all the amino acids using the input data in SI Table 8.

Calculating P-values
The model captures the difference in hydration free energies between the partially fluorinated amino acids and their non-fluorinated counterparts less well than for the fully fluorinated ones ( Figure 4 in the main text). The cause for this discrepancy could possibly be related to an inadequacy of a linear model to represent either or both the water-fluorine hydrogen bond and the electrostatic potential terms in Equation 1 in the main text. To test the validity of using Equation 1 (see main text) to calculate the hydration free energy differences between fluorinated and non-fluorinated amino acids, we   Figure 4 of the main text, which is the central message of this work.

The interplay between cavity-formation costs and gains from water-fluorine hydrogen bonds
The correlation between the energy required to form an amino acid-sized cavity in water and the energy gain from the formation of water-fluorine hydrogen bonds is found in SI Figure 6. It is clear that both the penalty associated with the surface area and the gain from hydrogen bond formation decrease with the increasing number of fluorine atoms per fluoromethyl group.

Radial distribution functions and hydration free energy
It would be useful to interpret the hydration free energy changes we have reported in terms of 1-20 | 14 Fig. 7 A-H) radial distribution functions (g(r)) of water oxygen atoms relative to the carbon atom in the indicated fluoromethyl group, or its equivalent methyl group in the non-fluorinated amino acid, for A) the ethylglycine series, B,C) the valine series, D) the propylglycine series, E,F) the isoleucine series and G,H) the leucine series; the color code follows the total number of fluorine atoms in the side chain: black=0, red=1, yellow=2, green=3, cyan=4, blue=6. The term k B T ln[g(r)] corresponds to the PMF, with k B being the Boltzmann constant, T the system temperature and g(r) the radial distribution function. The term g(r)V (r) r Bulk is the particle density of water in a volume V(r) corresponding to the difference in volume of two spheres, one with radius r and the other with radius r 0.02 (the spacing between values of r at which the g(r) is calculated); r Bulk is the particle density of bulk TIP4P-Ew water. The summation in Equation 7 runs over values of r between r 0 (the C-O distance at which g(r) > 0) and r Shell (the radius of the hydration shell of either methane, 5.531Å, or tetrafluoromethane, 5.690Å). Defining r shell as the value for which each g(r) reached its first minimum did not alter the trends described below.
We searched for correlations between the information contained in the RDFs, in the form of

Changes in side chain conformation induced by fluorination
To assess whether fluorination changes the occurrence of backbone-water hydrogen bonds by steric blockage, we calculated the distances between the carbon atom in a methyl or a fluoromethyl group (for example, C g in the ethylglycine series, C d in the leucine series) and the center of mass of either the carbonyl or amine group bound to the alpha carbon. The probability distributions for ETG (SI Figure 9) show that the side chain adopts a preferred conformation relative to the amine group where the methyl-amine distance is approximately 3 Å, while simultaneously adopting two equally probable conformations relative to the carbonyl group, corresponding to distances close to 3 and 4 Å. Fluorinating this methyl group progressively increases the probability of visiting a conformation that is both farther from the amine group and closer to the carbonyl, in agreement with the decrease in the number of carbonyl-water hydrogen bonds and the increase in amine-water hydrogen bonds observed for this series of amino acids (SI Table 8).

Fig. 9
Probability density (P(d)) associated with the distances (d) between methyl or fluoromethyl groups in the side chain of ETG, E1G, E2G and E3G and the A) carbonyl group or B) amine group bound to the alpha carbon in the backbone.