Akash Deep
Biswas
^{ab},
Vincenzo
Barone
^{bc},
Andrea
Amadei
*^{d} and
Isabella
Daidone
*^{a}
^{a}Department of Physical and Chemical Sciences, University of L’Aquila, via Vetoio (Coppito 1), 67010 L’Aquila, Italy. E-mail: isabella.daidone@univaq.it
^{b}Scuola Normale Superiore di Pisa, Piazza dei Cavalieri 7, I-56126 Pisa, Italy
^{c}National Institute for Nuclear Physics (INFN) Pisa Section, Largo Bruno Pontecorvo 3, 56127 Pisa, Italy
^{d}Department of Chemical Science and Technology, University of Roma Tor Vergata, via della Ricerca Scientifica, 00133 Roma, Italy. E-mail: andrea.amadei@uniroma2.it
First published on 23rd March 2020
Here we present a computational approach based on molecular dynamics (MD) simulation to study the dependence of the protein hydration-shell density on the size of the protein molecule. The hydration-shell density of eighteen different proteins, differing in size, shape and function (eight of them are antifreeze proteins), is calculated. The results obtained show that an increase in the hydration-shell density, relative to that of the bulk, is observed (in the range of 4–14%) for all studied proteins and that this increment strongly correlates with the protein size. In particular, a decrease in the density increment is observed for decreasing protein size. A simple model is proposed in which the basic idea is to approximate the protein molecule as an effective ellipsoid and to partition the relevant parameters, i.e. the solvent-accessible volume and the corresponding solvent density, into two regions: inside and outside the effective protein ellipsoid. It is found that, within the model developed here, almost all of the hydration-density increase is located inside the protein ellipsoid, basically corresponding to pockets within, or at the surface of the protein molecule. The observed decrease in the density increment is caused by the protein size only and no difference is found between antifreeze and non-antifreeze proteins.
Although it is now rather well accepted that the density of the protein hydration shell is higher than the bulk one,^{11,12,15,16,18,20} there is not a consensus about the magnitude of this increment. Depending on the protein, on the experimental technique, or on the computational approach, used and on the distance from the protein surface at which the density increment is evaluated, different estimates have been given ranging from 1% to 50%.^{11,12,16,20}
Here, we focus on the dependence of the protein hydration shell density on the protein size, and partly function by analyzing the hydration properties of eighteen different proteins, out of which eight are antifreeze proteins (AFPs), by means of MD simulation and a recently developed analysis to investigate hydration shell densities.^{15,18} Previous work have focused on the length-scale dependence of the hydration density of hydrophobic solutes^{36–39} but, to the best of our knowledge, this is the first systematic analysis of the correlation between hydration shell densities and the size and shape of globular proteins, which are systems with amphipathic surfaces.
Then, we make the following assumptions:
• (i) we assume that the protein excluded volume inside the effective protein ellipsoid can be expressed as a fixed fraction of the protein ellipsoid volume and that this fraction is constant for the different proteins:
(1) |
• (ii) similarly, we assume the ratio between the excluded volume due to protein atoms outside the protein ellipsoid and the excluded volume due to protein atoms inside the protein ellipsoid to be a constant independent of the protein type:
(2) |
• (iii) we assume that the inner shell-density (i.e. ρ_{in}) is the same for each protein (this assumption will be confirmed by the MD simulation results, see Results section). For the outer shell-density (ρ_{out}) two cases are considered, one in which it is considered the same for each protein, the other in which it is affected by the shell size which depends, in turn, on the protein size. Hence, ρ_{out} is given either by a constant independent of the protein considered
(3) |
or by a simple function of the ratio α = V_{sh}/V_{ell,eff}
(4) |
From the above assumptions we may obtain the solvent-accessible volume inside the protein ellipsoid, V^{in}_{acc}, and the solvent-accessible volume outside the protein ellipsoid, V^{out}_{acc}, as follows
V^{in}_{acc} = V_{ell,eff} − V^{in}_{ex} ≈ V_{ell,eff}(1 − χ) | (5) |
V^{out}_{acc} = V_{sh} − V_{ell,eff} − V^{out}_{ex} ≈ V_{ell,eff}[α − (1 + γχ)] | (6) |
The total accessible volume within the solvation shell, V_{acc}, can be then expressed as follows
V_{acc} = V^{in}_{acc} + V^{out}_{acc} ≈ V_{ell,eff}[α − χ(1 + γ)] | (7) |
Similarly, the density within the solvation shell can be expressed as
(8) |
Finally, we consider the relative increment of ρ_{sh} with respect to the bulk water-density ρ_{b}, η = ρ_{sh}/ρ_{b} − 1
(9) |
From eqn (1) and (2) we can express the total excluded volume via the parameter m
V_{ex} = V^{in}_{ex} + V^{out}_{ex} ≈ χV_{ell,eff} + γχV_{ell,eff} = (1 + γ)χV_{ell,eff} = mV_{ell,eff} | (10) |
Similarly, by expressing the mean number of solvent molecules inside the effective protein ellipsoid n_{in}via the inner shell density
n_{in} = (1 − χ)ρ_{in}V_{ell,eff} | (11) |
In order to obtain a reliable estimate of ρ_{in} we compared for each protein the values of n_{in} provided by the protein–solvent simulation with that as obtained removing from a pure solvent simulation (at the same temperature and pressure) the excluded volume of the protein (see Methods section). Given that the accessible volume inside the effective protein ellipsoid is the same for the protein–solvent and pure solvent simulations, the ratio of the mean numbers of solvent molecules is identical to the ratio of the two inner densities, with the inner density of the pure solvent simulation (by definition) identical to the bulk one. It will be shown in the Results section that ρ_{in}/ρ_{b} over the protein sample used is approximately constant for all proteins, thus confirming that ρ_{in} can be considered invariant for all the proteins and providing its estimate by simply multiplying such ratio (i.e. the mean ratio over the protein sample) by the bulk density.
Therefore, once obtained m,ρ_{in} and β, we can estimate 1 − χ = β/ρ_{in} and hence by using either eqn (3) or (4) into eqn (9) we have either
(12) |
(13) |
By using eqn (12) (the lower-level model) or eqn (13) (the higher-level model) to fit the values of η versus α as provided by the protein sample, either r_{0} or a_{1} and a_{2} parameters can be evaluated to obtain a fully analytical expression of η(α) at the two levels of approximation.
We end this section with a comment on the applicability of our model to larger proteins, i.e. in the range of α values tending to 1. The model has a formal lower limit, α_{min}, because V^{out}_{acc}/V_{ell,eff} needs to be ≥0, which implies, according to eqn (6), that V^{out}_{acc}/V_{ell,eff} = α − m − (1 − χ) ≥ 0, and thus α ≥ m + (1 − χ) = α_{min}. However, it is likely that in the proximity of α_{min} (i.e. close to a vanishing relative outer accessible volume) our model is not fully reliable (see Results section for an estimate of α_{min}).
In addition, a 10 ns-long NVT simulation of a pure SPC box at 300 K at the reference pressure of 560 bar is performed in order to compare the properties of the protein hydration shells with the properties of fictitious hydration shells filled with bulk water (see below in the Methods section).
Concerning the choice of the water model, we showed in a previous paper^{18} for four of the proteins used here that the density results were not dependent on the water model chosen. To do this we performed MD simulations, and subsequent analyses, with both SPC^{42} and TIP4P/2005 models^{48} at two different temperatures, i.e. room temperature and a temperature just above the melting temperature for each of the two water models. The results with SPC were fully consistent with the ones obtained with the TIP4P/2005 water model.
In order to characterize the hydration-shell density of a protein we approximate the protein molecule as an ellipsoid defined, at each MD time frame, by the eigenvectors and eigenvalues of the 3 × 3 geometrical covariance matrix of the x, y, z atomic coordinates, as described in the recent papers.^{15,18} Briefly, for each time frame of a trajectory the three eigenvectors define the protein ellipsoid axes, with the corresponding lengths provided by the eigenvalues: considering a Gaussian atomic positional distribution along each eigenvector, a semi-axis was used, with i = 1, 2, 3 and λ_{i} the eigenvalue of the ith eigenvector. A set of ellipsoidal layers around the protein defined by the consecutive ellipsoids with semi-axes a^{(n)}_{i} = a_{i} + nδ, with fixed increment δ = 0.03 nm, was then considered. By calculating, and averaging, the instantaneous SPC density within each layer (disregarding the possible presence of protein atoms and/or counter-ions) we obtained the solvent-density profile around the protein, within layers of increasing distance from the protein ellipsoid surface (i.e. the layer-solvent-density profile). Given that such solvent density profile does not account for the effect of excluded volume of the non solvent atoms, the density values within layers including a significant number of protein atoms are lower than the actual density. Typically, single protein atoms can be still present in layers at 0.7–0.8 nm from the protein ellipsoid surface while for layers beyond 1 nm basically no protein atoms are detected. Therefore, the differences among the protein SPC density profiles within the first hydration layers (up to ≈0.4–0.5 nm) largely reflect differences in the spatial arrangement and compactness of protein atoms.
In order to remove the effect on the density profile of the spatial arrangement of the protein atoms that protrude from the protein ellipsoid surface and to compare the solvent density inside each hydration layer with the density of the bulk solvent, we use a previously employed strategy^{18} to construct a fictitious hydration shell filled with bulk water. To this aim, the protein coordinates obtained from the protein + solvent MD simulations are inserted into a pure SPC box extracted from a MD simulation of water alone. The water molecules of the pure SPC box overlapping with the protein coordinates are removed, providing a protein + solvent box in which the hydration shell is filled with the bulk solvent. Water molecules of the pure SPC box are removed where the distance between any existing atom and any atom of the inserted protein is less than the sum of the van der Waals radii of both atoms. For each protein, this is done for 10000 different protein coordinates, each one inserted in a different bulk solvent box. The solvent density around the protein as a function of the distance from the protein ellipsoid surface is then calculated for these fictitious protein + solvent configurations and we call it ρ_{b,fict}. Finally, the ratio ρ/ρ_{b,fict}, i.e. the density variations with respect to the bulk density, is reported as a function of the distance from the protein ellipsoid surface (see Results section).
We define an effective protein-ellipsoid volume, V_{ell,eff}, to be used in studying the protein-size dependence of η. This is achieved by adding to each protein ellipsoid semi-axis the solvent radius (we used 0.14 nm as the solvent radius) plus the effective mean thickness due to the excluded volumes of the protein atoms over the ellipsoid surface (0.06 nm) which was obtained via a numerical fit to protein partial molecular volumes, as described in a previous work.^{15} Briefly, the effective mean thickness of the ellipsoid surface was obtained by tuning its value in order to best reproduce the experimental partial molecular volumes over a sample of 14 globular proteins. A more accurate method^{50} that could be used to define an effective protein ellipsoid has been proposed based on the calculation of the neutron scattering profile from MD simulation which is used to back-calculate the gyration-tensor of the protein and its closest hydration layers. Nevertheless, such an approach is more time-consuming than the method used here and hence less effective for the analysis of a large number of proteins.
By using the mean hydration shell volume, V_{sh}, corresponding to the mean volume of the ellipsoid defined by the boundary layer of the hydration shell (as obtained by adding 1 nm, i.e. the distance from the protein ellipsoid surface at which the density approaches the plateau value of bulk density, to each semi-axis of the protein ellipsoid), the mean number of SPC molecules within such an ellipsoid, n_{sh}, and the mean protein excluded volume, V_{ex}, we can obtain the mean solvent density within the accessible volume of the protein hydration shell, ρ_{sh}, via:
(14) |
We can compare ρ_{sh} to the solvent bulk-density, ρ_{b}, by calculating the relative increase η = ρ_{sh}/ρ_{b} − 1. As is shown in Tab. 1 the simulations of the studied proteins provided, by means of eqn (14), a hydration shell mean density in the range of 4–14% higher than the bulk one. This range of η is in agreement with experimental data^{20} and previous calculations.^{11,12,15,18}
By analyzing the data reported in Table 1, in which also the mean effective-protein ellipsoid, V_{ell,eff}, the mean ellipsoidal hydration-shell volume, V_{sh}, are reported, one can notice a positive correlation between η and the size of the protein, as provided by the analysis of the correlation between η and the protein volumes. This behavior is opposite to what was observed for hydrophobic solutes in the same length-scale range,^{39} for which a decrease in hydration density was observed as a function of the protein size.
Protein | Residue N. | V _{ell,eff} | η | |
---|---|---|---|---|
Note: the error on V_{ell,eff} and V_{sh} is ≈0.3%, the one on η is ≈1%. Errors on the different quantities were evaluated through the standard error of their mean calculated over 3 (independent) subtrajectories. | ||||
Trp-cage | 20 | 4.46 | 5.95 | 0.050 |
Yeast-AFP | 25 | 6.87 | 4.97 | 0.052 |
AFPI | 37 | 8.06 | 4.86 | 0.046 |
Heliomicin | 44 | 9.47 | 4.38 | 0.065 |
GB1 | 56 | 11.81 | 4.00 | 0.079 |
AFPIII | 66 | 12.27 | 3.91 | 0.071 |
BPTI | 58 | 12.41 | 3.94 | 0.068 |
TmAFP | 82 | 12.46 | 4.00 | 0.083 |
Ubiquitin | 75 | 14.40 | 3.70 | 0.090 |
CfAFP | 121 | 19.65 | 3.37 | 0.088 |
Barnase | 108 | 20.55 | 3.30 | 0.094 |
Lysozyme | 129 | 22.64 | 3.19 | 0.099 |
RiAFP | 143 | 23.26 | 3.26 | 0.089 |
AFPII | 129 | 25.70 | 3.06 | 0.106 |
Myoglobin | 154 | 28.63 | 2.99 | 0.111 |
LeIBP | 241 | 38.73 | 2.73 | 0.123 |
HCAII | 261 | 46.61 | 2.56 | 0.133 |
COVID-19 | 306 | 54.93 | 2.56 | 0.125 |
On the contrary, no clear dependence of η was found on the protein eccentricity, as provided by the analysis of the correlation between η and the eccentricity of the ellipsoids approximating the protein molecules (see Fig. 3A), and on the hydrophobicity of the protein surface, as provided by the analysis of the correlation between η and the hydrophobic fraction at the protein surface (see Fig. 3B). In the ESI,† Fig. S2 and S3, a global view of the chemical character of the surfaces of all proteins is given in terms of hydrophilic/hydrophobic atoms.
In order to interpret the dependence of the hydration shell-density on the protein size through the model proposed in the Theory section, we firstly need to obtain m, ρ_{in}/ρ_{b} and 1 − χ, the latter evaluated as 1 − χ = β/ρ_{in}.
To get m we plotted V_{ex} as a function of the effective protein-ellipsoid volume, V_{ell,eff}, for all the studied proteins, see Fig. 4A, and from a linear fit of the data we obtain m (m = 0.95). From the data and the quality of the fit it is evident that the linear relation we assumed between V_{ex} and V_{ell,eff} is a good approximation ensuring that one of our basic assumptions is reasonable.
To get ρ_{in}/ρ_{b} we used the procedure described in the Methods section, i.e. ρ_{in}/ρ_{b} is calculated from the ratio of the mean number of solvent molecules inside the protein effective ellipsoid and the corresponding number calculated on the fictitious protein + solvent configurations. The ρ_{in}/ρ_{b} values, calculated over the protein sample and reported in Fig. 5A as a function of V_{sh}/V_{ell,eff}, show a rather flat distribution with a mean value of 1.56, thus confirming that ρ_{in} is constant for all the proteins as was assumed in the model. A similar analysis of ρ_{out}/ρ_{b}, evaluated as the ratio of the mean number of solvent molecules outside the protein effective ellipsoid, but inside the hydration shell, and the corresponding number calculated on the fictitious protein + solvent configurations was performed (see Fig. 5B). The comparison between ρ_{in}/ρ_{b} and ρ_{out}/ρ_{b} shows that almost all of the hydration shell density increase is concentrated inside the protein effective ellipsoid, being the former ≈1.56 and the latter in the range of 1.00 and 1.08. Moreover, ρ_{out}/ρ_{b} shows a slight dependence on the protein size.
Fig. 5 (A) ρ_{in}/ρ_{b} as a function of α = V_{sh}/V_{ell,eff} for all the studied proteins. A linear regression of the ρ_{in}/ρ_{b} data yields a slope coefficient of −0.031 which is within one standard error of the coefficient (0.032), showing that ρ_{in}/ρ_{b} can be considered basically constant over the protein sample. (B) ρ_{out}/ρ_{b} as a function of α = V_{sh}/V_{ell,eff} for all the studied proteins. The black solid line is the analytical function ρ_{out}/ρ_{b} = (a_{0} + x)/(x + a_{1}) with a_{0} = −0.63 and a_{1} = −0.75 as obtained from the fit of η with eqn (12) (see Results section). The analytical expression provides a good reproduction of the ρ_{out}/ρ_{b} values obtained from the MD simulations. |
Similarly as for m, by expressing the mean number of solvent molecules inside the effective protein ellipsoid n_{in}via the inner shell density n_{in} = (1 − χ)ρ_{in}V_{ell,eff}, we can obtain β = (1 − χ)ρ_{in} by a linear regression of the n_{in}versus V_{ell,eff} plot (see Fig. 4B) as provided by shell-density calculations (we get β = 11.61). From β and ρ_{in}, we can evaluate 1 − χ = β/ρ_{in} = 0.22.
Once obtained m, ρ_{in}/ρ_{b} and (1 − χ) we can use eqn (12) and (13) to fit the values of η as a function of α = V_{sh}/V_{ell,eff} calculated for the different proteins (see Fig. 6). From the quality of the fits it emerges that both levels of approximation of the model employed are reasonably accurate, with the higher-level model performing slightly better consistently with the slight size-dependence observed from the explicit calculation of ρ_{out}/ρ_{b} (see Fig. 5). A further validation of our model comes from the fact that if the parameters a_{0} and a_{1}, as obtained from the fitting of η with eqn (13), are used into the analytical function for ρ_{out}/ρ_{b} (ρ_{out}/ρ_{b} = (a_{0} + x)/(x + a_{1})) a good reproduction of the ρ_{out}/ρ_{b} values provided by the MD simulations is obtained (see Fig. 5B).
Fig. 6 (A) η as a function of α = V_{sh}/V_{ell,eff} calculated for the different proteins. The data are fitted with eqn (12) (dashed line) providing r_{0} = ρ_{out}/ρ_{b} = 1.04 and eqn (12) (full line) providing a_{0} = −0.63 and a_{1} = −0.75; the correlation coefficient is 0.97 for both models. The eight AFPs are highlighted in red. |
The relevant parameters (i.e. 1 − χ, ρ_{in}/ρ_{b} and ρ_{out}/ρ_{b} which are independent of the level of approximation) can be utilized to understand how the solvent molecules are partitioned inside the hydration shell possibly explaining the variation of the solvent-shell density according to the protein size. According to our model ≈22% of the effective protein-ellipsoid volume is accessible to solvent (i.e. 1 − χ = 0.22) and basically all the solvent density increase is concentrated in this region. In fact, the solvent density in the inner accessible volume is approximately 56% higher than bulk (i.e. ρ_{in}/ρ_{b} ≈ 1.56), while it is basically equal to the bulk density in the outer region (i.e. ρ_{out}/ρ_{b} = 1.04 in the case of the lower level of approximation). Thus, almost all of the solvent density increase is confined within the protein and within grooves and pockets at the protein surface, as shown in Fig. 7 in which a representative configuration showing the water molecules within the protein ellipsoid is reported.
The protein-independent density increment of ≈56% found here is in line with previous findings in which the hydration density increase was calculated by means of MD simulation at a distance of 0.2 nm from the protein surface for lysozyme^{11} (≈50% increase was found) and from the analysis of the crystal waters in contact with the protein surface for 22 crystal structures^{51} (an average ≈20% increase was found).
Our protein data-set contains proteins with a specific function, namely antifreeze proteins. AFPs are able to inhibit ice growth by binding to ice nuclei and, although their ice-binding mechanism is still unclear, yet the hydration layer is thought to play a fundamental role. We have previously shown that the hydration density of two of these AFPs, CfAFP and AFPIII, are very similar to non-AFPs.^{18} Here, we extend the study to a total of eight AFPs spanning a rather large length-scale range, and compare the hydration density with non-AFPs. As shown in Fig. 6, in which the AFPs proteins are highlighted in red, the behavior of the AFPs does not differ from that of non-AFPs. Indeed, if we separate the data into two sets, one comprising non-AFPs and the other AFPs proteins, and fit our model to the two sets independently we basically obtain the same results. Thus, our results do not show any peculiar behavior of AFPs in terms of the protein-size dependence of the hydration-shell density with respect to non-AFPs.
As explained in the theory section, our model has a formal lower limit α_{min} = m + (1 − χ) = 1.17. However, it is likely that in the proximity of α_{min} (i.e. close to a null relative outer accessible volume) the model is not fully reliable. We might consider a value of α around 1.5 as a lower limit, which corresponds to a maximum number of residues of approximately 500–600, hence limiting the applicability of the model to small-to-medium-sized globular proteins.
The protein-independent increment of the density of the water molecules confined within the protein effective ellipsoid of ≈56% obtained from our model is consistent with previous explicit calculations showing that the density of water molecules in close proximity to the protein surface, and in particular in concave regions, is 20–50% higher than bulk.
Finally, we show that the antifreeze proteins studied here, which cover a broad range of sizes and types (hyperactive from insects, CfAFP, TmAFP and RiAFP, moderately active from fish, AFPI, AFPII and AFPIII, and moderately active from yeast, yeast-AFP and LeIBP), do not behave differently from non-antifreeze proteins in terms of the size-dependence of their hydration density, in line with our previous work in which we showed that the shell thickness and density of two AFPs did not feature any relevant difference with respect to two non-AFPs.^{18}
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c9cp06214a |
This journal is © the Owner Societies 2020 |