Antonino
Polimeno
* and
Mirco
Zerbetto
Università degli Studi di Padova – Dipartimento di Scienze Chimiche, Padova, Italy. E-mail: antonino.polimeno@unipd.it
First published on 6th November 2018
We show that under proper assumptions it is possible to estimate with good precision the principal values of the rotational diffusion tensor of proteins from the analysis of short (up to 2–3 ns) molecular dynamics trajectories. We apply this analysis to a few model cases: three polyalanine peptides (2, 5, and 10 aminoacids), the fragment B3 of protein G (GB3), the bovine pancreatic trypsin inhibitor (BPTI), the hen egg-white lysozyme (LYS), the B1 domain of plexin (PB1), and thrombin. The protocol is based on the analysis of the global angular momentum autocorrelation functions, complementing the standard approach based on rotational autocorrelation functions, which requires much longer trajectories. A comparison with values predicted by hydrodynamic modeling and available experimental data is presented.
Computationally, the determination of rotation dissipative properties can be based essentially on two approaches, which complement each other and interpret experimental evidences. On one side, several hydrodynamics approaches have been implemented,^{7–10} which are capable of good predictive performances at the cost of few free parameters (usually an effective radius of spheres representing atoms or groups of atoms). On the other side, a well known method is based on the extraction of diffusion/friction properties from molecular dynamics (MD) trajectories. Usually, the standard methodology employed to this purpose is based on the analysis of rotation autocorrelation functions.^{11,12} Both methods are based on the assumptions that the rotational motion is diffusive, i.e. that the molecule can be described as a Brownian diffusive rotator:
(1) |
(2) |
(3) |
In the case of an axial D tensor with principal values in the MF D_{⊥}, D_{⊥}, D_{‖}, eqn (1) and (2) allow to write
G^{j}_{k}(t) = e^{−[D⊥j(j+1)+(D‖−D⊥)k2]t} | (4) |
(5) |
One well known problem with this direct approach is that in small or medium macromolecule, G^{j}_{k}(t) has relaxation times of the order of 10 ns, associated with diffusion rates of the order of 10^{6}–10^{7} Hz. Therefore one has to calculate quite long molecular dynamics trajectories, of the order of, at least, 100 ns. At present, these long trajectories are affordable, but still the whole process is unnecessarily time consuming. A secondary, but nevertheless not ignorable problem is related to the model of explicit solvent (usually water) used in the simulation. It is known^{12} that the predicted diffusion tensors strongly depend on the model employed for the water molecules and that the accordance with experimental data scales approximately as the ratio between the MD and real bulk viscosity of water. The worst agreement is found with the more widely employed model of water, i.e. the TIP3P model, which gives, at 298 K, a bulk viscosity about 2.7 times higher than the real viscosity^{14–16} and the calculated diffusion rates are substantially in the same ratio with experimental measurements. Of course improvements in the force fields employed can significantly abate this issue, which is known to become more relevant with increasing MD trajectory length,^{11} leading to a wrong sampling of phase space of long-time processes (as diffusive regime properties) and consequently to bad statistics. The purpose of this work is to introduce a slightly alternative approach, which allows to significantly reduce the MD trajectories duration, in order to maximize computational efficiency and accuracy. We propose to calculate the friction (ξ) rotational tensor (see below) instead of the diffusion rotational tensor of a molecule, from observables relaxing with very fast time scales. Components of the conjugate angular momentum are obvious candidates, since they should relax in the sub-picosecond time scale in small/medium size proteins. In order to test the method we analyze eight molecules of increasing size, in particular: 2-alanine (2ALA, 23 atoms), 5-alanine (5ALA, 53 atoms), 10-alanine (10ALA, 103 atoms), the bovine pancreatic trypsin inhibitor (BPTI, 458 atoms), hen egg-white lysozyme (LYS, 1001 atoms), the fragment B3 of protein G (GB3, 437 atoms), the B1 domain of plexin (PB1, 945 atoms), and thrombin (THR, 5031 atoms). The paper is organized as follows: in Section 2 we review the methodology, its rationale, details of the MD simulations and we discuss the results obtained. In Section 3 we summarize our conclusions.
(6) |
r = L × (I^{−1}L × ∇_{L}) | (7) |
In the following we assume (iii) that precession effects are negligible, which is usually an acceptable approximation in a normal solvent at room temperature (moderate to high friction) and (iv) that tensors I, ξ and hence D are constant diagonal and collinear in the same frame, i.e. the MF. Let us now consider correlation functions of the angular momentum
(8) |
G_{α}(t) = e^{−ξαt/Iα} = e^{−t/τα} | (9) |
2ALA | 5ALA | 10 ALA | GB3 | BPTI | PB1 | LYS | THR | |
---|---|---|---|---|---|---|---|---|
Protein data bank file | 1P7E | 6PTI | 2JPH | 2LZT | 1PPB | |||
Total charge | 0 | 0 | 0 | +2 | +6 | +2 | +8 | +2 |
Number of water molecules | 898 | 885 | 834 | 3921 | 2642 | 10418 | 6845 | 21116 |
Cubic periodic box dimension/Å | 30 | 30 | 30 | 50 | 44 | 69 | 60 | 90 |
Equilibration period/ns | 5 | 5 | 5 | 6 | 6 | 6 | 6 | 10 |
Short MD production period/ns | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
Long MD production period/ns | — | — | — | 200 | 200 | 110 | — | 100 |
Results for all of the proteins have been compared with the diffusion tensor based on the decay of rotational correlation functions obtained from long MD trajectories. All simulations, except for LYS, have been obtained with the same setup and software package as described above. The only difference was the sampling frequency, which was 250 MD steps for the small proteins, and 1000 MD steps for thrombin. Literature data have been employed for LYS.
The post-run protocol for the analysis of the MD simulations is the following. First, the snapshots of the dry trajectories are unwrapped (to remove jumps due to the periodic boundary conditions), centered on the center of mass, and aligned along the frame that diagonalizes the tensor of inertia. All atoms, included H atoms, are considered in these operations. At this point, in each of the snapshots, the coordinates of the atoms are referred in the instantaneous inertial molecular frame MF. We stress the importance of using such a frame, since it is the frame where it is assumed that ξ is diagonal. This, in turn, means that only in this frame, in the limit of a rigid rotator, the autocorrelation functions of the components of the angular momentum are expected to be mono exponential decays.
Next, the components of the momenta of the atoms are computed in MF from the numerical forward derivative of the Cartesian coordinates. Finally the three components of the total angular momentum are calculated
(10) |
In principle, the angular momentum could be calculated from the velocities; however the protocol to express the velocities in MF is more complex than the route described above (e.g., the angular velocity is required). From our tests, the larger efforts needed to use the velocities are not increasing the quality of the results despite the problems that one could expect in performing numerical derivatives. Clearly, it is better to test the stability of the results with the lag time (which controls the quality of the derivatives), but this can be easily done because only one very short trajectory is required for the computation, and the analysis is straightforward.
The autocorrelation functions for the three components of the angular momentum showed to be more complex than mono exponential decays. As an example, Fig. 2 shows G_{X}(t), and its logarithmic plot, for the plexin B1 protein (in the ESI,† other autocorrelation functions are shown in order to provide a wider picture). They are actually better described by multi-exponential functions. This behavior has been observed in the past also for other observables^{21} and can be interpreted in terms of coupling of the angular momentum to internal coordinates of the system and/or solvent, that relax at a close timescale. An augmented Fokker–Planck description of a flexible top would be beyond the scope of this work, which is to identify a fast and effective protocol to determine the principal values of D, for an approximately rigid system. Therefore, we adopt the simple strategy of (i) fitting the calculated correlation functions as bi-exponential decays
G_{α}(t) = a_{α}^{2}e^{−t/τα} + (1 − a_{α}^{2})e^{−t/α} | (11) |
Results are reported in Table 2. The smaller relaxation time is associated to the relaxation of angular momentum, i.e. τ_{α}. In order to calculate friction and diffusion coefficients we consider inertia coefficients averaged over the trajectories. As the table shows, in some cases the autocorrelation functions can be fitted at maximum with one single exponential. This is particular was found for the alanine peptides for which the autocorrelation functions show a damped oscillatory decay due to the closeness in the time scales between the angular momentum and internal motions of the molecule. The correlation functions for the thrombin protein could also be fitted with only one exponential decay. Our interpretation is that in the case of such a large protein, the other relaxation processes are occurring in large time scales, with much less important coupling with the angular momentum. These different behaviours, which are all deviations from the rigid rotator model, reflect different types of couplings of the global angular momentum with other motions, internal to the molecule and/or of the solvent. The origin and rationalization of these effects is not the purpose of this paper. What is interesting is that despite the presence of these effects, the friction/diffusion tensors can be calculated easily and with satisfactory agreement with other methods for the calculation of the diffusion tensor and also with experimental data, as discussed in what follows.
α | a _{ α } ^{2} | τ _{ α }/fs | _{ α }/ps | I _{ α }/10^{−41} kg m^{2} | ξ _{ α }/10^{−28} kg m^{2} s^{−1} | D/10^{7} Hz |
---|---|---|---|---|---|---|
2ALA | ||||||
X | 1 | 10.71 | — | 0.0016 (0.0002) | 0.0052 | 794 |
Y | 1 | 12.24 | — | 0.0014 (0.0001) | 0.0131 | 315 |
Z | 1 | 14.66 | — | 0.00055 (0.00009) | 0.0096 | 428 |
5ALA | ||||||
X | 1 | 14.04 | — | 0.012 (0.002) | 0.084 | 48.9 |
Y | 1 | 13.83 | — | 0.010 (0.003) | 0.074 | 55.7 |
Z | 1 | 10.37 | — | 0.0028 (0.0008) | 0.027 | 152 |
10ALA | ||||||
X | 1 | 27.83 | — | 0.030 (0.001) | 0.11 | 38.1 |
Y | 1 | 22.38 | — | 0.030 (0.002) | 0.13 | 31.2 |
Z | 1 | 16.94 | — | 0.0069 (0.0002) | 0.041 | 102 |
GB3 | ||||||
X | 0.94 | 152.4 | 1.34 | 0.69 (0.03) | 0.456 | 9.1 |
Y | 0.91 | 155.1 | 1.09 | 0.83 (0.03) | 0.535 | 7.7 |
Z | 0.97 | 177.5 | 4.95 | 1.06 (0.05) | 0.599 | 6.9 |
BPTI | ||||||
X | 0.73 | 138.2 | 0.450 | 0.60 (0.03) | 0.432 | 9.6 |
Y | 0.62 | 106.9 | 0.307 | 1.11 (0.05) | 1.03 | 4.0 |
Z | 0.34 | 82.3 | 0.274 | 1.30 (0.04) | 1.58 | 2.6 |
PB1 | ||||||
X | 0.82 | 193.0 | 1.34 | 2.68 (0.07) | 1.39 | 3.0 |
Y | 0.86 | 187.5 | 1.48 | 3.33 (0.07) | 1.78 | 2.3 |
Z | 0.84 | 200.7 | 1.55 | 3.95 (0.09) | 1.97 | 2.1 |
LYS | ||||||
X | 0.94 | 212.3 | 2.30 | 2.77 (0.09) | 1.30 | 3.2 |
Y | 0.67 | 151.1 | 0.673 | 3.75 (0.10) | 2.48 | 1.7 |
Z | 0.81 | 191.8 | 1.03 | 4.18 (0.14) | 2.18 | 1.9 |
THR | ||||||
X | 1 | 181.0 | — | 16.8 (0.2) | 9.29 | 0.46 |
Y | 1 | 167 | — | 15.1 (0.2) | 9.06 | 0.47 |
Z | 1 | 161 | — | 13.1 (0.1) | 8.15 | 0.53 |
Results obtained from the analysis of the angular momentum correlation functions are here compared with available experimental data, calculations based on rotational autocorrelation functions and a hydrodynamics-based estimate of the rotational diffusion tensor. The latter has been done with the software package DiTe (diffusion tensor):^{8} in this case calculations are carried out by setting the same temperature of the MD simulations, as reported in the Simulation details subsection; the corresponding viscosity of water is 8.94 mPa s (at 298.15 K), 8.51 mPa s (at 300 K), and 6.92 mPa s (at 310 K). Stick hydrodynamic boundary conditions are imposed. The standard hydrodynamic radius of 2.0 Å has been assigned to all the non-hydrogen atoms of the alinine peptides. In the case of the proteins, the radius has been augmented to 3.4 Å (i.e. by the hydrodynamic radius of water). This increase of the effective hydrodynamic radius is usually necessary for proteins exposing large, high hydrophilic areas to the solvent. Available in the literature is the experimental determination of the isotropic part of the diffusion tensor for the proteins. In particular for GB3, BPTI, PB1, LYS, and THR values of D^{exp}_{iso} are, respectively, 5.5 × 10^{7} Hz,^{12} 4.3 × 10^{7} Hz,^{9,11,12} 1.9 × 10^{7} Hz,^{22} 2.3 × 10^{7} Hz,^{9,11,12} and 1.0 × 10^{7} Hz.^{23}
Table 3 shows the comparison between the calculated values from MD with our protocol and with the standard approach based on rotational correlation functions with the experimental values. An overall good agreement with experiments is obtained via the analysis of angular momentum autocorrelation functions, within a factor between 1 and 1.5. The TIP3P model for water molecules has been employed, even if this model reproduces better the bulk properties of water,^{15,16} rotational diffusion is not always as well reproduced as with the angular momentum based approach. For the case of thrombin a smaller value of the calculated diffusion tensor with respect to the experiment is found. However, we note that both calculations, either based on rotation correlation functions or angular momentum relaxation provide a similar estimate of the diffusion tensor.
The assumption that the inertia and friction/diffusion tensors are approximately collinear has been tested as shown in Fig. 3 where we compare for the 10-alanine peptide and thrombin the orientation of the principal axes of inertia (MF, depicted in red, in the leftmost panels of the figure) with the orientation of the principal values of the diffusion tensor (depicted in green, in the rightmost panels).
From this analysis we can thus state that the main source of the deviation from the mono exponential decay of the angular momentum autocorrelation functions is the non-rigidity of the protein. However, a fit with a simple mono-exponential or bi-exponential function is able to extract most of the information regarding the relaxation of the total angular momentum, providing a very good estimate of the friction/diffusion tensor from very short MD trajectories.
Finally, the stability of the angular momentum based method, with respect to the length of the trajectory, has been tested. For the GB3, BPTI, and LYS proteins results have been compared with the calculation performed with half of the trajectory. The data is reported in the ESI.† It is shown that for the smaller proteins the effect of the trajectory length is more important. A test has been conducted also on the newly included data for thrombin for which the 3 ns short, production MD, was divided in 6 portions of 0.5 ns each. Using only a 0.5 ns portion or the whole trajectory is changing the results <5%, which is comparable with the errors of the fitting procedure, as shown in Table 3. Thus, from our tests, we can say that the precision of our method can be safely set to <10% error.
Molecule | D ^{exp}_{iso}/10^{7} Hz | D ^{L-MD}_{iso} ^{} ^{ }/10^{7} Hz | D ^{R-MD}_{iso}/10^{7} Hz | D ^{DiTe}_{iso}/10^{7} Hz |
---|---|---|---|---|
a In parenthesis we report the error on the diffusion coefficient estimated by propagating the error on the correlation time, which comes from the fitting procedure. | ||||
2ALA | — | 516 (2.3%) | — | 442 |
5ALA | — | 85.5 (2.2%) | — | 113 |
10ALA | — | 57.1 (3.8%) | — | 69.0 |
GB3 | 5.5 | 7.9 (1.4%) | 4.7 | 5.3 |
BPTI | 4.3 | 5.4 (7.3%) | 3.6 | 4.7 |
PB1 | 1.9 | 2.5 (1.3%) | 1.7 | 1.9 |
LYS | 2.3 | 2.3 (2.3%) | 3.2^{12} | 2.5 |
THR | 1.1 | 0.49 (2.7%) | 0.25 | 1.1 |
A final remark has to be made about the calculation of the angular momentum based on numerical derivatives (eqn (10)). We tested the stability of the results changing the dumping frequency of the short MD trajectories of the proteins from 25 MD steps to 10 MD steps, without noticing differences in the results. This test is suggested in applying the protocol presented, since it takes a short time to be completed, unless the calculation of the angular momentum on MF is performed directly with the velocities obtained in output. In this case, since a roto-translation of the Cartesian coordinates is performed, the transformation of the Cartesian velocities requires that the angular velocity is calculated, as reported in ref. 24.
To conclude, the method presented here provides results of a quality comparable to both the standard analysis of MD trajectories based on rotational correlation functions, and the hydrodynamics-based approach. The latter is extremely fast and reliable, but suffers of the problem of the selection of the effective hydrodynamic radius. While a “standard” value of 2 Å represents a usually good starting point, sometimes the radius needs to be increased to account for interactions with water (e.g., dielectric friction). In these cases, the effective radius is augmented by the hydrodynamic radius of water if the protein is particularly hydrophilic (as was done in this work for the five proteins analysed). This is a ad hoc correction, which limits the predictive capacity of the hydrodynamic method. Besides, a further puzzling problem is the definition of the local viscosity. Usually, but not always, employing the bulk viscosity of the solvent provides good results. In addition, the hydrodynamic boundary conditions, coming from assumptions in the solution of the steady-state Navier–Stokes equations, do not rely directly on the atomistic description of the interaction between solute and solvent. Furthermore, in ordered phases like liquid crystals, the concept of bulk viscosity becomes much more complicated. For all these reasons, it is desirable to employ, when possible, MD simulations. Our method tries to retain the important molecular/atomistic information contained in MD simulations while trying to optimize the computational time of the dissipative properties by choosing conveniently the observable to be computed.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c8cp04879g |
This journal is © the Owner Societies 2019 |