Dilipkumar N.
Asthagiri
*a,
Arjun
Valiya Parambathu
b and
Thomas L.
Beck
a
aOak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37830, USA. E-mail: asthagiridn@ornl.gov
bChemical and Biomolecular Engineering, University of Delaware, Newark, DE 19716, USA
First published on 28th March 2025
Earlier we showed that in the molecular dynamics simulation of a rigid model of water it is necessary to use an integration time-step δt ≤ 0.5 fs to ensure equipartition between translational and rotational modes. Here we extend that study in the NVT ensemble to NpT conditions and to an aqueous protein. We study neat liquid water with the rigid, SPC/E model and the protein BBA (PDB ID: 1FME) solvated in the rigid, TIP3P model. We examine integration time-steps ranging from 0.5 fs to 4.0 fs for various thermostat plus barostat combinations. We find that a small δt is necessary to ensure consistent prediction of the simulation volume. Hydrogen mass repartitioning alleviates the problem somewhat, but is ineffective for the typical time-step used with this approach. The compressibility, a measure of volume fluctuations, and the dielectric constant, a measure of dipole moment fluctuations, are also seen to be sensitive to δt. Using the mean volume estimated from the NpT simulation, we examine the electrostatic and van der Waals contribution to the hydration free energy of the protein in the NVT ensemble. These contributions are also sensitive to δt. In going from δt = 2 fs to δt = 0.5 fs, the change in the net electrostatic plus van der Waals contribution to the hydration of BBA is already in excess of the folding free energy reported for this protein.
Molecular dynamics simulations of water have a rich history. The seminal work by Rahman and Stillinger1,2 over half a century ago stands as a towering initial attempt to model the structure and dynamics of water. They described water as a rigid object, treating the translational motion using Cartesian coordinates and the rotational motion using Euler angles. Through careful analysis, they settled on a time-step δt = 0.4 fs for integrating the equations of motion. Some years later, Ryckaert, Ciccotti, and Berendsen introduced the SHAKE algorithm3 that enabled treating molecules such as water as a rigid object within a Cartesian coordinate system. SHAKE had the defect of not accounting for zero relative velocity of atoms connected by a rigid bond. This was fixed by the RATTLE algorithm.4 The subsequent development of the analytical SETTLE algorithm5 that obeys both the rigidity and velocity constraints for 3-site water molecules was another important innovation with significant impact in modeling water in bio-molecular simulations.
In molecular systems, the intra-molecular degrees are associated with stronger interaction energies or conversely higher frequency modes relative to the non-bonded, inter-molecular interactions. It is further the case that the higher frequency modes occur at time-scales that are well separated from the slower, inter-molecular dynamics. Thus, being able to freeze selected intra-molecular modes, as the constraint algorithms allow one to do, has the beneficial effect of removing from consideration modes that would otherwise require rather short time-steps to capture in a numerical scheme. Indeed, a key motivation in developing SHAKE, was the idea that “fast internal vibrations are usually decoupled from rotational and translational motions”.3 Thus, the thought was that if one could freeze the vibrational degrees of freedom, then one could take a longer time-step for describing the remaining slower rotational and translational motion. This reasoning has become central to modeling rigid-water molecular dynamics with time-steps that are considerably larger than the value used by Rahman and Stillinger. Most often, a value of 2 fs has been used in many studies over the last 4+ decades. It must then come as a surprise that recently we found that this assumption of decoupling of vibrations from rotations is not valid for water:6 the rotational relaxation in a fluid comprising rigid water molecules occurs on the same time-scales as the bond vibration and angle bending modes of water. (In water, the high frequency rotational motion is the librational motion in which the light hydrogen atoms wiggle about the axis passing through the heavier oxygen atom.) Thus, for water, taking a long time-step, assuming that “fast internal vibrations” have been frozen will still incur an error in describing the rotational relaxation, causing the breakdown of equipartition between translational and rotational motions. Ultimately, this breakdown is a consequence of not capturing both the translational and rotational relaxation with fidelity, as is required for obeying the fluctuation–dissipation relation for the respective modes.7 Our finding reconfirmed the breakdown of equipartition for bulk water noted in earlier works by Davidchack8 and Silveira and Abreu.9,10
For time-steps that lead to a breakdown of equipartition, we had found that the center of mass motion of water is at a higher temperature than the rotational motion about the center of mass. This suggested that for a constant volume simulation, the pressure must be higher for a larger δt. Indeed, Davidchack had found exactly that behavior. We reasoned that under NpT conditions, the volume must be higher for a higher δt. We validate this hypothesis in the present paper for both bulk water and for the designed protein with a ββα fold,11 hereafter termed protein BBA, solvated in water. For the bulk water simulation, the difference in volume between simulations with δt = 0.5 fs and δt = 2.0 fs can easily exceed typical volume changes in protein conformational change and protein unfolding. We also test the idea of hydrogen mass repartitioning,12 and find that while this helps somewhat, equipartition is still violated for the typical time-steps used with this approach.
We further reasoned that the isothermal compressibility and the static dielectric constant, properties that derive from the fluctuation of a related extensive quantity, should also depend on δt. The calculation of these properties from the appropriate fluctuation relationship requires rather long simulations for adequate convergence (for example, see ref. 13). Nevertheless, the dependence of these fluctuation quantities on δt can still be inferred from the simulations. For exploring the dependence of the dielectric constant on δt, we also study the electrostatic contribution to the free energy of hydration of a simple ion and a protein. For the simple ion, the charging free energy conforms to expectations based on the Born model and reveals the δt-dependence of the dielectric constant. For the protein BBA in water, both the electrostatic and van der Waals (vdW) contribution to hydration depend on δt. Importantly, in going from δt = 2.0 fs and δt = 0.5 fs, the change in the sum of the electrostatic and vdW contributions, an approximation to the net hydrophilic contribution in hydration, is comparable to the free energy of folding reported for BBA.14,15
In NAMD, the Lennard-Jones and real-space electrostatic interactions were cutoff at 9 Å. Long-range, analytical LJ corrections were applied. The electrostatic interactions were calculated using particle mesh Ewald with a grid spacing of 1 Å. The relative Ewald energy tolerance at the real-space cutoff was 10−7, tighter than the default in NAMD. The system was first equilibrated for 6 ns under NVT conditions at a mass density of 1.014 gm per cc (≈1.5% higher than the value noted for SPC/E).21 We used the canonical stochastic velocity rescaling thermostat (CSVR)22 to maintain the system at 298.15 K and the time-step in this phase was 0.5 fs. The equilibrated end-point configuration was used as the starting configuration in all subsequent studies.
We next equilibrated the system under NpT conditions for 8 × 106 steps followed by a production phase of 20 × 106 steps. We simulated using time-steps ranging from 0.5 fs to 4.0 fs in 0.5 fs intervals. The geometry of the water molecule was maintained using the default SETTLE algorithm. Simulation data (energies, volumes, etc.) were archived every 500 steps for further analysis.
Within Tinker, we used the default Ewald cutoff of 7 Å and the default Lennard-Jones cutoff of 9 Å. Long-range LJ corrections were applied. The system was simulated for a total of 28 × 106 steps as above, but data logging frequency changed with step size, being approximately every 1 ps (we say approximately since some time-steps do not evenly divide 1 ps). The geometry of water was maintained using the RATTLE algorithm. For simulations with Tinker, we also studied the effect of mass repartitioning12—the mass of the hydrogen atom in water was increased to 3.024 amu and the mass of oxygen appropriately reduced such that the mass of a water molecule remained at 18.0154 amu.
With NAMD, we experimented with the following thermostat-barostat combinations: Langevin thermostat/barostat,23 CSVR thermostat/Langevin barostat, and CSVR thermostat/Monte Carlo barostat.24,25 With Tinker, we used the Monte Carlo barostat and the CSVR thermostat. For the volume sampling frequency in the Monte Carlo barostat, we used the default values in the respective codes: 50 steps in NAMD and 25 steps in Tinker.
Within Tinker, for a limited set of runs, we experimented using the Beeman algorithm26 to learn if an improved estimate of on-step velocity affected the overall conclusions. These simulations were performed exclusively on CPUs.
Throughout, we use the Friedberg–Cameron approach27,28 to obtain statistical uncertainties for quantities such as the volume, the potential energy of the system, or the binding energy between a solute and the solvent (see below).
In the first set of simulations, no structural constraints were placed on the protein. The initial system was equilibrated under NpT conditions using δt = 0.5 fs for 4 ns (8 × 106 steps). The Lennard-Jones forces were smoothly switched to zero from 9.43 Å to 10.43 Å. The particle mesh Ewald method was used for long-range electrostatic interactions, and as above, a tighter tolerance was used for Ewald summations. The bond between a hydrogen and the parent heavy atom was made rigid using the RIGIDBONDS ALL flag in NAMD. In this phase of equilibration, we used the Langevin thermostat and barostat.
The configuration from the end-point of the equilibration run was used to launch simulations at time-steps from 0.5 fs to 3.5 fs in intervals of 0.5 fs. For these studies we experimented with the following thermostat and barostat combinations: Langevin thermostat/barostat and CSVR thermostat/Monte Carlo barostat. For each δt, the system was equilibrated over 12 × 106 steps and data collected over a subsequent 20 × 106 steps, with data logged every 500 steps. The simulation trajectory was archived every 1000 steps for further analysis.
Since we fix the conformation of the protein, we cannot use GPU-resident calculations within NAMD. (For the same reason, we cannot use the Monte Carlo barostat, as this is only available in the GPU-resident mode.) Thus, we used the CSVR thermostat and Langevin barostat to equilibrate the volume. Once the volume stabilized, we removed the barostat as well. The hydration free energy calculations were then performed in the NVT ensemble.
The electrostatic contribution to the free energy, μ(ex)elec, was obtained by a thermodynamic integration procedure using a three point Gauss–Legendre quadrature,39,40 with protein charges scaled by Specifically,
![]() | (1) |
At each λ point, the system was equilibrated for 7.5 × 106 steps and configurations archived every 500 steps in the subsequent production run of 7.5 × 106 steps. The PAIR INTERACTION approach in NAMD was used to calculate 〈ϕ〉λ.
The van der Waals (or nonpolar) contribution, μ(ex)vdW, to the free energy of hydration can be calculated from the quasichemical organization of the potential distribution theorem42,43 for a solute with all partial charges set to zero. Specifically,44
μ(ex)vdW = μ(ex)HC + kBT![]() ![]() ![]() | (2) |
![]() | (3) |
![]() | (4) |
In the NVT ensemble with conducting boundary conditions, the dielectric constant ε is given by51
![]() | (5) |
Fig. 1 shows that the volume increases with increasing δt, irrespective of the thermostat/barostat combination. This is consistent with our hypothesis based on observations in the earlier study that the translational temperature increases as δt increases, and Davidchack's calculation of pressure versus δt in the NVT ensemble. Clearly, only for a small δt—for the conditions tested here δt = 0.5 fs—the volumes (densities) converge to a common value that is independent of the thermostat/barostat combination. Importantly, the volumes (densities) converge to the value obtained from the entirely separate reference simulation. Lastly, hydrogen mass repartitioning (HMR) is an improvement in estimating the equilibrium volume (density), but the procedure is ineffective for δt = 4 fs. HMR can be defensible for δt ≤ 2.0 fs.
As a further check, we sought to compare with predictions of density from an entirely stochastic simulation. Sanz et al.21 have systematically explored the phase equilibrium of water for different water models using Monte Carlo calculations. For the SPC/E model and for a nearly identical treatment of inter-molecular interactions—they use a shorter cutoff of 8.5 Å versus 9 Å used by us—they quote a density of 1000 kg m−3 for 1 bar pressure and 300 K. Using the experimental thermal expansion coefficient for water, we can infer that at 298.15 K, the density from the Monte Carlo procedure should be about 1000.4 kg m−3. This value is an upper bound to the values obtained in the molecular dynamics simulations, with the least deviation of 0.2% of 998.5 kg m−3 obtained for δ = 0.5 fs. While this comparison is encouraging, we must note some caveats. First, Sanz et al. do not report the number of water molecules used in the simulations; we suspect52 it was considerably less than 4096, perhaps being as low as 360. As noted in our study on system size dependence of protein hydration,53 a larger system better accommodates density fluctuations and this may explain part of the deviation. (Exploring the relevance of this issue for converged density predictions is left for a separate study.) Second, Sanz et al. do not quote statistical uncertainties. We suspect52 it is about 1 kg m−3, in which case the agreement with our reference and δt = 0.5 fs results is satisfactory within the quoted statistical uncertainties of the respective simulations.
Consider next the change in the partial molar volume between δt = 0.5 fs and the more conventional δt = 2.0 fs. For simulations with the CSVR thermostat and Monte Carlo barostat, the partial molar volume for δt = 2.0 fs is about 0.02 Å3 larger. Thus for a system with 10000 water molecules, a system size that is nowadays rather common and likely on the smaller size-scale of simulation systems, the volume for the δt = 2.0 fs simulation will be larger by about 200 Å3 relative to that for the system simulated with δt = 0.5 fs. To put this deviation in perspective, we note the following example. The volume change upon folding of the 149 residue Staphylococcal Nuclease protein at 21 °C, close to the temperature studied here, is found to be about 70 ml mol−1 or about 116 Å3 per molecule (of S. Nuclease).54,55 This deviation is already smaller than the error in overall system volume induced by too large of a δt. Volume change upon folding/unfolding for similarly sized or smaller proteins will be comparable or smaller. Thus the impact of the artifacts due to too large a δt will be proportionally greater in assessing both the thermodynamics and the kinetics of the folding/unfolding transition in computer simulations.
Since the volume depends on δt, it begs the question whether the fluctuation in the volume under NpT conditions also depends on δt. Fig. 2 shows the behavior of the estimated compressibility. For this analysis, we only consider the CSVR thermostat/Monte Carlo barostat; in contrast to the Langevin thermostat the CSVR thermostat is less intrusive in the dynamics and affects translation and rotation symmetrically. (With the CSVR thermostat the average of the translation and rotation temperatures equals the thermostat set-point temperature, unlike what we find for the Langevin thermostat.6)
In Fig. 2 (left panel), the estimated statistical uncertainties are large, as expected given the overall length of the simulation after accounting for correlations; however, the trend is unmistakable: the compressibility tends to decrease with decreasing δt. The behavior of the compressibility with δt is also consistent with the behavior of the binding energy with δt (right panel): as cohesion increases one expects the fluid matrix to become stiffer and the compressibility to decrease.
Fig. 3 shows the calculated behavior of the dielectric constant and the behavior of βμ(ex)elec/q2. As seen in the calculation of κT, the estimated statistical uncertainties in the calculation of ε are large. The data suggests that the dielectric constant can be sensitive to δt, just as the compressibility is found to be sensitive to δt, with the value for δt = 0.5 fs being slightly greater than the value for δt = 4.0 fs. This behavior is physically consistent with our earlier finding (Fig. 1, ref. 6) that as δt increases, the rotational motion is at a lower temperature relative to that for the translational motion. To better probe the electrostatic response of the fluid, we consider the problem of charging an ion (Fig. 3, right panel). Within the continuum dielectric Born model of hydration, βμ(ex)elec/q2 ∝ −(1 − 1/ε). However, with an atom that also has non-electrostatic interactions with the molecular fluid, to better isolate the dependence on the dielectric constant we consider the free energy of charging, βΔμ(ex)elec/q2, relative to the value obtained using δt = 0.5 fs. The data shows that this difference quantity is roughly the same for both charge states, and consistent with our expectation, the dielectric constant is indicated to be higher for δt = 0.5 fs relative to δt = 4.0 fs. We find the same behavior in the more interesting case of the hydration of the protein BBA.
![]() | ||
Fig. 3 Left panel: Calculated dielectric constant versus δt. The uncertainty is shown at the 1σ level. Right panel: The free energy of charging a Lennard-Jones Na37 atom to hypothetical charge states of q = 2e and q = 3e, respectively, relative to the corresponding value obtained using δt = 0.5 fs. The standard error of the mean is shown at the 1σ level. |
Examining the dependence of the radius of gyration of the protein (Rg) on δt in carefully constructed replica exchange simulations proved to be inconclusive in revealing systematic trends (data not shown). This can be due to confounding roles of flexibility, distribution of the counterion (Cl−), and the potential for artifacts introduced by violation of canonical sampling.56 To better isolate the dependence of protein–solvent interactions on δt, we consider a protein in a fixed conformation in a water bath and without the counterion.
Table 1 shows the average equilibrium box length obtained from NpT calculations for this system. These values are used in the NVT ensemble calculations of μ(ex)elec and μ(ex)vdW.
δ t (fs) | V (Å3) | L (Å) | μ (ex)self (kcal mol−1) |
---|---|---|---|
a Standard error of the mean at the 1σ level.
b The Wigner self-interaction correction37,38 is ![]() |
|||
0.5 | 200![]() |
58.488 | 0.0 |
1.0 | 200![]() |
58.491 | 0.0 |
2.0 | 200![]() |
58.508 | 0.2 |
3.0 | 200![]() |
58.536 | 0.4 |
4.0 | 200![]() |
58.576 | 0.7 |
For the usual choice of δt = 2.0 fs relative to δt = 0.5 fs we can ignore the Wigner self-interaction correction for this system. Also, since the box is relatively large compared to the size of the protein, we ignore the finite-size correction to electrostatic interactions as well.
Fig. 5 shows μ(ex)elec and μ(ex)vdW of BBA. First, consider the electrostatic contribution. We find that the linear response result already gives ≈95% of the electrostatic contribution (data not shown), emphasizing that the three point rule is adequate in describing the charging free energy. Second, the behavior across δt shows that μ(ex)elec is sensitive to δt, even without including the Wigner self-interaction correction (Table 1) which amplify the trend even more. Finally, the vdW contribution obtained using the mean-field approximation μ(ex)vdW ≈ 〈ε〉 is also sensitive to the step-size, although, as expected, not as strongly as μ(ex)elec.
Shaw and coworkers15 had studied the folding/unfolding transition of the BBA protein in very long computer simulations. At 325 K, they have calculated a folding free energy of about 0.7 kcal mol−1. (That study used a time-step of 2.5 fs.) Wu and Shea have estimated a similar value at 323 K.14 Assuming the folding free energy of the BBA protein is in a similar range at 298.15 K as well, we can see that the error introduced by a larger step-size in the calculation of the hydration free energy is already considerably larger than the reported folding free energy.
The thermodynamics of hydration and the conformational properties of biological macromolecules in liquid water are important quantities that inform the design and development of forcefields for biomolecular simulations. Our results suggest that tuning a forcefield in a simulation that does not obey equipartition is likely to introduce errors into the very structure of the forcefield itself. How these issues influence the kinetics and thermodynamics in aqueous and bio-macromolecular systems is necessarily left for future studies.
The earlier work6 and the present show that in a molecular dynamics approach to sampling equilibrium ensembles, it is important to capture the relevant relaxation dynamics with fidelity. However, in molecular dynamics sampling—and, for simplicity, we consider the NVE ensemble—the relevant metrics are conserved quantities, the foremost being the energy of the system; fidelity of capturing relaxation dynamics does not appear explicitly. In this regard, note that in discrete Hamiltonian dynamics with time-reversible algorithms, the sampling is from a shadow Hamiltonian, (δt) = H + G(δt2), where H is the physical Hamiltonian.57–60 (The general form of G is unknown but is usually investigated by means of a series expansion.61) As a consequence, and within time-reversible algorithms, H is not conserved but oscillates about a mean value, with the size of oscillations proportional to δt2.60 This discretization error, or what can also be thought of as an exchange of “shadow” work with the system,62 is what impacts the relaxation. Thus, we see that an algorithm can be globally stable with a long δt, but that stability is not sufficient to ensure relaxation processes are correctly captured.
A further consequence of discrete Hamiltonian dynamics is that the particle velocity v ≠ p/m, where p is the momentum obtained using the “Velocity”-Verlet equation59 and m is its mass. Better estimates of v can be obtained and the temperature defined using the generalized equipartition theorem.63,64 This is an important point to consider. However, the key insight from the earlier work and the present one is the need to obey the fluctuation–dissipation theorem, i.e. the need to capture the temporal evolution of fluctuations7 or equivalently the underlying relaxation dynamics with fidelity in a MD sampling of ensembles.
For a molecular dynamics sampling of equilibrium ensembles, we suggest it would be prudent to study relevant velocity autocorrelations,6 or if that is too tedious, estimate quantities such as mean energy or volume for several different time-steps to ensure that time-step artifacts are under control. Such checks would be especially prudent before undertaking large-scale simulation campaigns.
The rapid angular motion in the dynamics of a fluid comprising rigid water molecules leads to the rotational relaxation occurring at a shorter time scale than the translation relaxation. This imposes a fundamental limit on how long a time-step one can use for correctly capturing the temporal evolution of fluctuations, as is required for adhering to the fluctuation–dissipation relation. A long time-step for integrating the equations of motion leads to an incorrect description of the temporal evolution of fluctuations in turn leading to the breakdown of equipartition. We find that for integrating the equations of motion for the rigid SPC/E and TIP3P models of water, a time-step of 0.5 fs is defensible, a value that is also close to what Rahman and Stillinger had suggested long ago from their analysis of the dynamics of a pair of water molecules.
The breakdown of equipartition leads to the translational modes being hotter than the rotational modes, a deviation that is amplified as the time-step is increased. As a consequence, at constant pressure, the volume is larger for a longer time-step. For a protein dissolved in water, the deviation in the partial molar volume will necessarily introduce uncontrolled p–V errors in the folding free energy landscape.
The δt artifact also impacts the interaction between the protein and solvent. For the BBA protein, the error introduced in the electrostatic plus vdW contribution to the hydration free energy can easily exceed the folding free energy. Thus, too long a time-step introduces errors in the interaction contribution, further obfuscating the resolution of the free energy landscape.
Footnote |
† Notice: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (https://www.energy.gov/doe-public-access-plan). |
This journal is © The Royal Society of Chemistry 2025 |