Consequences of the failure of equipartition for the p – V behavior of liquid water and the hydration free energy components of a small protein

Dilipkumar N. Asthagiri; Arjun Valiya Parambathu; Thomas L. Beck

doi:10.1039/D4SC08437C

View PDF VersionPrevious ArticleNext Article

Open Access Article

This Open Access Article is licensed under a
Creative Commons Attribution 3.0 Unported Licence

DOI: 10.1039/D4SC08437C (Edge Article) Chem. Sci., 2025, 16, 7503-7512

Consequences of the failure of equipartition for the p–V behavior of liquid water and the hydration free energy components of a small protein†

Dilipkumar N. Asthagiri *^a, Arjun Valiya Parambathu ^b and Thomas L. Beck ^a
^aOak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37830, USA. E-mail: asthagiridn@ornl.gov
^bChemical and Biomolecular Engineering, University of Delaware, Newark, DE 19716, USA

Received 13th December 2024 , Accepted 3rd March 2025

First published on 28th March 2025

Abstract

Earlier we showed that in the molecular dynamics simulation of a rigid model of water it is necessary to use an integration time-step δ_t ≤ 0.5 fs to ensure equipartition between translational and rotational modes. Here we extend that study in the NVT ensemble to NpT conditions and to an aqueous protein. We study neat liquid water with the rigid, SPC/E model and the protein BBA (PDB ID: 1FME) solvated in the rigid, TIP3P model. We examine integration time-steps ranging from 0.5 fs to 4.0 fs for various thermostat plus barostat combinations. We find that a small δ_t is necessary to ensure consistent prediction of the simulation volume. Hydrogen mass repartitioning alleviates the problem somewhat, but is ineffective for the typical time-step used with this approach. The compressibility, a measure of volume fluctuations, and the dielectric constant, a measure of dipole moment fluctuations, are also seen to be sensitive to δ_t. Using the mean volume estimated from the NpT simulation, we examine the electrostatic and van der Waals contribution to the hydration free energy of the protein in the NVT ensemble. These contributions are also sensitive to δ_t. In going from δ_t = 2 fs to δ_t = 0.5 fs, the change in the net electrostatic plus van der Waals contribution to the hydration of BBA is already in excess of the folding free energy reported for this protein.

1 Introduction

Liquid water is the pre-eminent solvent for biological, geological, and chemical processes. Consistent with its pervasive role, it has been widely studied both experimentally and theoretically. In the theoretical and simulation context, modeling the intermolecular interactions and simulating liquid water has occupied a central place in the overall enterprise of computer simulation of materials.

Molecular dynamics simulations of water have a rich history. The seminal work by Rahman and Stillinger^1,2 over half a century ago stands as a towering initial attempt to model the structure and dynamics of water. They described water as a rigid object, treating the translational motion using Cartesian coordinates and the rotational motion using Euler angles. Through careful analysis, they settled on a time-step δ_t = 0.4 fs for integrating the equations of motion. Some years later, Ryckaert, Ciccotti, and Berendsen introduced the SHAKE algorithm³ that enabled treating molecules such as water as a rigid object within a Cartesian coordinate system. SHAKE had the defect of not accounting for zero relative velocity of atoms connected by a rigid bond. This was fixed by the RATTLE algorithm.⁴ The subsequent development of the analytical SETTLE algorithm⁵ that obeys both the rigidity and velocity constraints for 3-site water molecules was another important innovation with significant impact in modeling water in bio-molecular simulations.

In molecular systems, the intra-molecular degrees are associated with stronger interaction energies or conversely higher frequency modes relative to the non-bonded, inter-molecular interactions. It is further the case that the higher frequency modes occur at time-scales that are well separated from the slower, inter-molecular dynamics. Thus, being able to freeze selected intra-molecular modes, as the constraint algorithms allow one to do, has the beneficial effect of removing from consideration modes that would otherwise require rather short time-steps to capture in a numerical scheme. Indeed, a key motivation in developing SHAKE, was the idea that “fast internal vibrations are usually decoupled from rotational and translational motions”.³ Thus, the thought was that if one could freeze the vibrational degrees of freedom, then one could take a longer time-step for describing the remaining slower rotational and translational motion. This reasoning has become central to modeling rigid-water molecular dynamics with time-steps that are considerably larger than the value used by Rahman and Stillinger. Most often, a value of 2 fs has been used in many studies over the last 4+ decades. It must then come as a surprise that recently we found that this assumption of decoupling of vibrations from rotations is not valid for water:⁶ the rotational relaxation in a fluid comprising rigid water molecules occurs on the same time-scales as the bond vibration and angle bending modes of water. (In water, the high frequency rotational motion is the librational motion in which the light hydrogen atoms wiggle about the axis passing through the heavier oxygen atom.) Thus, for water, taking a long time-step, assuming that “fast internal vibrations” have been frozen will still incur an error in describing the rotational relaxation, causing the breakdown of equipartition between translational and rotational motions. Ultimately, this breakdown is a consequence of not capturing both the translational and rotational relaxation with fidelity, as is required for obeying the fluctuation–dissipation relation for the respective modes.⁷ Our finding reconfirmed the breakdown of equipartition for bulk water noted in earlier works by Davidchack⁸ and Silveira and Abreu.^9,10

For time-steps that lead to a breakdown of equipartition, we had found that the center of mass motion of water is at a higher temperature than the rotational motion about the center of mass. This suggested that for a constant volume simulation, the pressure must be higher for a larger δ_t. Indeed, Davidchack had found exactly that behavior. We reasoned that under NpT conditions, the volume must be higher for a higher δ_t. We validate this hypothesis in the present paper for both bulk water and for the designed protein with a ββα fold,¹¹ hereafter termed protein BBA, solvated in water. For the bulk water simulation, the difference in volume between simulations with δ_t = 0.5 fs and δ_t = 2.0 fs can easily exceed typical volume changes in protein conformational change and protein unfolding. We also test the idea of hydrogen mass repartitioning,¹² and find that while this helps somewhat, equipartition is still violated for the typical time-steps used with this approach.

We further reasoned that the isothermal compressibility and the static dielectric constant, properties that derive from the fluctuation of a related extensive quantity, should also depend on δ_t. The calculation of these properties from the appropriate fluctuation relationship requires rather long simulations for adequate convergence (for example, see ref. 13). Nevertheless, the dependence of these fluctuation quantities on δ_t can still be inferred from the simulations. For exploring the dependence of the dielectric constant on δ_t, we also study the electrostatic contribution to the free energy of hydration of a simple ion and a protein. For the simple ion, the charging free energy conforms to expectations based on the Born model and reveals the δ_t-dependence of the dielectric constant. For the protein BBA in water, both the electrostatic and van der Waals (vdW) contribution to hydration depend on δ_t. Importantly, in going from δ_t = 2.0 fs and δ_t = 0.5 fs, the change in the sum of the electrostatic and vdW contributions, an approximation to the net hydrophilic contribution in hydration, is comparable to the free energy of folding reported for BBA.^14,15

2 Methodology

2.1 Bulk water

We studied the SPC/E¹⁶ water model using both NAMD^17,18 and GPU-accelerated Tinker^19,20 codes. The system comprised 4096 water molecules. Throughout, the equations of motion were integrated using the Velocity Verlet algorithm.

In NAMD, the Lennard-Jones and real-space electrostatic interactions were cutoff at 9 Å. Long-range, analytical LJ corrections were applied. The electrostatic interactions were calculated using particle mesh Ewald with a grid spacing of 1 Å. The relative Ewald energy tolerance at the real-space cutoff was 10⁻⁷, tighter than the default in NAMD. The system was first equilibrated for 6 ns under NVT conditions at a mass density of 1.014 gm per cc (≈1.5% higher than the value noted for SPC/E).²¹ We used the canonical stochastic velocity rescaling thermostat (CSVR)²² to maintain the system at 298.15 K and the time-step in this phase was 0.5 fs. The equilibrated end-point configuration was used as the starting configuration in all subsequent studies.

We next equilibrated the system under NpT conditions for 8 × 10⁶ steps followed by a production phase of 20 × 10⁶ steps. We simulated using time-steps ranging from 0.5 fs to 4.0 fs in 0.5 fs intervals. The geometry of the water molecule was maintained using the default SETTLE algorithm. Simulation data (energies, volumes, etc.) were archived every 500 steps for further analysis.

Within Tinker, we used the default Ewald cutoff of 7 Å and the default Lennard-Jones cutoff of 9 Å. Long-range LJ corrections were applied. The system was simulated for a total of 28 × 10⁶ steps as above, but data logging frequency changed with step size, being approximately every 1 ps (we say approximately since some time-steps do not evenly divide 1 ps). The geometry of water was maintained using the RATTLE algorithm. For simulations with Tinker, we also studied the effect of mass repartitioning¹²—the mass of the hydrogen atom in water was increased to 3.024 amu and the mass of oxygen appropriately reduced such that the mass of a water molecule remained at 18.0154 amu.

With NAMD, we experimented with the following thermostat-barostat combinations: Langevin thermostat/barostat,²³ CSVR thermostat/Langevin barostat, and CSVR thermostat/Monte Carlo barostat.^24,25 With Tinker, we used the Monte Carlo barostat and the CSVR thermostat. For the volume sampling frequency in the Monte Carlo barostat, we used the default values in the respective codes: 50 steps in NAMD and 25 steps in Tinker.

Within Tinker, for a limited set of runs, we experimented using the Beeman algorithm²⁶ to learn if an improved estimate of on-step velocity affected the overall conclusions. These simulations were performed exclusively on CPUs.

Throughout, we use the Friedberg–Cameron approach^27,28 to obtain statistical uncertainties for quantities such as the volume, the potential energy of the system, or the binding energy between a solute and the solvent (see below).

2.1.1 Bulk reference. To provide a separate estimate of convergence of volume, we studied a larger 32 [thin space (1/6-em)]

768 water system with a time-step of δ_t = 0.25 fs. This system was obtained by replicating the 4096 water system twice in the x, y, and z directions, respectively. The simulation box length was set to 100 Å; the bulk density was ≈2% lower than the converged value we found with the δ_t = 0.5 fs simulations. We equilibrated this system in the NVT ensemble for 6 × 10⁶ time-steps using the CSVR thermostat. The equilibrated configuration was then used to launch four separate NpT ensemble simulations using the CSVR thermostat and Monte Carlo barostat. The volume sampling frequency for the barostat was 80, 120, 160, and 200 time-steps, respectively, for the four separate runs. The NpT simulations were equilibrated for 6.25 × 10⁶ steps and data collected over an additional 6.25 × 10⁶ steps. In reporting the data for this larger system, the means from the four separate runs are averaged and the standard error of the mean obtained using variance propagation rules.

2.2 Aqueous BBA

BBA (PDB ID: 1FME) is a 28 residue designed protein that adopts a ββα fold. This is a marginally stable protein derived from a parent zinc-finger template¹¹ sans the zinc ion. The first model from the PDB data file was taken and solvated in 6561 TIP3P²⁹ water molecules. The N- and C-termini were modeled in the ammonium and carboxylate forms, respectively. At pH 7.0, the protein has a net charge of +4e, where e is the elementary charge; the net charge of the protein was compensated by adding 4 chloride (Cl⁻) ions to the system. The protein was modeled using the CHARMM36m^30–33 forcefield (including CMAP corrections). CHARMM-modified parameters were used for TIP3P.³⁴ The initial structure was built using the PSFGEN tool³⁵ and the chloride ions were added using the autoionize tool within VMD.³⁶

In the first set of simulations, no structural constraints were placed on the protein. The initial system was equilibrated under NpT conditions using δ_t = 0.5 fs for 4 ns (8 × 10⁶ steps). The Lennard-Jones forces were smoothly switched to zero from 9.43 Å to 10.43 Å. The particle mesh Ewald method was used for long-range electrostatic interactions, and as above, a tighter tolerance was used for Ewald summations. The bond between a hydrogen and the parent heavy atom was made rigid using the RIGIDBONDS ALL flag in NAMD. In this phase of equilibration, we used the Langevin thermostat and barostat.

The configuration from the end-point of the equilibration run was used to launch simulations at time-steps from 0.5 fs to 3.5 fs in intervals of 0.5 fs. For these studies we experimented with the following thermostat and barostat combinations: Langevin thermostat/barostat and CSVR thermostat/Monte Carlo barostat. For each δ_t, the system was equilibrated over 12 × 10⁶ steps and data collected over a subsequent 20 × 10⁶ steps, with data logged every 500 steps. The simulation trajectory was archived every 1000 steps for further analysis.

2.2.1 Hydration free energy components of BBA. For calculating the hydrophilic contributions to the hydration free energy of BBA, we made two important changes. First, we fixed the protein conformation. Second, we removed the Cl⁻ ions (within the Ewald formulation, the uniform compensating background charge ensures the electroneutrality of the system^37,38). These changes were made to ensure that protein conformational fluctuations or variation in the binding of Cl⁻ ions to the protein do not obfuscate the role of δ_t. The initial protein conformation was obtained by scanning the NpT simulation (using the CSVR thermostat/MC barostat and δ_t = 0.5 fs) for the conformer with the least deviation from the reference 1FME conformer. The RMS deviation of the chosen structure relative to the original 1FME structure was 1.55 Å.

Since we fix the conformation of the protein, we cannot use GPU-resident calculations within NAMD. (For the same reason, we cannot use the Monte Carlo barostat, as this is only available in the GPU-resident mode.) Thus, we used the CSVR thermostat and Langevin barostat to equilibrate the volume. Once the volume stabilized, we removed the barostat as well. The hydration free energy calculations were then performed in the NVT ensemble.

The electrostatic contribution to the free energy, μ^(ex)_elec, was obtained by a thermodynamic integration procedure using a three point Gauss–Legendre quadrature,^39,40 with protein charges scaled by Specifically,


	(1)

where 〈ϕ〉_λ is the electrostatic contribution to the interaction energy between the protein and the solvent with configurations sampled from the ensemble with charges scaled by λ; w_λ is the weight associated with the sampling point λ; and Q = +4e is the net charge of the protein. (N.B. λ = 0.5 gives the linear-response estimate of μ^(ex)_elec.) To complete the calculation, it is necessary to consider (Wigner) self-interaction^37,38 and finite-size⁴¹ corrections. These scale with the length, L, of the cubic simulation box as 1/L and (R/L)² × 1/L, respectively, where R is the nominal radius of the protein. Although the volume changes with δ_t, as discussed below, the impact on the change in the Wigner self interaction contribution proves to be small, especially between δ_t = 0.5 fs and δ_t = 2.0 fs. Further, since the protein occupies a small volume of the simulation cell, we ignore finite-size corrections as well.

At each λ point, the system was equilibrated for 7.5 × 10⁶ steps and configurations archived every 500 steps in the subsequent production run of 7.5 × 10⁶ steps. The PAIR INTERACTION approach in NAMD was used to calculate 〈ϕ〉_λ.

The van der Waals (or nonpolar) contribution, μ^(ex)_vdW, to the free energy of hydration can be calculated from the quasichemical organization of the potential distribution theorem^42,43 for a solute with all partial charges set to zero. Specifically,⁴⁴


μ^(ex)_vdW = μ^(ex)_HC + k_BTlnp(n = 0) + k_BTln∫P(ε\|n = 0)e^βεdε,	(2)

where μ^(ex)_HC is the hard-core (or packing) contribution to the hydration from a solute that simply excludes the solvent from a volume comprising the solute plus a defined inner shell; P(ε|n = 0) is the probability distribution of the binding energy of the solute with the solvent, subject to the inner-shell being bereft (n = 0) of solvent; p(n = 0) is the associated marginal distribution; and β = 1/k_BT is the reciprocal temperature in energy units. A rigorous calculation of the terms in the above equation has been presented in the past,^44–48 but such calculations are demanding and require supplying external forces to NAMD using the Tcl interface. In ref. 6 we have already established that for a small cavity, μ^(ex)_HC is sensitive to δ_t. Since our interest here is mainly to detect the role of δ_t in the protein–solvent interaction energy, we adopted the following procedure that will be accessible to most users of simulation codes. We completely ignored the excluded volume contribution and considered the molecular envelope as the inner shell, thus p(n = 0) = 1. We then computed the binding energy distribution, P(ε), of the protein with the solvent. If P(ε) is Gaussian with mean 〈ε〉 and variance σ², and subject to the aforementioned simplifications, the vdW contribution, μ^(ex)_vdW, is given by^43,44


	(3)

We find that the difference in μ^(ex)_vdW between adjacent δ_t values is insensitive to whether or not we include the fluctuation contribution (βσ²/2). Hence, for further simplicity, we adopt the mean-field approximation μ^(ex)_vdW ≈ 〈ε〉.

2.3 Fluctuation properties

The isothermal compressibility, κ_T, is given by


	(4)

where

is the second central moment of the distribution of V obtained in the simulations. If the sample of V obtained from the simulation is independent, identically distributed (iid), then we know that σ_V is χ² distributed⁴⁹ with the optimal value of σ_V given by

where σ₀ is the sample standard deviation of V and N is the sample size. However, the time-series trace of volumes in the simulation log is correlated. To this end, we compute the autocorrelation of δ_V = V − 〈V〉, and define the correlation length as the number of entries, n, in the log-file it takes for the normalized autocorrelation of δ_V to fall below 0.05. (An alternative choice for the correlation length is the statistical inefficiency from the Friedberg–Cameron approach; this choice is slightly tighter, but it still leads to the same mean κ_T and similar uncertainties.) We then sub-sample the time-series trace of V, such that the sampled V are separated by the auto-correlation length. For this sub-sample, we have

(See also ref. 50.) By shifting the time origin, we construct n − 1 such sub-samples and compose the mean σ²_V. The error of the mean σ²_V is then obtained using variance propagation rules.

In the NVT ensemble with conducting boundary conditions, the dielectric constant ε is given by⁵¹


	(5)

where M is the dipole moment of the simulation system; for an adequate sample size, 〈M〉 ≈ 0. By expressing the equation in x, y, z components we follow the procedure used for calculating κ_T to calculate ε and its uncertainty. For calculating ε we used the average system size obtained using the MC barostat/CSVR thermostat with a time-step of 0.5 fs. We use the CSVR thermostat with 0.5 ps stochastic rescaling period and a production run of 50 × 10⁶ time-steps with configurations saved every 500 steps for analysis. We sought a second way to assess the impact of δ_t on the dielectric constant, namely by calculating the free energy of charging, μ^(ex)_elec, a Na atom³⁷ to hypothetical charge states of +2e and +3e. The higher charge states help amplify the effects we are after. For these charging calculations, we add the ion to the simulation cell assuming the partial molar volume is zero, and rely on a linear-response approximation (λ = 0.5 noted above). The equilibration phase was for 5 × 10⁶ steps and the production phase lasted 40 × 10⁶ steps.

3 Results and discussion

3.1 Bulk water

Fig. 1 shows the volume (density) versus δ_t for different thermostat/barostat combinations. The horizontal line in the figure is the value obtained from averaging the four separate simulations for the large reference system. We find that by between 1250 steps, for a volume sampling frequency of 80, and 2500 steps, for a sampling frequency 200, the system volume settles close to the eventual mean value. This also helps confirm that the sample sizes for the reference calculation and the studies with the 4096 water system are rather conservative.


	Fig. 1 Water partial molar volume v (left panel) and mass density (right panel) versus δ_t for simulations with thermostat set at 298.15 K and barostat set at 1 bar. The simulations with Tinker use the CSVR thermostat and the Monte Carlo barostat. The standard error of the mean is shown at the 2σ level. Reference: 32768 waters, δ_t = 0.25 fs. Data for different volume sampling frequencies are averaged and standard errors appropriately propagated. The radius of the symbol (•) is 2σ standard error of the mean.

Fig. 1 shows that the volume increases with increasing δ_t, irrespective of the thermostat/barostat combination. This is consistent with our hypothesis based on observations in the earlier study that the translational temperature increases as δ_t increases, and Davidchack's calculation of pressure versus δ_t in the NVT ensemble. Clearly, only for a small δ_t—for the conditions tested here δ_t = 0.5 fs—the volumes (densities) converge to a common value that is independent of the thermostat/barostat combination. Importantly, the volumes (densities) converge to the value obtained from the entirely separate reference simulation. Lastly, hydrogen mass repartitioning (HMR) is an improvement in estimating the equilibrium volume (density), but the procedure is ineffective for δ_t = 4 fs. HMR can be defensible for δ_t ≤ 2.0 fs.

As a further check, we sought to compare with predictions of density from an entirely stochastic simulation. Sanz et al.²¹ have systematically explored the phase equilibrium of water for different water models using Monte Carlo calculations. For the SPC/E model and for a nearly identical treatment of inter-molecular interactions—they use a shorter cutoff of 8.5 Å versus 9 Å used by us—they quote a density of 1000 kg m⁻³ for 1 bar pressure and 300 K. Using the experimental thermal expansion coefficient for water, we can infer that at 298.15 K, the density from the Monte Carlo procedure should be about 1000.4 kg m⁻³. This value is an upper bound to the values obtained in the molecular dynamics simulations, with the least deviation of 0.2% of 998.5 kg m⁻³ obtained for δ = 0.5 fs. While this comparison is encouraging, we must note some caveats. First, Sanz et al. do not report the number of water molecules used in the simulations; we suspect⁵² it was considerably less than 4096, perhaps being as low as 360. As noted in our study on system size dependence of protein hydration,⁵³ a larger system better accommodates density fluctuations and this may explain part of the deviation. (Exploring the relevance of this issue for converged density predictions is left for a separate study.) Second, Sanz et al. do not quote statistical uncertainties. We suspect⁵² it is about 1 kg m⁻³, in which case the agreement with our reference and δ_t = 0.5 fs results is satisfactory within the quoted statistical uncertainties of the respective simulations.

Consider next the change in the partial molar volume between δ_t = 0.5 fs and the more conventional δ_t = 2.0 fs. For simulations with the CSVR thermostat and Monte Carlo barostat, the partial molar volume for δ_t = 2.0 fs is about 0.02 Å³ larger. Thus for a system with 10 [thin space (1/6-em)] 000 water molecules, a system size that is nowadays rather common and likely on the smaller size-scale of simulation systems, the volume for the δ_t = 2.0 fs simulation will be larger by about 200 Å³ relative to that for the system simulated with δ_t = 0.5 fs. To put this deviation in perspective, we note the following example. The volume change upon folding of the 149 residue Staphylococcal Nuclease protein at 21 °C, close to the temperature studied here, is found to be about 70 ml mol⁻¹ or about 116 Å³ per molecule (of S. Nuclease).^54,55 This deviation is already smaller than the error in overall system volume induced by too large of a δ_t. Volume change upon folding/unfolding for similarly sized or smaller proteins will be comparable or smaller. Thus the impact of the artifacts due to too large a δ_t will be proportionally greater in assessing both the thermodynamics and the kinetics of the folding/unfolding transition in computer simulations.

Since the volume depends on δ_t, it begs the question whether the fluctuation in the volume under NpT conditions also depends on δ_t. Fig. 2 shows the behavior of the estimated compressibility. For this analysis, we only consider the CSVR thermostat/Monte Carlo barostat; in contrast to the Langevin thermostat the CSVR thermostat is less intrusive in the dynamics and affects translation and rotation symmetrically. (With the CSVR thermostat the average of the translation and rotation temperatures equals the thermostat set-point temperature, unlike what we find for the Langevin thermostat.⁶)


	Fig. 2 Left panel: Calculated compressibility versus δ_t. The uncertainty is shown at the 1σ level. Right panel: The binding energy of a water molecule averaged over all frames. The standard error of the mean is shown at the 2σ level. The NpT calculations are performed with the CSVR thermostat and the Monte Carlo barostat within the NAMD program.

In Fig. 2 (left panel), the estimated statistical uncertainties are large, as expected given the overall length of the simulation after accounting for correlations; however, the trend is unmistakable: the compressibility tends to decrease with decreasing δ_t. The behavior of the compressibility with δ_t is also consistent with the behavior of the binding energy with δ_t (right panel): as cohesion increases one expects the fluid matrix to become stiffer and the compressibility to decrease.

Fig. 3 shows the calculated behavior of the dielectric constant and the behavior of βμ^(ex)_elec/q². As seen in the calculation of κ_T, the estimated statistical uncertainties in the calculation of ε are large. The data suggests that the dielectric constant can be sensitive to δ_t, just as the compressibility is found to be sensitive to δ_t, with the value for δ_t = 0.5 fs being slightly greater than the value for δ_t = 4.0 fs. This behavior is physically consistent with our earlier finding (Fig. 1, ref. 6) that as δ_t increases, the rotational motion is at a lower temperature relative to that for the translational motion. To better probe the electrostatic response of the fluid, we consider the problem of charging an ion (Fig. 3, right panel). Within the continuum dielectric Born model of hydration, βμ^(ex)_elec/q² ∝ −(1 − 1/ε). However, with an atom that also has non-electrostatic interactions with the molecular fluid, to better isolate the dependence on the dielectric constant we consider the free energy of charging, βΔμ^(ex)_elec/q², relative to the value obtained using δ_t = 0.5 fs. The data shows that this difference quantity is roughly the same for both charge states, and consistent with our expectation, the dielectric constant is indicated to be higher for δ_t = 0.5 fs relative to δ_t = 4.0 fs. We find the same behavior in the more interesting case of the hydration of the protein BBA.


	Fig. 3 Left panel: Calculated dielectric constant versus δ_t. The uncertainty is shown at the 1σ level. Right panel: The free energy of charging a Lennard-Jones Na³⁷ atom to hypothetical charge states of q = 2e and q = 3e, respectively, relative to the corresponding value obtained using δ_t = 0.5 fs. The standard error of the mean is shown at the 1σ level.

3.2 Aqueous BBA

Fig. 4 (left panel) shows the partial molar volume of water versus δ_t for the aqueous protein system. Just as we found for bulk water, the simulation volume converges to a value independent of the thermostat/barostat combination only for a small δ_t. For the conditions tested here, δ_t = 0.5 fs ensures convergence of volume. It is heartening that this value of δ_t also ensure the convergence of the system potential energy to a common value.


	Fig. 4 Left panel: Dependence of partial molar volume v of water in the presence of one molecule of BBA. Right panel: Total mean potential energy of the system relative to the value for δ_t = 0.5 fs. Thermostat is set at 298.15 K and barostat is set at 1 bar. The standard error of the mean is shown at the 2σ level.

Examining the dependence of the radius of gyration of the protein (R_g) on δ_t in carefully constructed replica exchange simulations proved to be inconclusive in revealing systematic trends (data not shown). This can be due to confounding roles of flexibility, distribution of the counterion (Cl⁻), and the potential for artifacts introduced by violation of canonical sampling.⁵⁶ To better isolate the dependence of protein–solvent interactions on δ_t, we consider a protein in a fixed conformation in a water bath and without the counterion.

Table 1 shows the average equilibrium box length obtained from NpT calculations for this system. These values are used in the NVT ensemble calculations of μ^(ex)_elec and μ^(ex)_vdW.

Table 1 Equilibrated volumes and the corresponding box length of the simulation with the protein in a fixed conformation

δ _t (fs)	V (Å³)	L (Å)	μ ^(ex)_self (kcal mol⁻¹)
a Standard error of the mean at the 1σ level. b The Wigner self-interaction correction^37,38 is where q_i is the partial charge of atom i of the protein and ξ = −2.837297/L. We report this correction relative to the δ_t = 0.5 fs case.
0.5	200074 ± 13	58.488	0.0
1.0	200109 ± 9	58.491	0.0
2.0	200289 ± 7	58.508	0.2
3.0	200571 ± 6	58.536	0.4
4.0	200978 ± 6	58.576	0.7

For the usual choice of δ_t = 2.0 fs relative to δ_t = 0.5 fs we can ignore the Wigner self-interaction correction for this system. Also, since the box is relatively large compared to the size of the protein, we ignore the finite-size correction to electrostatic interactions as well.

Fig. 5 shows μ^(ex)_elec and μ^(ex)_vdW of BBA. First, consider the electrostatic contribution. We find that the linear response result already gives ≈95% of the electrostatic contribution (data not shown), emphasizing that the three point rule is adequate in describing the charging free energy. Second, the behavior across δ_t shows that μ^(ex)_elec is sensitive to δ_t, even without including the Wigner self-interaction correction (Table 1) which amplify the trend even more. Finally, the vdW contribution obtained using the mean-field approximation μ^(ex)_vdW ≈ 〈ε〉 is also sensitive to the step-size, although, as expected, not as strongly as μ^(ex)_elec.


	Fig. 5 Left panel: Dependence of electrostatic contribution to the free energy of hydration on δ_t. The Wigner self-interaction correction is not included. Right panel: δ_t dependence of mean-field van der Waals contribution to the hydration free energy. Throughout the numbers are shown relative to the mean value at δ_t = 0.5 fs. The standard error of the mean is shown at the 1σ level.

Shaw and coworkers¹⁵ had studied the folding/unfolding transition of the BBA protein in very long computer simulations. At 325 K, they have calculated a folding free energy of about 0.7 kcal mol⁻¹. (That study used a time-step of 2.5 fs.) Wu and Shea have estimated a similar value at 323 K.¹⁴ Assuming the folding free energy of the BBA protein is in a similar range at 298.15 K as well, we can see that the error introduced by a larger step-size in the calculation of the hydration free energy is already considerably larger than the reported folding free energy.

The thermodynamics of hydration and the conformational properties of biological macromolecules in liquid water are important quantities that inform the design and development of forcefields for biomolecular simulations. Our results suggest that tuning a forcefield in a simulation that does not obey equipartition is likely to introduce errors into the very structure of the forcefield itself. How these issues influence the kinetics and thermodynamics in aqueous and bio-macromolecular systems is necessarily left for future studies.

The earlier work⁶ and the present show that in a molecular dynamics approach to sampling equilibrium ensembles, it is important to capture the relevant relaxation dynamics with fidelity. However, in molecular dynamics sampling—and, for simplicity, we consider the NVE ensemble—the relevant metrics are conserved quantities, the foremost being the energy of the system; fidelity of capturing relaxation dynamics does not appear explicitly. In this regard, note that in discrete Hamiltonian dynamics with time-reversible algorithms, the sampling is from a shadow Hamiltonian, [H with combining tilde] (δ_t) = H + G(δ_t²), where H is the physical Hamiltonian.^57–60 (The general form of G is unknown but is usually investigated by means of a series expansion.⁶¹) As a consequence, and within time-reversible algorithms, H is not conserved but oscillates about a mean value, with the size of oscillations proportional to δ_t².⁶⁰ This discretization error, or what can also be thought of as an exchange of “shadow” work with the system,⁶² is what impacts the relaxation. Thus, we see that an algorithm can be globally stable with a long δ_t, but that stability is not sufficient to ensure relaxation processes are correctly captured.

A further consequence of discrete Hamiltonian dynamics is that the particle velocity v ≠ p/m, where p is the momentum obtained using the “Velocity”-Verlet equation⁵⁹ and m is its mass. Better estimates of v can be obtained and the temperature defined using the generalized equipartition theorem.^63,64 This is an important point to consider. However, the key insight from the earlier work and the present one is the need to obey the fluctuation–dissipation theorem, i.e. the need to capture the temporal evolution of fluctuations⁷ or equivalently the underlying relaxation dynamics with fidelity in a MD sampling of ensembles.

For a molecular dynamics sampling of equilibrium ensembles, we suggest it would be prudent to study relevant velocity autocorrelations,⁶ or if that is too tedious, estimate quantities such as mean energy or volume for several different time-steps to ensure that time-step artifacts are under control. Such checks would be especially prudent before undertaking large-scale simulation campaigns.

4 Conclusions

It has been nearly 50 years since Ryckaert, Ciccotti, and Berendsen introduced the SHAKE algorithm that allowed the efficient description of molecules such as water as a rigid object in computer simulations. A guiding idea in this development was the assumption that fast internal vibrations in the molecule are decoupled from translational and vibrational modes, and by describing the molecule as a rigid object, it is permissible to take a longer time-step to integrate the equations of motion. However, this chain of reasoning is not valid for water. In particular, in the simulated dynamics of rigid water molecules the rotational relaxation occurs on time-scales that are comparable to the internal vibrations in the physical water molecule. Ignoring this fact and simulating a liquid comprising rigid water molecules with a long time-will lead to an incorrect description of the rotational relaxation.

The rapid angular motion in the dynamics of a fluid comprising rigid water molecules leads to the rotational relaxation occurring at a shorter time scale than the translation relaxation. This imposes a fundamental limit on how long a time-step one can use for correctly capturing the temporal evolution of fluctuations, as is required for adhering to the fluctuation–dissipation relation. A long time-step for integrating the equations of motion leads to an incorrect description of the temporal evolution of fluctuations in turn leading to the breakdown of equipartition. We find that for integrating the equations of motion for the rigid SPC/E and TIP3P models of water, a time-step of 0.5 fs is defensible, a value that is also close to what Rahman and Stillinger had suggested long ago from their analysis of the dynamics of a pair of water molecules.

The breakdown of equipartition leads to the translational modes being hotter than the rotational modes, a deviation that is amplified as the time-step is increased. As a consequence, at constant pressure, the volume is larger for a longer time-step. For a protein dissolved in water, the deviation in the partial molar volume will necessarily introduce uncontrolled p–V errors in the folding free energy landscape.

The δ_t artifact also impacts the interaction between the protein and solvent. For the BBA protein, the error introduced in the electrostatic plus vdW contribution to the hydration free energy can easily exceed the folding free energy. Thus, too long a time-step introduces errors in the interaction contribution, further obfuscating the resolution of the free energy landscape.

Data availability

ASCII formatted simulation metadata and log files for all cases and the binary trajectory for BBA in water with δ_t = 0.5 fs (MC barostat) are available at DOI: 10.13139/OLCF/2480346. The data is also findable at doi.ccs.ornl.gov. The repository has a README.txt that can help in navigating the data.

Author contributions

D. N. A.—conceptualization, simulation, data analysis, and writing; A. V. P.—simulation (with Beeman algorithm), manuscript review; T. L. B.—research oversight, manuscript editing.

Conflicts of interest

There are no conflicts to declare.

Acknowledgements

We thank Margaret Whitt (NGSI Intern at ORNL) for exploratory studies on protein BBA. We thank Jindal Shah for helpful pointers on Monte Carlo simulations with the Cassandra code. We thank Thiago Pinheiro dos Santos for numerous helpful discussions. We thank David Rogers for a critical reading of the manuscript and helpful comments. This research used resources of the Oak Ridge Leadership Computing Facility an Advanced Scientific Computing Research (ASCR) facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Notes and references

A. Rahman and F. H. Stillinger, J. Chem. Phys., 1971, 55, 3336–3359 CrossRef CAS.
F. H. Stillinger and A. Rahman, J. Chem. Phys., 1974, 60, 1545–1557 CrossRef CAS.
J. P. Ryckaert, G. Ciccotti and H. J. C. Berendsen, J. Comput. Phys., 1977, 23, 327–341 CrossRef CAS.
H. C. Andersen, J. Comput. Phys., 1983, 52, 24–34 CrossRef CAS.
S. Miyamoto and P. A. Kollman, J. Comput. Chem., 1992, 13, 952–962 CrossRef CAS.
D. N. Asthagiri and T. L. Beck, J. Chem. Theory Comput., 2024, 20, 368–374 CAS.
P. K. Pathria and P. D. Beale, Statistical Mechanics, Academic Press, 3rd edn, 2011 Search PubMed.
R. L. Davidchack, J. Comput. Phys., 2010, 229, 9323–9346 CrossRef CAS.
A. J. Silveira and C. R. A. Abreu, J. Chem. Phys., 2017, 147, 124104 CrossRef PubMed.
A. J. Silveira and C. R. A. Abreu, J. Chem. Phys., 2019, 150, 114110 Search PubMed.
B. I. Dahiyat and S. L. Mayo, Science, 1997, 278, 82–87 CAS.
C. W. Hopkins, S. Le Grand, R. C. Walker and A. E. Roitberg, J. Chem. Theory Comput., 2015, 11, 1864–1874 CAS.
R. Fuentes-Azcatl and J. Alejandre, J. Phys. Chem. B, 2014, 118, 1263–1272 CAS.
C. Wu and J.-E. Shea, PLoS Comput. Biol., 2010, 6, e1000998 Search PubMed.
K. Lindorff-Larsen, S. Piana, R. O. Dror and D. E. Shaw, Science, 2011, 334, 517–520 CAS.
H. J. C. Berendsen, J. R. Grigera and T. P. Straatsma, J. Phys. Chem., 1987, 91, 6269–6271 CAS.
L. Kale, R. Skeel, M. Bhandarkar, R. Brunner, A. Gursoy, N. Krawetz, J. Phillips, A. Shinozaki, K. Varadarajan and K. Schulten, J. Comput. Phys., 1999, 151, 283 CAS.
J. C. Phillips, D. J. Hardy, J. D. C. Maia, J. E. Stone, J. V. Ribeiro, R. C. Bernardi, R. Buch, G. Fiorin, J. Hénin and W. Jiang, et al. , J. Chem. Phys., 2020, 153, 044130 CAS.
J. A. Rackers, Z. Wang, C. Lu, M. L. Laury, L. Lagardere, M. J. Schnieders, J.-P. Piquemal, P. Ren and J. W. Ponder, J. Chem. Theory Comput., 2018, 14, 5273–5289 CAS.
Z. Wang and J. W. Ponder, Tinker9, 2023, https://github.com/TinkerTools/tinker9/ Search PubMed.
E. Sanz, C. Vega, J. L. Abascal and L. G. MacDowell, Phys. Rev. Lett., 2004, 92, 255701 CAS.
G. Bussi, D. Donadio and M. Parrinello, J. Chem. Phys., 2007, 126, 014101 Search PubMed.
S. E. Feller, Y. Zhang, R. W. Pastor and B. R. Brooks, J. Chem. Phys., 1995, 103, 4613–4621 CrossRef CAS.
K.-H. Chow and D. M. Ferguson, Comput. Phys. Commun., 1995, 91, 282–289 Search PubMed.
J. Åqvist, P. Wennerström, M. Nervall, S. Bjelic and B. O. Brandsdal, Chem. Phys. Lett., 2004, 384, 288–294 CrossRef.
D. Beeman, J. Comput. Phys., 1976, 20, 130–139 CrossRef.
R. Friedberg and J. E. Cameron, J. Chem. Phys., 1970, 52, 6049–6058 CrossRef CAS.
M. P. Allen and D. J. Tildesley, in Computer simulation of liquids, Oxford University Press, 1987, ch. 6. How to analyze the results, pp. 192–195 Search PubMed.
W. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey and M. L. Klein, J. Chem. Phys., 1983, 79, 926–935 CrossRef CAS.
A. D. MacKerell Jr, D. Bashford, M. Bellott, R. L. Dunbrack Jr, J. D. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo and S. Ha, et al. , J. Phys. Chem. B, 1998, 102, 3586–3616 Search PubMed.
A. D. MacKerell Jr, M. Feig and C. L. Brooks III, J. Comput. Chem., 2004, 25, 1400–1415 CrossRef PubMed.
R. B. Best, X. Zhu, J. Shim, P. E. M. Lopes, J. Mittal, M. Feig and A. D. MacKerell, Jr., J. Chem. Theory Comput., 2012, 8, 3257–3273 Search PubMed.
J. Huang, S. Rauscher, G. Nawrocki, T. Ran, M. Feig, B. L. De Groot, H. Grubmüller and A. D. Mackerell Jr, Nat. Methods, 2016, 14, 71–73 CrossRef PubMed.
E. Neria, S. Fischer and M. Karplus, J. Chem. Phys., 1996, 105, 1902–1921 CAS.
J. V. Ribeiro, B. Radak, J. Stone, J. Gullingsrud, J. Saam and J. Phillips, PSFGEN User's Guide, Version 2.0, University of Illinois and Beckman Institute Technical Report, 2020, https://www.ks.uiuc.edu/Research/vmd/plugins/psfgen/ug.pdf Search PubMed.
W. Humphrey, A. Dalke and K. Schulten, J. Mol. Graphics, 1996, 14, 33–38 CAS.
G. Hummer, L. R. Pratt and A. E. Garcia, J. Phys. Chem., 1996, 1206–1215 CAS.
G. Hummer, L. R. Pratt and A. E. Garcia, J. Phys. Chem. A, 1998, 102, 7885–7895 CrossRef CAS.
G. Hummer and A. Szabo, J. Chem. Phys., 1996, 105, 2004–2010 CrossRef CAS.
V. Weber and D. Asthagiri, J. Chem. Theory Comput., 2012, 8, 3409–3415 CAS.
G. Hummer, L. R. Pratt and A. E. García, J. Chem. Phys., 1997, 107, 9275–9277 CrossRef CAS.
M. E. Paulaitis and L. R. Pratt, Adv. Protein Chem., 2002, 62, 283–310 CAS.
T. L. Beck, M. E. Paulaitis and L. R. Pratt, The Potential Distribution Theorem and Models of Molecular Solutions, Cambridge University Press, Cambridge, UK, 2006 Search PubMed.
D. Asthagiri, H. S. Ashbaugh, A. Piryatinski, M. E. Paulaitis and L. R. Pratt, J. Am. Chem. Soc., 2007, 129, 10133–10140 CrossRef CAS PubMed.
D. Asthagiri, S. Merchant and L. R. Pratt, J. Chem. Phys., 2008, 128, 244512 Search PubMed.
D. S. Tomar, V. Weber and D. Asthagiri, Biophys. J., 2013, 105, 1482–1490 CAS.
D. S. Tomar, W. Weber, M. B. Pettitt and D. Asthagiri, J. Phys. Chem. B, 2016, 120, 69–76 CAS.
D. S. Tomar, M. E. Paulaitis, L. R. Pratt and D. N. Asthagiri, J. Phys. Chem. Lett., 2020, 11, 9965–9970 Search PubMed.
D. S. Sivia, Data Analysis: A Bayesian Tutorial, Oxford University Press, 1996 Search PubMed.
D. Frenkel and B. Smit, In Understanding Molecular Simulations: From Algorithms to Applications, Academic Press, 2002, ch. Appendix D Search PubMed.
M. Neumann, Mol. Phys., 1983, 50, 841–858 CAS.
J. L. F. Abascal and C. Vega, J. Chem. Phys., 2005, 123, 234505 CAS.
D. Asthagiri and D. S. Tomar, J. Phys. Chem. B, 2020, 124, 798–806 Search PubMed.
G. Panick, G. J. A. Vidugiris, R. Malessa, G. Rapp, R. Winter and C. A. Royer, Biochemistry, 1999, 38, 4157–4164 CAS.
A. Paliwal, D. Asthagiri, D. P. Bossev and M. E. Paulaitis, Biophys. J., 2004, 87, 3479–3492 CAS.
E. Rosta, N.-V. Buchete and G. Hummer, J. Chem. Theory Comput., 2009, 5, 1393–1399 CrossRef CAS.
J. M. Sanz-Serna, Acta Numer, 1992, 1, 243–286 CrossRef.
S. Toxvaerd, Phys. Rev. E, 1994, 50, 2271–2274 Search PubMed.
J. Gans and D. Shalloway, Phys. Rev. E, 2000, 61, 4587–4592 CrossRef CAS PubMed.
E. Hairer, C. Lubich and G. Wanner, Acta Numer, 2003, 12, 399–450 CrossRef.
J. M. Sanz-Serna and M. P. Calvo, Numerical Hamiltonian Problems, Dover Publications, Mineola, New York, 2018 Search PubMed.
D. A. Sivak, J. D. Chodera and G. E. Crooks, Phys. Rev. X, 2013, 3, 011007 CAS.
M. P. Eastwood, K. A. Stafford, R. A. Lippert, M. Ø. Jensen, P. Maragakis, C. Predescu, R. O. Dror and D. E. Shaw, J. Chem. Theory Comput., 2010, 6, 2045–2058 CrossRef CAS PubMed.
R. C. Tolman, Phys. Rev., 1918, 11, 261–275 CrossRef CAS.

Footnote

† Notice: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (https://www.energy.gov/doe-public-access-plan).

Click here to see how this site uses Cookies. View our privacy policy here.