Ruiqi
Gao
a,
Yifan
Li
b and
Roberto
Car
*b
aDepartment of Electrical and Computer Engineering, Princeton University, Princeton, USA
bDepartment of Chemistry, Princeton University, Princeton, USA. E-mail: rcar@princeton.edu
First published on 23rd August 2024
In molecular simulations, neural network force fields aim at achieving ab initio accuracy with reduced computational cost. This work introduces enhancements to the Deep Potential network architecture, integrating a message-passing framework and a new lightweight implementation with various improvements. Our model achieves accuracy on par with leading machine learning force fields and offers significant speed advantages, making it well-suited for large-scale, accuracy-sensitive systems. We also introduce a new iterative model for Wannier center prediction, allowing us to keep track of electron positions in simulations of general insulating systems. We apply our model to study the solvated electron in bulk water, an ostensibly simple system that is actually quite challenging to represent with neural networks. Our trained model is not only accurate, but can also transfer to larger systems. Our simulation confirms the cavity model, where the electron's localized state is observed to be stable. Through an extensive run, we accurately determine various structural and dynamical properties of the solvated electron.
There has been a lot of development of ML force fields over the years.1–11 With the popularity of accuracy benchmarking, more recent models7,10,11 generally follow a trend of increasing accuracy at the cost of increasing model and computational complexity. However, in MD simulations, equilibrium and dynamical properties may require timescales of nanoseconds or even microseconds, corresponding to millions to billions of steps for sufficiently large system sizes. To this end, more lightweight and faster models are required. Earlier models like the Behler–Parrinello neural network (BPNN)1 and Deep Potential (DP)4,12–15 model are relatively small and fast, but may be inadequate for accuracy-sensitive systems.
This work is focused on developing a model that runs fast while being accurate enough for MD simulations. It is based on the DP model and we have made various enhancements to it. The most important is the incorporation of a message passing (MP) mechanism, so we call it DP-MP. This enables a richer representation that learns features on top of features, and also effectively increases DP's cutoff radius of the local receptive field. Using most of the building blocks of the existing DP model, we propagate both scalar and vector features for each atom and retain the model's invariance to translation, rotation, and permutation. We also incorporate second-order tensor information in the final features. These enhancements are designed to significantly boost the accuracy of DP without incurring much computational cost.
To make it faster and more flexible, we implement the new scheme with JAX,16 a Python-based autograd and machine learning framework that is optimized on GPUs.† The MD part can be seamlessly connected with frameworks like JAX-MD,17 enabling an end-to-end GPU workflow in Python. We perform a simple benchmark on a water system. Combined with the new implementation gains, the new model is around two orders of magnitude faster than other models achieving similar accuracy.
Additionally, in this work, we also present a new method for the prediction of the Wannier centers, i.e., the centers of maximally localized Wannier distributions.18 Wannier centers can be seen as representing the centers of the charge associated to individual electrons. So far, the Wannier centers can be predicted by a similar neural network like the DP model, which is called the Deep Wannier (DW) model.19 But this scheme is limited to systems where the Wannier centers can be uniquely associated with individual atoms, which precludes modeling electron transfer processes. In the present approach, we encapsulate a prediction model in an iterative refinement process, and it is called DWIR (Deep Wannier Iterative Refinement). With DWIR, one can keep track of the electrons in an atomic simulation, even when they are not uniquely associated to individual atoms.
To illustrate the capabilities of our enhanced models, we apply them to the study of e−(aq), the solvated electron in bulk water. e−(aq) plays an important role in radiation chemistry and biology,20 and despite its apparent simplicity, it has undergone much research effort before the cavity model became well-established: The electron creates a localized quasi-spherical cavity with a shell of surrounding water molecules.20,21 This system poses considerable challenges for ML models since they only see the atoms and not the excess electron, and the structure is quite complex and sensitive compared to bulk water. There have been efforts to learn an ML model of e−(aq),22 but it remains difficult to obtain a sufficiently accurate and robust model.22,23 Also, there has not been a model that can be transferred to larger systems, which is actually a requirement for many applications and technically possible given the localized nature of the electron.
In this work, we perform a DFT simulation of a periodic box of 128 H2O molecules plus one e−, and use the DP-MP scheme to successfully learn a model of e−(aq). We demonstrate its transferability to a larger system of 256 H2O molecules and one e−. In the DFT calculations, we adopt a hybrid functional PBEh(α) with 40% exact exchange and rVV10 van der Waals correction, which has been suggested to be able to reproduce well several experimental properties for water and e−(aq).21,24 We perform a nanosecond-long DP-MP run to collect sufficiently converged statistics, and learn an additional DWIR model to track the position of e−(aq). We calculate various structural and dynamical properties including the size, radial distribution functions, and diffusion mechanism. Our calculation confirms the cavity model and the stability of the localized state. We also identify a form of H–e− bond around the electron that is similar to the H-bond in water, and whose forming and breaking gives rise to the rapid diffusion of e−(aq).
1. Compute a smooth function s(rij) that approximates , except that it is modified to become zero when rij ≥ rc.
2. An embedding neural network G takes each s(rij) as input and outputs the feature G(s(rij)), a vector of length M1.
3. Average over neighboring atoms to obtain a scalar (T(1)) and vector (T(3)) feature of length M1 for each atom i:
4. Obtain an invariant feature Di of size M1M2: this is done by taking a subset of M2 (<M1) “axis” features from T, and for each m1 in {1,…,M1} and m2 in the subset we compute
5. Apply a fitting neural network F that yields the atomic energy
The model designed this way preserves the translational, rotational, and permutational symmetry of the energy function. All trainable parameters lie in the embedding network G and fitting network F, which are both multi-layer fully-connected residual networks (ResNets).25 In practice, depending on the chemical species, an embedding network is trained for each type-pair of (i,j) and a fitting network is trained for each type of i. But for simplicity we omit the atomic type in the formulas.
The calculation of the embedding network in step 2 is performed for pairs of atoms, making it the most time-consuming step. Thankfully, the input s(rij) is one dimensional, so it can be approximated by a piecewise polynomial at inference time, or referred to as compressed.26 With DP's simple design as well as compression, it is very fast compared to recent ML force fields.
DP has found many successful applications in systems like water, silicon, metals, metal oxides and so on, and has been applied to many studies including the phase diagram, and processes involved in crystal nucleation, combustion, interfacial systems etc.27–32 However, being a simple model, the expressive power of DP is somewhat limited. In addition, in more complex systems such as e−(aq), the radius of its influence most probably extends beyond the usual cutoff where DP is seen to perform well (like 6 Å in water). This brings us to the enhanced design, described in the next subsection.
Fig. 1 Architecture of DP-MP. The blue parts indicate the message passing steps, and the green parts indicate the final loop. |
featij = concatenate (G(s(rij)),Di,Dj, 〈T(3)i,rij〉3, 〈T(3)j,rij〉3)
as the new input to the embedding network. |
This process can be iterated: after each embedding network pass, we aggregate features from neighbors by step 3. We obtain new T and D atomic features from step 3 and 4, which are used in the input to the new embedding pass starting from step 2. After a few loops we can terminate and enter the previous fitting process described in step 5.
At the final loop at step 3, a slightly different feature set is employed: instead of T(1)i and T(3)i, we use T(3)i and T(6)i, where T(6)i is a set of 6-vectors defined by
There are certain implementation details that we did not dive into for the sake of clarity. Firstly, in calculating s(rij), we use a slightly simpler function than the original DP
The previous DP model has been used to predict the Wannier centroid, defined as the average position of WCs associated with a certain atom. For example, in an H2O molecule, there are 4 WCs associated with it. Each one of them represents a pair of electrons with opposite spin, with two of them for the bonding pairs and two for the lone pairs. The dipole moment is determined by the average of the 4 WCs or the Wannier centroid. The Wannier centroid obtained from DFT calculations can be learned by a separate neural network in DP, sometimes called the Deep Wannier (DW) model.19 Compared to the standard DP model which predicts a scalar energy for each atom and then sums them up, DW predicts a vector for each Oxygen atom, representing its relative position to the Wannier centroid. This is achieved by modifying the final step 5 where one changes the fitting network's output F(Di) to be a feature of length M1, and the relative displacement from the i-th oxygen atom is expressed by 〈F(Di),T(3)i〉M1.
DW works well on predicting Wannier centroids in water, or more generally, insulating systems where you can assign the WCs to individual atoms. However, in general, WCs are not unambiguously associated with certain atoms. Examples include e−(aq), as well as more complex reactions involving electron transfer. Still, WCs are functions of the atomic coordinates under the Born–Oppenheimer approximation. This calls for a new scheme to predict the WCs without anchoring them to given atoms.
Here we introduce the new Deep Wannier iterative refinement (DWIR) model (Fig. 3). The idea is simple: now the WCs are anchored to themselves, and we predict only a correction displacement on top of a given prediction. Suppose the atomic coordinates are {ri}Ni=1, and we have some initial guess of the WCs , where Nw is the total number of WCs. The initial guess is subject to errors, but we use a DW-like model to correct it iteratively:
w(k+1)j = w(k)j + δw(k)j. |
Fig. 3 Illustration of DWIR, where the model takes the current WC prediction as part of the input and predicts an update. |
Starting from k = 0, the model is reused to iterate K times, and we obtain the final prediction . We train the model with a loss function
In a model's architecture, the WCs are treated just as a different kind of point particle, so any existing model can be used here, such as using DP-MP for improved accuracy. Also, DWIR can work with either spin-saturated or spin-polarized calculations. The latter is used in the e−(aq) system where one WC represents one electron instead of a pair.
The initial guess, while not important for the final result, should not deviate too much from the true WCs, otherwise the model will have a hard time converging. For example, in water, one can initialize 4/8 (spin-saturated/spin-polarized) random WCs around each Oxygen atom during training. In e−(aq), one can initialize the excess electron's WC to be within 1 Å of the true WC during training. In a simulation, one simply uses the previous step's prediction as the initial guess for the next step.
Again, there are certain implementation details. For example, s(r) is modified to be finite at r = 0 to handle potentially overlapping particle positions. For more details, we refer to the published code.
We use a cutoff radius of 6 Å and find that one MP pass in the DP-MP model is best in achieving a good accuracy while offering a significant speed advantage over other models. We use the default network width (number of neurons in each layer) of (32,32) for the initial embedding network, (64,32,64) for the MP embedding network, and (64,64,64,1) for the fitting network. We also benchmark the newly implemented DP model (referred to as DP(JAX)) with the default embedding network width (32,32,64).‡ Apart from the implementation, it differs from the original DP34(implemented in TensorFlow, referred to as DP(TF)) in that the per-atom features in step 3 are (T(3)i, T(6)i) as in the final loop of in DP-MP.
We measure the root mean square error (RMSE) or mean absolute error (MAE) of force predictions on the validation set. We also measure the speed of the models in simulations of a system of 128 H2O molecules on a single NVIDIA A100 GPU. The results are summarized in Table 1. We also compare with other neural network force fields including SchNet,35 NequIP,11 and MACE.36§¶ We plot the accuracy-speed trade-off in Fig. 4. It can be seen that DP-MP is almost two orders of magnitude faster compared to other models with similar accuracy. This is largely due to the design of the model itself, but various implementation gains|| play an important role as well. Together, the new enhanced models (DP-MP and DP(JAX)) achieve a great balance between accuracy and speed and offer a good choice for large-scale simulations.
Model | Force RMSE | Force MAE | Cost |
---|---|---|---|
DP(TF) | 35 | 27 | 3.25 |
DP(JAX) | 23 | 18 | 1.99 |
DP-MP | 9.3 | 6.5 | 6.77 |
We will only focus on studying the localized state. One reason is that non-adiabatic effects can be present in the delocalized state, which are not captured by electronic ground state simulations. Another reason would be that ML models are agnostic to the total number of electrons. If a model were to be transferable to larger systems, it should work for both e−(aq) and normal bulk water. Upon creating a delocalized electron in a finite box, the atoms are still at a bulk water configuration, indistinguishable to the ML model, but the atomic forces become different, making the forces ill-defined if these states are to be included.
To keep track of the electron's position, we also train a DWIR model. We use the same DP-MP base architecture. The configurations used for training are also the same, with the WCs calculated from the Kohn–Sham orbitals of the DFT calculations. Since we're only interested in the excess electron here, only one WC per configuration needs to be predicted by the model, though the DWIR model can equally well keep track of all the electrons. We perform K = 4 iterations of refinement. We use a batch size of 64 and trained for 50000 batches using Adam with an exponentially decaying learning rate from 10−2 to 10−4.
The DP-MP model achieves a root mean square validation error of 12 meV Å−1 for the forces. The DWIR model achieves a root mean square validation error of 0.025 Å for the WC. The DP-MP model is then used to perform a 1 ns-long simulation** in the same NVT ensemble as the DFT simulation. To show the transferability of the model, we also perform a simulation of a larger system of 256 H2O molecules plus one e−, with the same DP-MP model. The DWIR model is used to predict the WCs after the simulation, and the WCs are then used to calculate various properties of the solvated electron.
First off, to qualitatively see that WCs are intuitively representative of the charge distribution, we take a random snapshot of the system where we calculated the (negative) electrostatic potential of the system minus the excess e−, shown in Fig. 6a. This calculation is based on an approximation of spherical Gaussian distribution of positive charge at atomic cores and negative charge at WCs of the system except that of e−, with the same inverse spread β = 0.4 Å−1 as suggested by Zhang et al.33, followed by a particle–particle particle-mesh (PPPM) calculation. This is sufficient to get the long range electrostatic potential up to dipole contributions. We plot the horizontal slice of the periodic box with the WC of e− centered at the origin. It can be seen that the WC agrees with the minimum of the quasi-spherical potential well created by the surrounding molecules.
We conduct a Voronoi analysis on the WC of e− and all oxygen atoms. The respective volume distributions are shown in Fig. 6b. The volume of e− is smaller than that occupied by a water molecule. From the average volume we deduce a radius of 1.80 Å compared to 1.92 Å for a water molecule.
In Fig. 6c, we show the radial distribution function between the WC of e− and O/H atoms. The first peak is at 1.4 Å for e−–H, and 2.4 Å for e−–O. The first minimum is at 2.3 Å for e−–H, and 3.3 Å for e−–O. We obtain the result for the 256-molecule system as well. It is slightly more structured and localized, which alludes to the importance of using a large enough box to simulate e−(aq), where a smaller box lowers the energy barrier for delocalization. But the difference between system size 128 and 256 is already tiny, indicating valid results with 128 molecules.
We compute the coordination number of hydrogen atoms within the first minimum 2.3 Å, resulting in a mean of 3.4†† and a high standard deviation of 1.1, where the distribution spans from 1 to 7 as shown in Fig. 6d. This is due to a highly volatile H-bond network in the vicinity of the electron.
By sampling 2000 configurations from the trajectory and conducting DFT calculations again, we compute the radius of gyration from the spread of the Wannierized wave function of e−. The average number is 2.16 ҇ with a standard deviation of 0.18 Å, which, compared to the radius inferred from the Voronoi volume, indicates that the electronic density extends into the first shell. It is still localized but much less localized than the Wannierized valence-band electrons in water. The radius of gyration is plotted against the Kohn–Sham band gap in Fig. 6e, showing a clear negative correlation, where a larger radius corresponds to a smaller band gap and a state of higher energy. A lack of an extended tail at the bottom-right is an indication of stable localization.
To examine the nature of the interaction between e− and surrounding water molecules, we compute the distribution of the cosine angle of H–O–e−, conditioning on H–O being covalently bonded, as well as the distance between O and e− being within 3.0 Å. This is compared with the angle of H–O–O, or H–O1–O2, where H–O1 is covalently bonded and O1 –O2 is within 3.0 Å. In Fig. 6f, the latter shows a strong peak at 0°, which corresponds to the H-bond angle, with another soft peak that stands for the other H atom covalently bonded to O1. The cosine of H–O–e− also shows the similar two peaks, which indicates that H–e− bonds are similar to H-bonds, with one hydroxyl group pointing to the electron as indicated in Fig. 4. The soft peak for water is at around −0.33, corresponding to a tetrahedral H-bond network, while the soft peak for e− is at around −0.27, much closer to the cosine angle of the water molecule itself. This indicates that the H-bond network is disturbed by the excess electron.
To understand the diffusion properties of e−(aq), additional NVE simulations are performed at the same temperature. By fitting the mean square displacement to the Einstein relation, the diffusion coefficient of e−(aq) is calculated as 0.33 ± 0.01 Å2 ps−1. As a comparison, the diffusion coefficient for bulk water molecules under the same setting is 0.24 ± 0.01 Å2 ps−1. The absolute value of the diffusion coefficient depends on various factors like the DFT functional and the system size, but the relative value indicates that the solvated electron is more mobile than water molecules. The fast diffusion is the result of the frequent entry and exit of water molecules into the shell surrounding the electron. The solvated electron acts as a H-bond acceptor but not a donor, disturbing the H-bond network in water. This is already implied by the high variance of the e−–H coordination number. To see it more, we calculate the survival time of the hydogen bonds as well as the H–e− bond, defined by a unified geometric criterion due to their resemblance: The distance between the donor(O) and acceptor(O or e−) is less than 3.3 Å, and the H–O–O/H–O–e− angle is less than 30°. The bond is deemed broken if the H atom forms a different bond according to this criterion, or a weaker criterion of 3.6 Å/60° is violated. The average survival time is calculated to be 0.81 ps for H-bonds, and 0.58 ps for H–e− bonds. The relative value directly indicates that the H–e− bond is less stable than H-bonds, consistent with the increased diffusion.
Footnotes |
† Code available at https://github.com/SparkyTruck/deepmd-jax. |
‡ The embedding network of the DP model, as well as the initial embedding network of the DP-MP model, is compressed by default. |
§ We employ the training parameters of SchNet and NequIP from Fu et al.37 and MACE from the official documentation with a small(64-0) and large(128-2) model. Since our dataset is larger than the examples in these references we reduce the number of training epochs accordingly but ensure that further training does not improve the validation error. |
¶ While there are presumably more accurate models, they tend to be even slower and are not included in the comparison. |
|| Firstly, JAX tends to be somewhat faster than other packages like PyTorch or TensorFlow with which the other models are implemented. Also, we connect the model with JAX-MD, enabling an end-to-end GPU workflow. The simulation of other models either uses the LAMMPS(DP(TF)) or ASE(SchNet, NequIP, MACE) interface. While the simulation part is not the computational bottleneck compared to the evaluation of the neural network, such an interface can still cause some overhead. In addition, we use 32-bit floating point accuracy in DP-MP and DP(JAX) by default, which we find to have no impact on the prediction accuracy compared to 64-bit. |
** In practice, the lifetime of e−(aq) may not be this long due to the reaction with other species like the hydronium ion, but our simulation aims at collecting the statistics of e−(aq) itself. |
†† This result is smaller than previous results21 because the coordination number is quite sensitive to the measurement of the minimum of the radial distribution function, where we give a slightly smaller 2.3 Å. |
‡‡ This radius is slightly smaller than previous estimates21 based on the Bloch state, but gives the same qualitative picture since the state of e− does not mix much with the valence band during Wannierization. |
This journal is © the Owner Societies 2024 |