Chen
Qu
*a,
Qi
Yu
b,
Riccardo
Conte
c,
Paul L.
Houston
de,
Apurba
Nandi
f and
Joel M.
Bomwan
*f
aIndependent Researcher, Toronto, Ontario M9B 0E3, Canada. E-mail: szquchen@gmail.com
bDepartment of Chemistry, Yale University, New Haven, Connecticut 06520, USA
cDipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano, Italy
dDepartment of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, USA
eDepartment of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
fDepartment of Chemistry, Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, USA. E-mail: jmbowma@emory.edu
First published on 11th August 2022
Δ-Machine learning (Δ-ML) has been shown to effectively and efficiently bring a low-level ML potential energy surface to CCSD(T) quality. Here we propose extending this approach to general force fields, which implicitly or explicitly contain many-body effects. After describing this general approach, we illustrate it for the MB-pol water potential which contains CCSD(T) 2-body and 3-body interactions but relies on the TTM4-F 4-body and higher body interactions. The 4-body MB-pol (TTM4-F) interaction fails at very short range and for the water hexamer errors up to 0.84 kcal mol−1 are seen for some isomers, owing mainly to 4-body errors. We apply Δ-ML for the 4-body interaction, using a recent dataset of CCSD(T) 4-body energies that we used to develop a new water potential, q-AQUA. This 4-body correction is shown to improve the accuracy of the MB-pol potential for the relative energies of 8 isomers of the water hexamer as well as the harmonic frequencies. The new potential is robust in the very short range and so should be reliable for simulations at high pressure and/or high temperature.
The other approach is Δ-machine learning. In this approach a correction is made to a property obtained using an efficient, low-level ab initio theory.5–8 We recently proposed and tested a Δ-ML approach, that uses a small number of CCSD(T) energies, to correct a low-level PES based on DFT electronic energies and gradients.9,10 The equation for this approach is simply
VLL→CC = VLL + ΔVCC–LL, | (1) |
Here we propose to extend this Δ-ML approach from molecular PESs to general, non-reactive force fields that are explicitly or implicitly many-body. There are many examples of such force fields that determine the total energy of N monomers. For example, consider force fields for water. (For a recent review see ref. 11.) For simplicity, we denote these by “MB-FF”. Suppose our goal is to bring this force field to the “gold-standard” CCSD(T) level of theory. Clearly this cannot be done by simply applying the above equation for an arbitrary number of monomers owing to the prohibitively exponential computational cost of CCSD(T) calculations with respect to the number of monomers. Instead we propose a Δ-ML force-field for N monomers given by the sum of many-body corrections, namely
![]() | (2) |
We have truncated explicit correction terms at the 4-b level with force-fields for water in mind. This is because it has been established by high-level calculations that 4-b, while small, are needed to obtain nearly 100 percent of the electronic dissociation energies of water clusters up to the 21-mer.12 This has also been shown previously for some isomers of the water hexamer13,14 and we show this again explicitly here. And crucially, we have the CCSD(T) electronic energies needed for the correction up to 4-b, and these datasets were recently used to develop the pure many-body water potential q-AQUA.15,16
![]() | (3) |
In this work, the focus is on the 4-b correction to the MB-pol force field.19,20 In MB-pol the 2-b and 3-b are already at CCSD(T) level, but the 4-b interaction is essentially given by the TTM4-F potential,21 which is a sophisticated MB-FF for water. Errors between 0.1 and 0.84 kcal mol−1 for the these 4-b interactions for the hexamer isomers against direct CCSD(T) calculations, were reported in 2015.22 These are fairly large fractions of the 4-b energy itself. Stimulated by recent assessments of the importance of the 4-b interaction and inaccuracy of the MB-pol 4-b, we report a correction 4-b PES, denoted ΔV4-b, that is aimed directly at extending the MB-pol potential to the CCSD(T) 4-b level. The correction is a PIP fit to the energy difference between the CCSD(T) 4-b interactions and TTM4-F 4-b interactions.
In the next section we present the details of the 4-b correction PES, followed by several tests that indicate the effectiveness of this correction PES.
First, it is desirable not to include polynomials that do not have the correct limiting behavior as one or more monomers are removed from the others to a large distance. In the 4-body case, we need to consider the removal of each monomer from the other three and the removal of each possible dimer from the other one. In all of these cases, the 4-body interaction energy must vanish. The process of identifying PIPs that do not have the correct limiting behavior is what we call purification.24,25 To identify the PIPs with incorrect limiting behavior, the relevant distances are augmented by 100 Å, and we accept the polynomial as having the correct behavior if its Morse value is below 10−6. We cannot, however, immediately eliminate these polynomials because there may be other polynomials that, for example, are composed of products between one with a correct limit and one with an incorrect limit. At first, we simply rename the ones with an incorrect limit. After all the polynomials have been evaluated, we examine the definitions of all those with the correct limits and determine which of the monomials and which of the renamed polynomials with incorrect limits are required to calculate them. Finally, we remove those polynomials that are not required and renumber those that remain, keeping the order of calculation to ensure that no partial calculation that contributes to any polynomial needs to be performed twice. We then have a set of polynomials that all have the correct limiting behavior and that can be calculated efficiently.26
The second issue that we need to consider is how to maintain permutational symmetry, not only in each monomer, but when monomers as a whole are interchanged with one another. This latter exchange is not taken into account by the MSA software, so the polynomials that we create by the previously described purification will not, in general, have permutational symmetry with respect to exchange of identical monomers. A common method for dealing with this issue is to augment the dataset by adding all relevant permutations of the Cartesian coordinates and assigning them the same energy, thus requiring a set of n! geometries for each energy, where n is the number of monomers (4, in this case). A better method is to identify groups of polynomials that have permutational symmetry with respect to monomer exchange and then form “superpolynomials” that are the sum of the polynomial members of each group. We identify the permutationally invariant groups of polynomials by taking a single set of n! permutationally related geometries and calculating the value of each polynomial. While the values of individual polynomials vary from permutation to permutation, the groups of polynomials, taken together for each permutation, will have the same group of values. For each permutation, one can form pairs of the polynomial identities and their values, and then sort the pairs by their values. Looking at all pairs that have the same value component in all permutations gives the identities of the polynomials, some of which may be repeated, that make up a permutationally invariant group. In general, there will be as many groups as there were original polynomials. These groups, each with n! (not necessarily unique) polynomial contributions, are then summed to form “superpolynomials” having permutational symmetry with respect to exchange of identical molecules. Having formed these superpolynomials, there is no need for augmentation of the dataset with permutationally related geometries.
We used basis sets of different sizes, with 200, 500, and 1000 “superpolynomials”. More details of the bases are given in Table 1. As seen, although fitting with more polynomials can reduce the fitting error, the computational cost is roughly proportional to the sum of the number of monomials and polynomials. The results presented in this paper are based on the basis with 200 “superpolynomials”, as this achieves reasonably good accuracy with smaller cost.
PIP200 | PIP500 | PIP1000 | |
---|---|---|---|
Number of m | 1438 | 3442 | 8610 |
Number of q | 5490 | 12![]() |
25![]() |
Number of p | 200 | 500 | 1000 |
Fitting RMSE | 6.7 | 4.0 | 2.5 |
Timing | 1.1 | 2.7 | 6.0 |
The final energy is written as
![]() | (4) |
![]() | (5) |
![]() | ||
Fig. 2 4-b energies from indicated sources with a single monomer separating from the tetramer. ROO is the distance between the O atoms on the two monomers on the axis inferred from the arrow. |
Fig. 3 shows in the top panels the correlation plots between the TTM4-F 4-b and CCSD(T)-F12 4-b energies (panel a), and between ETTM4-F4-b + ΔV4-b and CCSD(T)-F12 4-b energies (panel b). The bottom panels plot, as a function of the maximum ROO distance of the tetramer, the difference between TTM4-F and CCSD(T) energies (panel c) and the difference between the corrected TTM4-F and CCSD(T) energies (panel d). It is visually clear that the correction provides both a better correlation and a smaller error with respect to the CCSD(T)-F12 4-b energies. Note that in addition to large errors in the short range, the TTM4-F 4-b energies also have significant errors (>50 cm−1) even when the ROO reaches 6.5 Å, as panel c shows.
We now consider the binding energies of the eight isomers of the water hexamer. The binding energies for this particular water cluster have become a “benchmark” test for both electronic structure methods as well as numerous water potentials. For a recent example see ref. 29. That the hexamer has taken this central role in both theory13,28,30,31 and experiment32,33 is due to the fact that it is the smallest cluster with a non-planar configuration (for the O atoms). Thus, it has been termed the smallest drop of water/ice. In addition for the present purposes, this cluster has a significant number (15) of 4-b interactions. The absolute binding energies of these clusters from the benchmark electronic structure calculation (CCSD(T)/CBS values taken from ref. 28) and various PESs are listed in Table 2, and comparisons are shown in Fig. 4. As seen, with the 4-b correction, the binding energies are in better agreement with the benchmark values, especially for bag and cyclic isomers.
CCSD(T)/CBS | MB-pol | MB-pol + ΔV4-b | |
---|---|---|---|
Prism | 45.92 | 45.94 | 45.92 |
Cage | 45.67 | 45.69 | 45.59 |
Book-1 | 45.20 | 44.89 | 44.93 |
Book-2 | 44.90 | 44.68 | 44.70 |
Bag | 44.30 | 44.03 | 44.19 |
Chair | 44.12 | 43.66 | 43.82 |
Boat-1 | 43.13 | 42.80 | 42.94 |
Boat-2 | 43.07 | 42.82 | 42.96 |
Table 3 shows for four of the hexamer isomers the mean absolute error (MAE) in the harmonic frequencies for both uncorrected and corrected MB-pol PES. For prism and cage, the 4-b corrected version is essentially as accurate as the original MB-pol, while for book and cyclic ring, the 4-b correction clearly improves the frequencies.
Prism | Cage | Book-1 | Cyclic ring | |||||
---|---|---|---|---|---|---|---|---|
MB-pol | +ΔV4-b | MB-pol | +ΔV4-b | MB-pol | +ΔV4-b | MB-pol | +ΔV4-b | |
MAE | 7.8 | 8.4 | 8.9 | 9.4 | 12.6 | 10.6 | 16.5 | 11.7 |
Next, we comment briefly on the timing requirements for the TTM4-F + ΔV4-b potential. As shown in the last line of Table 1, the additional time for calculating the correction for the results presented in this paper (PIP200) is only about 1 second for 100000 energy evaluations and approximately 3 seconds if the energy and associated gradients are evaluated at the same time. Table 4 shows the overall timing to evaluate the energy of 64, 128, and 256 monomers, with different cutoffs of the 4-b correction. In this table, tMBX is the time to evaluate all the terms in the MB-pol, including the TTM4-F and MB-pol 2-b and 3-b, with the latest MBX software,35 while tΔV4-b is the time to evaluate our 4-b correction term, which is the extra cost when the ΔV4-b is added to MB-pol. All timings are evaluated using a single Intel i7-8750H core. It can be seen that the extra cost of the 4-b correction is in general a small fraction of the cost of MB-pol, and the cutoff distance can be tuned to achieve a balance between the cost and accuracy.
N mono | t MBX | Cutoff = 6.0 Å | Cutoff = 6.5 Å | Cutoff = 7.0 Å | |||
---|---|---|---|---|---|---|---|
N tetra | t ΔV4-b | N tetra | t ΔV4-b | N tetra | t ΔV4-b | ||
64 | 0.057 | 1379 | 0.0085 | 2728 | 0.016 | 5639 | 0.032 |
128 | 0.21 | 5823 | 0.034 | 12![]() |
0.072 | 24![]() |
0.14 |
256 | 0.68 | 13![]() |
0.080 | 28![]() |
0.17 | 58![]() |
0.34 |
We have made comparisons between the TTM4-F 4-b potential, the TTM4-F 4-b with ΔV4-b correction, and the CCSD(T) results. A remaining question that should be addressed is how the correction potential performs in comparison with the previously reported full 4-b potential15 as recently improved.16 Of course, many larger studies may already be based on the very successful MB-pol potential. For these, improvement by ΔV4-b might be the easiest upgrade, with some extra computational cost, depending on the choice of the 4-b cutoff. But for those who are interested only in the 4-b potential, we suggest previously reported 4-b PES since it is much faster than TTM4-F 4-b + ΔV4-b, and is slightly more accurate as well.
Finally, we note that many potentials or components of potentials can be corrected by this method, which has already been shown to be accurate and efficient for CH4, H3O+, N-methyl acetamide,9 and for acetylacetone,10 with more applications in process. In the current study, we have shown substantial improvement of the 4-b potential for TTM4-F, but one might have reasonable hope that this Δ-ML method is a general approach that could provide substantial improvements to other potentials at relatively small cost. In this case, the corrections would begin with the 2-b ones and could go up to 4-b. Our recent CCSD(T) 2-b, 3-b, and 4-b datasets are available,36 and so the corrections, ΔVn-b, simply require a water force field. Some examples are the well-known potentials AMOEBA,37 and TTM2.1 (ref. 38) or TTM3.39 These are polarizable potentials, however, with significant differences. Another force field that might be interesting to “correct” is MB-UCB.40 Since this potential relies heavily on DFT calculation, using the ωB97X-V/def2-QZVPPD functional, it appears that the correction to MB-UCB would be analogous to the correction of DFT to CCSD(T) PESs that we considered originally in our Δ-ML method.
That the correction potential is a significant improvement over the TTM4-F potential can be seen (a) by comparing cuts of the potentials for TTM4-F 4-b, and TTM4-F 4-b + ΔV4-b along with the CCSD(T)-F12 values; (b) by comparing the correlation between these potentials with the CCSD(T) values and the errors as a function of ROO; (c) by comparing results for the binding energies of the water hexamer isomers; and (d) comparing the MAEs in harmonic frequencies for four of the isomers. Numerous applications of this 4-b corrected MB-pol potential can and hopefully will be done. From the present analysis, it is clear that the correction is not large; however, it does improve the accuracy of this excellent potential.
The methods described above offer three important ideas. First, we have described a method for ensuring that the potentials go to zero when appropriate distances get large. Second, we have also described a method for maintaining permutational symmetry not only in each monomer but when monomers as a whole are interchanged with one another. Finally, the ΔV method itself allows large improvements for small amount of effort, and this approach appears to be general and could be applied for other water force fields and similar types of force fields for other liquids and materials.
This journal is © The Royal Society of Chemistry 2022 |