Bernd
Hartke
Institute for Physical Chemistry, Kiel University, 24118 Kiel, Germany. E-mail: hartke@pctc.uni-kiel.de; Tel: +49-431-8802753
First published on 10th May 2024
For pure, neutral, isolated molecular clusters, (H2O)17 marks the transition from structures with all water molecules on the cluster surface to water self-hydration, i.e., cluster structures around one central water molecule. Getting this right with water model potentials turns out to be challenging. Even the best water potentials currently available, which reproduce collective properties very well, still deliver contradicting results for (H2O)17, when different low-energy isomers from global structure optimizations are examined. Interestingly, ab initio quantum chemistry also struggles with the only seemingly simple question if (H2O)17 is all-surface or water-centered. Hence, although the long history of water potential development may be entering its final phase, it is not quite finished yet.
In these studies, impressive agreement between experimental data and simulations employing q-AQUA(-pol) and MB-pol was shown, for many bulk-collective properties and for small clusters (H2O)n, n = 2, 3,…, 6. However, while the water hexamer has long been recognized as the smallest water cluster with a significant multitude of minimum-energy structures that differ qualitatively (prism, cage, book, etc.), it still is a very small cluster. As a general rule for atomic and molecular clusters, this automatically means that energy differences between the lowest-energy hexamer isomers are relatively large. In contrast, for larger clusters these low-lying energy differences quickly shrink towards zero.
In fact, it is well established that huge qualitative differences in global minimum-energy structures are fairly rare for (H2O)n up to n = 16, even if older water potentials are employed, including TIP4P18 or the TTMx-F series.19–21 The simple reasons for this finding are the preference of water for cubes and pentagonal prisms, and the surprisingly numerous possibilities for building clusters (H2O)n for 8 ≤ n ≤ 16 from fused cubes and pentagonal prisms.
At the same time, for n ≤ 16 there essentially is no low-energy way to avoid having all water molecules on the cluster surface. In contrast, for n = 17 it is geometrically and energetically feasible for the first time to have one water molecule in the cluster center, with all other water molecules forming a first hydration shell around it.
The above has been pointed out already 20 years ago.22 Using global cluster structure optimization, it was demonstrated that the very different water models TIP4P and TTM2-F essentially agree on global minimum-energy structures for (H2O)n up to n ≤ 16 but then strongly diverge for n = 17 and n = 21. In particular, for these two sizes TIP4P features all-surface structures while TTM2-F favors molecule-centered isomers. Incidentally, the TTM2-F structures were later confirmed as energetically lower at the MP2/aug-cc-pVTZ level.23 For (H2O)17, this result was later again confirmed24 at the CCSD(T)/aug-cc-pVTZ level – but the calculations reported below cast some doubt on that. Incidentally, this latter study found a new (H2O)16 global minimum-energy candidate at this level, comprised of two fused pentagonal prisms, which turned out to be 2–4 kJ mol−1 lower in energy than the previous best candidate consisting of three fused cubes. This again nicely illustrates the well-known preference for cubes and pentagonal prisms, and the surprising versatility of these as cluster building blocks.
As mentioned above, so far studies with q-AQUA, q-AQUA-pol and MB-pol2016/2023 either have not included clusters larger than (H2O)6 or have not examined (H2O)17 in detail.25 Even the impressively comprehensive and detailed water model comparison by Herman and Xantheas17 focused only on one (water-centered) isomer for this cluster size. Hence, it is interesting to find out how these potentials fare for (H2O)17 when used in global cluster structure optimization, i.e., when the most important low-energy isomers of this cluster size are examined, which is the topic of this contribution.
OGOLEM employs genetic/evolutionary algorithms (GA/EA) as non-deterministic global optimization method.29–32 The power of this approach has been proven in numerous studies, ranging from abstract benchmarks33 and real-life benchmarks34–36 to real-world applications.37,38 Special expertise for water clusters has been collected with various potentials, including NEMO,39 TIP4P,35,40–47 TTM2-F48,49 and TTM3-F.34,50,51
Original source codes for q-AQUA, q-AQUA-pol, MB-pol2016 and MB-pol2023 were obtained from their respective websites,52–54 compiled, tested and interfaced with OGOLEM.
For (H2O)17 and each of the q-AQUA and MB-pol potentials, 10–20 global optimization runs of 25k steps were performed, collecting not only global minimum-energy candidates but also interesting low-energy local minima. In each case, the best global minimum-energy candidate was found repeatedly, in about 50% of the runs. Similar sets of global optimizations were also done for TIP3P, TIP4P, TIP4P/200555 and TTM3-F, all natively implemented in OGOLEM, and with the AMOEBA56 and AMOEBA+57,58 water potentials that were coupled to OGOLEM via its interface to TINKER.59,60 Only a small but representative subset of all these globally optimized structures is presented here. Of course, structures found with one water potential were locally re-optimized with the other water potentials. As further complement of the spectrum of methods, also the semiempirical method GFN2-xTB61,62 and the GFN-FF63 force field derived from it were employed.
Similarly, these structures were also re-optimized locally at various density-functional theory (DFT) and ab initio theory levels of quantum chemistry. For these calculations, the Orca package64 v5.0.4 was employed. For DFT, the range-separated hybrid functional ωB97X-V65 and the double-hybrid functional B2PLYP66,67 were used, with basis sets ranging from def2-TZVP to def2-QZVPPD68 and without and with D4 dispersion correction.69,70 This was complemented by wavefunction-based explicit correlation approaches, namely SCS-MP271 and DLPNO-CCSD(T)72 with RIJCOSX,73,74 both with fixed basis sets (from triple-zeta to quintuple-zeta) and with complete basis set extrapolation (CBS). Following Kjaergaard,75 also f12 basis sets were used at the CCSD(T) level. To make contact to the large body of work on the water hexamer in the literature, results for these levels of theory for (H2O)6 are compiled in the ESI.†
Obviously, the levels of ab initio methods could be refined even further. As pointed out by Klopper et al. for the water dimer,76 core–valence correlation leads to further small corrections. As indicated by Lane,77 employing non-perturbative triples in CC-theory or including quadruples makes sense only if scalar-relativistic and diagonal-non-Born–Oppenheimer corrections are also calculated, which are not easily accessible in practice for systems as large as (H2O)17. However, these levels of refinement were not used in the ab initio data q-AQUA and MB-pol were trained on. Additionally, structure re-optimizations at the CCSD(T)-level arguably could be more important than all these corrections. For reasons of prohibitive computational costs, no local re-optimizations were done here at the CCSD(T) level; instead, only single-point calculations were done, using the structures locally re-optimized at the B2PLYP-D4/def2-TZVPPD level.
In summary, global cluster structure optimizations were performed with all water potentials, the pairwise-additive fixed-point-charge ones (TIP3P, TIP4P, TIP4P2005), the polarizable ones (TTM3-F, AMOEBA, AMOEBA+) and the machine-learning many-body models (q-AQUA and MB-pol). From the collected results of these global optimizations, a small but representative subset of structures was selected. These structures were then locally re-optimized with all water potentials and at all levels of DFT and ab initio theory, with the single exception of CCSD(T) (in several variants). For these CCSD(T) approaches, only single-point calculations were performed, for the structures locally optimized at the B2PLYP-D4/def2-TZVPPD level.
In all cases for which local optimizations were done, the cluster structures remained qualitatively the same, no conversion from one structure to any other one was observed. Nevertheless, as it is standard, accidental convergence of local optimizations to saddle points was excluded by frequency calculations. For all structures shown here, all frequencies were real (and positive or zero), not a single imaginary frequency was found. Hence, all these structures are true minima.
I am deliberately not including any zero-point vibrational energy (ZPVE) correction, since the focus here is on performance for bare potential energies. ZPVE can be included with any efficiency only in the harmonic approximation, which for differing hydrogen-bond patterns likely introduces errors much larger than the small potential energy differences shown in the results section below. In an alternative approach already in 1996, Liu et al.78 have used a rigid-monomer water potential in fully anharmonic diffusion quantum Monte-Carlo calculations for the remaining nuclear degrees of freedom, showing for the hexamer that ZPVE corrections on the order of 5000 cm−1 caused a reversal of the energy ordering of prism versus cage, from the prism being favored by 213 cm−1 without ZPVE to the cage being favored by 62 cm−1 with ZPVE. This hexamer finding was confirmed by the Bowman group in their q-AQUA paper,5 using a similar approach. With inclusion of intramolecular degrees of freedom and for the larger clusters under study here, this situation would be exacerbated even more towards small differences between large numbers, which is never favorable for accuracy. Finally, for water potentials fitted to experimental data one could argue that these indirectly contain ZPVE (which makes their comparisons below somewhat skewed). However, both q-AQUA-pol and MB-pol were fitted to ZPVE-free ab initio data, such that their comparisons on a ZPVE-free level should be more meaningful.
Many quantum-chemical cluster studies use counterpoise corrections of basis set superposition errors (BSSE). Despite the long history of this concept (and its correction), arguments persists to this day79 that what is usually seen as BSSE is in large parts actually basis set imbalance, from which it follows that counterpoise corrections may or may not work. This nicely ties in with typical findings in the applied quantum-chemistry literature of the past 30 years: counterpoise corrections are sometimes found to be a necessary improvement, while at other times results without counterpoise correction are significantly better. In the current context, the ab initio data used for q-AQUA and MB-pol fitting do employ BSSE corrections while for example the study by Xantheas et al.24 does not. Here, I am not employing BSSE corrections, for three reasons: (1) whether BSSE is based on basis set incompleteness artifacts or basis set imbalance, it has to disappear in the CBS limit, and be very small for large basis sets; both situations are targeted here. (2) As pointed out by Klopper et al.,76 BSSE corrections should be complemented by structural fragment/monomer relaxations, leading to further and possibly larger corrections. (3) In this study, I am not trying to establish dissociation energies into monomers but instead compare different fully hydrogen-bonded cluster isomers of the same size, with not identical but very similar water–water neighborhoods. For such direct isomer comparisons, both the usual “ghost basis” prescription for the electronic part and the fragment/monomer relaxation becomes awkward or even impossible. Furthermore, artificially referring each isomer structure to a fully dissociated collection of water monomers, just for counterpoise and relaxation purposes, may actually induce additional errors, by having monomer sets with different ghost basis sets and geometries, and by compounding two different BSSE and geometry corrections (isomer1–monomers and isomer2–monomers), which may or may not lead to cancellation of BSSE/geometry correction errors.
As a practical note of caution, standard convergence thresholds and standard grids of Orca (and other quantum-chemistry packages) were not sufficient to provide a clear-cut energy ordering at the DFT/ab initio level. This was detected accidentally, when local geometry optimizations from slightly different starting structures at the double-hybrid DFT level ended up with an energy difference of 0.75 kJ mol−1, although they should have been exactly identical and indeed seemed visually identical in molecular viewers (later re-examination showed small deviations of 0.01 Å in a few oxygen–oxygen distances and of 1° in a few angles and dihedrals between water molecules). Requesting tighter convergence thresholds and denser grids (in Orca: verytightSCF, verytightopt, defgrid3 and Stol 10−7) resulted in disappearance of the very tiny structure deviations just mentioned and in lowering the energy differences into a range of 0.0004–0.08 kJ mol−1, which is small enough for the comparison scale presented in the Results section below. Note that this is an artifact of the DFT/ab initio energy scale (zero of energy at dissociation into isolated electrons and nuclei). Since the model potentials have their zero of energy at dissociation into isolated water molecules, their standard (and fairly loose) convergence thresholds suffice to produce an energy error bar on the order of <0.001 kJ mol−1, well below the accuracy of the present comparison.
As a final side note, the CCSD(T) calculations revealed that the T1 diagnostic for all structures is in the range of 0.0086–0.0096. This is well below the threshold of 0.02 for multireference character, justifying the single-reference quantum-chemical treatments used here, as to be expected for these systems.
Incidentally, these essentially are the same four structural types as shown in Fig. 2 of ref. 24, although there the optimal hydrogen-bond patterns were not present in all cases. Given the small energy differences between these four structural types, different H-bond patterns may easily change their energy ordering. To illustrate this, while still keeping the number of structures presented sufficiently small to avoid confusion, a few different H-bond patterns and three actual but subtle changes of the water molecule frameworks will be added in the following. Specifically, four different H-bond patterns in the p-class are shown (p1 through p4), one minor but important structural change in the t-class (t1 and t2), three H-bond variations and one structural change in the w-class (w1 through w4), and one different water attachment location in the c-class (c1 and c2). All global minimum-energy candidates found in the global structure optimization runs are contained in this set. Atomic coordinates (xyz files) and depictions of all these structures can be found in the ESI.†
Fig. 2 summarizes the findings for the top-level ab initio/DFT approaches used here. For clarity, short abbreviations for the actual levels of theory are used: “e34” stands for CBS extrapolation of DLPNO-CCSD(T), using the Orca-implemented CBS extrapolation formulae and its parameters, employing the def2-TZVPP and def2-QZVPP bases. “ccsdt/f12” designates DLPNO-CCSD(T) with the cc-pVQZ-F12 basis set, with an augmented quintuple-zeta auxiliary basis set for the correlation calculation. “ccsdt/qz” is the isolated DLPNO-CCSD(T)/def2-QZVPP result of the above “e34” calculation. Note that the above three theory levels were performed as single-point calculations, for the structures locally optimized at the “b2plyp” level mentioned just below. The following three levels were all accompanied with local structure optimizations. “mp2” is short for SCS-MP2/def2-TZVPPD. “b2plyp” designates the double-hybrid B2PLYP-D4/def2-TZVPPD level. “wb97” stands for ωB97M-D4/def2-TZVPPD. RIJCOSX was used in all calculations. Note that for each individual level of theory, its best structure was chosen as “zero” for the comparison of relative energies. Tabulated numerical values for relative and absolute energies are provided in the ESI.†
![]() | ||
Fig. 2 Ab initio/DFT results for the full set of 12 structures, representing the four structural types shown in Fig. 1. Abbreviations for theory levels are explained in the text. |
To briefly summarize the contents of Fig. 2: either p1 or w1 are the best structures, in all cases. We have e34, ccsdt/qz and b2plyp favoring p1 over w1 (or t1), and ccsdt/f12, mp2 and wb97 favoring w1 over p1 (or w2). Note that ccsdt/f12 has w1 and p1 only 0.23 kJ mol−1 apart. wb97 has them 2.3 kJ mol−1 apart (with w2 in-between), but this clearly is the lowest ab initio/DFT level. Putting it somewhat differently, the best four structures are always taken from p1, p2, t1, w1, w2, and these span only 1–2 kJ mol−1. As an aside, c1 is never among the first four entries (and c2 is worse still). Interestingly, all this means that these somewhat high-level but still mostly standard ab initio or DFT calculations can NOT decide if all-surface or water-centered is better for (H2O)17. Not even the CCSD(T) calculations can do that, since it is not necessarily obvious if e34 (predicting the all-surface structure p1) or ccsdt/f12 (favoring water-centered w1) should be considered as more reliable.
Obviously, the applied quantum-chemistry literature is full of examples drilling down such difficult cases to a final decision, by applying increasing levels of further sophistication. However, this is not my aim here. For the present purposes, it is sufficient to show that the global minimum-energy structure for (H2O)17 in general is not immediately clear with limited-effort ab initio/DFT approaches, and neither is the question all-surface versus water-centered. Whatever the final outcome may be, this clearly indicates that (H2O)17 is very close to the structural transition from all-surface to water-centered.
While the question all-surface versus water-centered is not really decidable from the data in Fig. 2, results at all theory levels agree upon energy ordering within each of the four groups (with the single exception of w3–w4). This will change in the results reported below.
In the following, results from the various water model potentials will be compared to the above ab initio/DFT results. For this, I am presenting similar plots as in Fig. 2, in which – for visual clarity – the ab initio/DFT results are not repeated individually but instead indiscriminately summarized by the overall areas their results span in this figure, irrespective of the individual levels of theory. Again, tabulated numerical values for all water model potentials can be found in the ESI.† The ESI,† also provides tabulated numerical comparison metrics between the data shown in Fig. 2–6.
The MB-pol2016 and MB-pol2023 results are shown in Fig. 3, in direct comparison to the ab initio/DFT results of Fig. 2, summarized by the shaded areas. As before, the zero of relative energies is given by the lowest energy for each level of theory or each model potential, respectively. The global minimum candidate for MB-pol2030 is p1, which can be interpreted as being in agreement with the quantum-chemistry results. MB-pol2016, however, features c1 as lowest-energy structure, in clear contrast to all levels of quantum-chemistry theory. This seems to be related to both MB-pol versions assigning energies to the c-class that are too low. Both MB-pol versions produce energies that are systematically too high for the class of water-centered structures, in contrast to the quantum-chemistry results that have all-surface and water-centered structures in very close competition. Furthermore, while the energy-ordering trends within all four classes are qualitatively correct, this fails for p4 for both MB-pol versions, and MB-pol2023 has w4 at an energy that is clearly too high. Nevertheless, compared to results shown below, the MB-pol results agree rather well with the quantum-chemistry results.
![]() | ||
Fig. 3 Relative potential energies for MB-pol2016 and MB-pol2023 for the full set of 12 structures, in direct comparison to the quantum-chemistry energies of Fig. 2 (shaded areas). Note the slightly changed span of the ordinate. |
The results of q-AQUA and q-AQUA-pol for the twelve structures are displayed in Fig. 4. Water-centered w1 and all-surface t1 are the global minimum candidates for q-AQUA and q-AQUA-pol, respectively, mirroring the quantum-chemistry results pretty well. However, in contrast to the underestimation by MB-pol, the c-class has markedly too high energies here, overshooting the energy range of Fig. 2. Similarly, t2 is too high in q-AQUA, and both model potential variants significantly disadvantage w4. While q-AQUA-pol overall stays closer to the quantum-chemistry results, essentially on-par with the MB-pol2023 deviations, it does change the energy ordering in the p- and w-classes.
![]() | ||
Fig. 4 Relative potential energies for q-AQUA and q-AQUA-pol for the full set of 12 structures, in direct comparison to the quantum-chemistry energies of Fig. 2 (shaded areas). Note the markedly extended span of the ordinate. |
Fig. 5 compares the quantum-chemistry data to TTM3-F, a traditional but more recent and refined water model potential with smeared charges and polarizabilities, to the recent version AMOEBA+57,58 of the polarizable AMOEBA water potential,56 and to the old veteran TIP4P water model. Interestingly, both TTM3-F and AMOEBA+ agree in having the water-centered w1 as the global minimum candidate, but grossly disadvantage the whole p-class (all-surface), to the extent that the energy spans of the p- and w-classes (almost) do not even overlap, in stark contrast to the quantum-chemistry results. Compared to that, and regarding its intrinsic simplicity, TIP4P does surprisingly well. Another interesting point is that all three approaches invert the energy ordering in the t-class, and only AMOEBA+ gets the energy ordering in the c-class right. Further comparisons to TIP4P/2005 and AMOEBA2014 are shown in the ESI,† because the differences to TIP4P and AMOEBA+, respectively, are fairly small.
![]() | ||
Fig. 5 Relative potential energies for TTM3-F, AMOEBA+ and TIP4P for the full set of 12 structures, in direct comparison to the quantum-chemistry energies of Fig. 2 (shaded areas). Note the somewhat extended span of the ordinate. |
Finally, to put all this into proper perspective, Fig. 6 compares the quantum-chemistry data to the results of the water model TIP3P, of the semiempirics GFN2-xTB, and of the general force field GFN-FF. Although TIP3P still is heavily used in biochemical simulations, and despite getting w1 right as global minimum candidate, it performs significantly worse here than TIP4P (cf.Fig. 5). This mirrors the well-known bad behavior of TIP3P in collective properties, e.g., the phase diagram.80–82 While one could argue that the phase diagram is not highly relevant to biochemical simulations at room temperature and standard pressure, energy-ordering of nearest-neighbor water molecules clearly is immediately relevant. I find it interesting that GFN-FF performs only somewhat worse than TIP3P, and that GFN2-xTB performs on par with the approaches shown in Fig. 5, despite the whole set of GFN methods not being designed for accuracy of relative energies.
![]() | ||
Fig. 6 Relative potential energies for TIP3P, GFN2-xTB and GFN-FF for the full set of 12 structures, in direct comparison to the quantum-chemistry energies of Fig. 2 (shaded areas). Note the strongly extended span of the ordinate. |
On the other hand, when disregarding the quantum-chemistry data areas in Fig. 6 and accepting that the model-potential energy differences are much larger than the quantum-chemistry energy differences (a frequent finding for molecular systems, nicely traced back to a lack of explicit many-body terms in pairwise additive potentials in ref. 17), the overall picture is not too bad, regarding that the p- and w-class are roughly isoenergetic. In this sense, all those biochemical simulations may not be so bad after all.
To some extent, one may question the point of this exercise, given that the quantum-chemistry data, at the level used here, cannot pin down the “actual” global minimum candidate for (H2O)17. Why should water model potentials be required to succeed with that? Indeed, this is not the actual lesson to be learned here. Rather, the lack of clarity in the “naked” ab initio results (even before ZPVE estimations blur the picture further) indicates that further big-data machine-learning efforts are warranted only after greater efforts have been made to make the ab initio reference data even more reliable. If this has been achieved, I hope that patterns of agreements and disagreements similar to those pointed out here in Fig. 3 and 4 may contribute to the next (and possibly final) step of water potential development.
Footnote |
† Electronic supplementary information (ESI) available: Cluster structures (figures and xyz-files) and tabulated energies. See DOI: https://doi.org/10.1039/d4cp00816b |
This journal is © the Owner Societies 2024 |