Po-Jen
Hsu
*ab,
Kun-Lin
Ho
a,
Sheng-Hsien
Lin
ab and
Jer-Lai
Kuo
*a
aInstitute of Atomic and Molecular Sciences, Academia Sinica, Taipei, Taiwan. E-mail: clusterga@gmail.com; jlkuo@pub.iams.sinica.edu.tw
bDepartment of Applied Chemistry, National Chiao Tung University, Hsinchu, Taiwan
First published on 22nd November 2016
The potential energy surface (PES), structures and thermal properties of methanol clusters (MeOH)n with n = 8–15 were explored by replica-exchange molecular dynamics (REMD) simulations with an empirical model and refined using density functional theory (DFT) methods. For a given size, local minima structures were sampled from REMD trajectories and archived by a newly developed molecular database via a two-stage clustering algorithm (TSCA). Our TSCA utilizes both the topology of O–H⋯O hydrogen bonding networks and the similarity of the shapes to filter out duplicates. The screened molecular database contains only distinct conformers sampled from REMD and their structures are further optimized by the two DFT methods with and without dispersion correction to examine the influence of dispersion on their structures and binding energies. Inspecting different O–H⋯O networks, the binding energies of methanol clusters are highly degenerated. The degeneracy is more significant with the dispersion effect that introduces weaker but more complex C–H⋯O bonds. Based on the structures we have searched, we were able to extract general trends and these datasets can serve as a starting point for further high-level ab initio calculations to reveal the true energy landscape of methanol clusters.
The interests in methanol clusters can be traced back to the 1960s when Pauling proposed a hypothesis that methanol molecules form simple cyclic hexamers in the liquid phase due to energy stabilization.3 However, since the 1980s various computer simulations including molecular dynamics, Monte Carlo, and ab initio molecular dynamics pointed out the dominance of linear structures in liquid methanol.4–9 Moreover, X-ray diffraction experiments revealed that on average the length of the chain is about three or four methanol molecules.10 Neutron diffraction experiments, on the other hand, conclude that the hydrogen bond chains can be up to 10 molecules long and the average chain length is only about 2.7 molecules.11 More recently, X-ray emission spectroscopy and X-ray diffraction showed that liquid methanol consists of rings and chains of methanol with the size of six to eight molecules.12,13 In contrast to liquid methanol, crystal phases of methanol were known to consist simply of infinite chains, but this notion has been recently challenged by ab initio simulations that under high pressure, stable crystal phases with small-size rings (4 and 6 molecules) can have competitive free energies.14
One of the possible reasons behind the difficulty in reaching a conclusion on the analysis of the O–H⋯O hydrogen bond in methanol is that weaker interactions between the oxygen of a hydroxyl group and the hydrogen of a nearby methyl group can influence the relative stabilities of different conformations of the methanol clusters. It has been suggested that under high pressure (>4.5 GPa), the considerable increase in C–H⋯O bonds stabilizes the methanol crystal by balancing the repulsive interactions under compression.14 In gas-phase experiments, methanol clusters have been studied, but the C–H⋯O interactions can be significant only if the cluster size is large enough to allow the formation of folded and compact conformations,15 but how the C–H⋯O bonds influence the PES has not been carefully examined yet.
The main advantage in studying clusters in the gas-phase is that experimental data of clusters can be characterized by their sizes that can be controlled with a better precision. For example, size-selected infrared spectroscopy has been carried out for both neutral16–20 and protonated methanol clusters.20–23 In neutral clusters, size selectivity is more difficult to achieve. Nevertheless, the OH- and CH-stretching modes of methanol trimers and tetramers were explored by Larsen et al. and Han et al.18,19 For medium sized clusters, IR + VUV ionization coupled with time-of-flight mass spectrometry enables size-selected infrared (IR) spectra of 4 to 8 molecules as reported by Fu et al.,17 and those of up to 9 molecules have been recorded by Buck et al. using momentum transfer scattering experiments.16 Other types of clusters such as benzene–(MeOH)n24 and phenol–(MeOH)n clusters20 can be observed in up to 6 and 50 methanol molecules, respectively. In spite of the rich study in the IR experiments, mode assignment using quantum chemistry theory was limited to only small-sized clusters. It has been shown that anharmonic frequency analysis using ab initio methods is important to understand trimers and tetramers.18,19 Theoretical interpretation of larger clusters using quantum chemistry methods, however, has not been fully accomplished due to the complexity of the PES of methanol clusters and the fact that the number of sensible isomers increases exponentially with the cluster size.
To successfully interpret the experimental observables such as free energies or vibrational spectra, it is crucial to use both a reliable theoretical model and sufficiently “good” representative isomers through the PES sampling techniques. In fact, most theoretical studies so far were achieved either with a comprehensive random/biased search using empirical models or high-level ab initio calculations on a limited number of isomers. Boyd and Pires et al. worked on (MeOH)n with n = 2–12 using different levels of ab initio methods and a hybrid density functional theory (DFT) calculation.25,26 Kazachenko et al. carried out the modified minima-hopping method associated with empirical models to study the global structures of (MeOH)n for n ≤ 15 with limited calculations.15 Nowadays, exploration of the PES using the quantum chemistry method is possible. For instance, David et al. studied the PES of the methanol tetramer with the aid of DFT and random walk algorithm.27 Do et al. also combined the DFT method with a basin hopping search algorithm to probe the dispersion nature of (MeOH)n for n = 4–7.28 While these ab initio method based searching algorithms can be applied to only small-sized clusters, they have motivated us to rethink the importance of the sampling procedure and how to obtain a sufficient number of isomers using efficient screening techniques.
There is always a computational trade-off between modeling accuracy and sampling statistics. In this work, we proposed a molecular database as a universal framework to overcome this problem. Instead of directly carrying out molecular sampling using quantum chemistry models, we first performed replica-exchange molecular dynamics (REMD) simulation using an OPLS-AA empirical force field to speed up MD and local optimized isomers at different temperatures can be collected in parallel. The resulting set of isomers will be highly duplicated so we develop a sophisticated screening technique to screen out duplicates before archiving them into our database. The screening process involves pattern recognition on O–H⋯O hydrogen bond networks and determination of molecular shape similarity. After these steps, the molecular database is small enough for further verification of quantum chemistry calculations. The computationally expensive part, quantum chemistry calculations, can thus also be performed in parallel to speed up the calculations. Moreover, the hydrogen bond networks in the optimized isomers can be ranked according to their statistics, which can guide us to choose the important isomers. To demonstrate the advantage of our framework, we used the popular B3LYP/6-31+G(d,p) with and without D3 dispersion correction as suggested by Kruse et al.29 to study the cluster sizes of n = 8–15. We found many energetically almost degenerated isomers with different hydrogen bond topologies and zero-point energy (ZPE) also plays a non-negligible role. Thus the existence of many conformations and the possibilities of the anharmonic effect on ZPE are challenging the accuracy of DFT methods to conclude the precise global minima. Nevertheless, we are able to extract general trends based on the structures we have searched and these structures can serve as a starting point for further high-level ab initio calculations to reveal the true energy landscape of methanol clusters.
The initial configurations for starting MD simulation were generated randomly. For a given run of REMD, each replica goes through a total of 108 steps with a time step of 1 fs to ensure the accuracy of integration of equations of motion in the MD. The attempt to swap adjacent replicas is fixed at 2 ps. In each 100 ns run of REMD, we only use the last 4 ns of trajectory for sampling. The sampling rate is fixed at 2 ps, so we collect 2000 snapshots for each replica. In total, we collect 64000 snapshots from 32 replicas of REMD. Local geometry optimizations using the conjugate gradient method were then performed to look for the tentative global minimum (the isomer with lowest energy). The equilibrium at lower temperatures can be difficult to achieve because the statistics at lower temperature is more sensitive to the initial configuration of MD than that at higher temperature. To overcome this problem, we initiated another run of REMD simulation with the same 32 temperature points from the newly found global structure to accelerate the equilibrium.
To achieve reliable thermal properties, we repeated REMD simulations until no lower global energy was found. In general, the “real” global structures were successfully located after the third or the fourth REMD run. The local minima collected from different replicas have different characteristics. It is reasonable to expect that minima derived from high-temperature ensembles have a greater variety in their structures and low-temperature ensembles will be limited to low-energy isomers. With the frequent sampling rate at 2 ps, it is likely we will encounter the same local minima, and thus our database of 64000 isomers shall be trimmed down leaving only the distinct ones to be archived in our database by the screening technique and re-optimized again by the DFT method.
A distributed database provides a unified framework and opens many possibilities to the molecular structural analysis. Our experiments revealed that a database with ∼5000 clusters offers the best performance in our parallel computing environment. That is, we distributed the cluster sets to several databases (xyz01.db, xyz02.db, and so on) for screening and analyzing in parallel. The remaining clusters after screening in each database will be included in a temporary database (tmp.db) for final processing. Our multiprocessing mechanism on separated databases with 5000 cluster restriction ensures an efficient memory usage and prevents the time complexity of O(n2) in our screening algorithm for large n (number of clusters). Before any screening or clustering procedure, the coordinates are always sorted from the lowest energy to the highest one.
Finally, we adapt a nomenclature to describe the detailed configuration of the hydrogen bond topology as follows: the ring, linear, and tree topologies are referred to as r(m), l(m), and t(m), respectively, where m is the number of methanol molecules. The symbol “–” refers to branching on the AAD site and “_” represents separation. For instance, a single ring consisting of 14 methanol molecules is named r14. For two separated rings each with 6 and 8 methanol molecules, respectively, we refer its topology as r6_r8. For a branch of 1 methanol molecule on a ring of size of 13 molecules, we denote it as r13–t1. In TSCA, this detailed topology labeling scheme serves as a clustering index to group the clusters with the same topology. It is worth mentioning that the screening process by shape similarity will not be carried out across different topology groups.
The USR technique has been widely adopted in many materials including proteins,32–36 polymers,37 metal clusters,38 and water clusters.39–42 In the preliminary analysis of the methanol structure, we realized that there is no clear gap between similar and dissimilar configurations. Without the aforementioned topological clustering, we found that a large amount of clusters is eliminated because USR cannot distinguish different bonding patterns. After successfully preserving the distinct bonding networks by the first clustering stage, the clustered isomers will be further divided into two groups by a similarity threshold. We preserve those clusters with an index lower than the threshold since they are more distinguishable in shape. We chose 0.85 as the similarity threshold since it gives a reasonable amount of cluster sets. A more aggressive USR filtering is possible by using a lower threshold. As shown in Fig. 1, the output of TSCA guarantees representing all the possible bonding networks, and excludes the duplicated or highly similar configurations.
Fig. 3 Temperature dependence of the probability of different O–H⋯O bonding topologies of (MeOH)n (n = 8–15) obtained from REMD simulations with the OPLS-AA force field. Dominating species at low-temperature agree with the global minima found by Kazachenko et al.15 When temperature increases to 70–80 K, other topologies containing single-donor (D) or single-acceptor (A) begin to gain dominance. |
In the high-temperature region, the other types (linear and tree) of bonding topologies containing single-donor (D) or single-acceptor (A) start to gain dominance. As temperature increases, the entropic effect becomes more significant and favors this group of topologies, the structures of which are more open and flexible. It is interesting to note that the onset temperature of linear and tree topologies in the OPLS-AA model (see the red dashed black solid curves in Fig. 3) is nearly the same within the size range we have examined.
Unlike other methods' search for global minima, the REMD simulation generates isomers by thermally perturbing the system. This approach allows us to sample structures that have competitive free energies at different temperatures. The subsequent geometry optimizations generate intrinsic structures sampled at different temperatures described using the OPLS-AA model. In principle, we can analyze the intrinsic structures obtained at different temperatures, however, our main goal here is to collect sensible structures for further analysis using DFT and/or high-level ab initio calculations. Thus, we have stored all intrinsic structures that cover a wide range of phase space in our database.
The overall occurrence (counts) of the hydrogen bond topology of the intrinsic structures sampled in the REMD trajectories is shown in Fig. 4. In our database, we further divide the topology groups based on the length of the hydrogen bond, for example, within the ring–tail group of n = 8, one can have r7–t1, r6–t2, r5–t3 and so on. There are many distinct topological groups ranging from 59 (n = 8) to 579 (n = 15) (see the row of # of topologies in Table 1). So, the x-axis of the occurrence diagrams is simply an index referring to the order of the abundance of topologies. We should also emphasize that the y-axes are shown on the logarithmic scale. If drawn on the linear scale, every occurrence curve will clearly show a long-tail tendency. This general tendency allows us to discard most of the complex topologies since their occurrences are very low, namely, less than 10 times (Fig. 4). While the total numbers of distinct hydrogen bond topologies are large, there are not too many dominating topologies. The possibility to discard those “rarely sampled” topologies which are mostly ring–tail topologies helps us to target relevant topologies and to further reduce the size of the molecular database.
Fig. 4 The occurrence (counts) of O–H⋯O topologies for a given size of (MeOH)n (n = 8–15) on the logarithmic scale. The topology index includes all the possible hydrogen bond topologies (not shown) extracted from the 64000 intrinsic structures (local minima) of REMD simulation. Names of top five occurring topologies are shown explicitly. More than half of the topologies with less than 10 occurrences can be excluded to further reduce the size of the molecular database after the TSCA method (see Table 1). |
n | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
---|---|---|---|---|---|---|---|---|
# of isomers after TSCA | 732 | 1288 | 1813 | 2383 | 2848 | 3411 | 3897 | 4283 |
# of topologies | 59 | 87 | 124 | 179 | 238 | 339 | 452 | 579 |
# of topologies (>10 times) | 24 | 37 | 46 | 61 | 66 | 79 | 92 | 117 |
# of final isomers (OPLS-AA) | 469 | 839 | 1144 | 1494 | 1747 | 1978 | 2051 | 1924 |
# of finished (B3LYP) | 452 | 818 | 1116 | 1452 | 1057 | 1026 | 1213 | 1081 |
# of finished (B3LYP-D3) | 438 | 810 | 934 | 1038 | 1184 | 978 | 1187 | 1356 |
For a better visualization, the five most abundant topologies are shown explicitly in Fig. 4. For n ≤ 13, single ring topologies are most frequently found. The ring–tail topologies come next with the second and the third abundant being r(n − 1)–t1 and r(n − 2)–t2. The r_r topologies appear in the leading group as small as n = 11. In n = 13, while the global minimum is r6_r7, there also exist other r_r topologies (such as r5_r8 and r4_r9) with lower occurrence. As for n = 14 and 15, populations of double-ring topologies (r6_r8 and r7_r8) are almost as rich as those of single-ring topologies (r14 and r15). Since we have integrated structures sampled at different temperatures together, the abundance of a topology is not necessarily its energetic order. For example, in n = 13, the global minimum belongs to r6_r7, but the most abundant topology is r13. Furthermore, the analysis on the occurrence (shown in Fig. 4) is based on the structure of locally optimized conformation (that is an intrinsic structure) which is different from the instantaneous snapshots sampled at different temperatures by REMD trajectories (shown in Fig. 3).
The efficiency of our screening methods is highlighted in Table 1. It is clear that the TSCA method significantly reduces the amount of intrinsic structures from 64000 to several hundreds/thousands of isomers. These isomers are distributed in dozens/hundreds of hydrogen bond topologies, and more than half of the topologies contain less than 10 isomers. Discarding those isomers with low occurrence can further reduce the amount of isomers to 65% for n = 8 and 45% for n = 15. The quantum chemistry calculations were carried out using these isomers from higher ranked topologies to lower ones, which makes the exploration of the PES on larger clusters feasible.
To summarize, there are a few advantages in our approach: (1) Because our REMD is well equilibrated, the global minimum and its related structures will not be missed in the low-temperature ensembles. (2) Since one cannot guarantee the accuracy of the empirical models, the thermal energy in REMD assists the automatic sampling of the PES of OPLS-AA. Therefore, most (if not all) statistically sensible structures will be included in our database. (3) It is understandable that canonical ensembles with adjacent temperatures typically sample a similar region in the phase space, and integrating all intrinsic structures in one big database allows our TSCA to efficiently screen out the duplicates based on the similarity in their geometries and leave a small but well represented set for further refinements.
Fig. 5 Size dependence of the binding energy (kcal per mol per MeOH) of (MeOH)n, n = 8–15 by (a) OPLS-AA, (b) B3LYP, and (c) B3LYP-D3. As there are too many topological groups, for simplicity, only the energies of the most stable form in single ring (r), double ring (r_r) and other topologies are shown in this figure. Binding energies under OPLS-AA and B3LYP-D3 have a monotonic decreasing trend with respect to the cluster size. In B3LYP, the binding energy of the single ring does not change in this size range. More detailed data on the binding energies of other topologies/isomers can be found in the ESI† (Fig. S1–S8). |
Structures of the low-energy minima also indicate the importance of van der Waals interaction. In Fig. 6, we can find the most stable minima of different topologies of (MeOH)14 described by the three methods. In B3LYP models, the optimized structures tend to be relatively open to keep the directionality of the O–H⋯O hydrogen-bonding. In B3LYP-D3, dispersion can bend the hydrogen bond to make the structure more compact, for example, in r14 and r6_r8, the hydrogen bond rings are both more folded than their counterparts optimized with B3LYP. The similarity between the structures of OPLS-AA and that of B3LYP-D again suggests that van der Waals dispersion likely is the dominating factor. Structures of low-energy minima for n = 8 and n = 11 are compiled in the ESI† for those who are interested in a more detailed comparison.
Fig. 6 Structure of the most stable isomers of (MeOH)14 in the leading five topologies of (a) OPLS-AA, (b) B3LYP, and (c) B3LYP-D3 models. Intrinsic structures under OPLS-AA and B3LYP-D3 are generally more compact than those by B3LYP. The same observations can be found from the structures of low-energy isomers of (MeOH)8 and (MeOH)11 shown in the ESI† (Fig. S17 and S18). |
One of the important points that cannot be seen from Fig. 5 and 6 is that there are many possible conformations with very close binding energies. In Fig. 7, we show the binding energies of isomers we have collected for n = 8, n = 11 and n = 14. Due to the limit of space, we only show the energies of the conformations in the first five topological groups. A more extended list covering different topologies in n = 8 to n = 15 is compiled in the ESI† (Fig. S1–S8, ESI†). One can clearly see from Fig. 7 and the ESI† that the relative energy between the most stable isomers among the leading group is not large. For example, in B3LYP-D3, the energetic difference between the most stable and the 5th stable is less than 0.2 kcal mol−1. Furthermore, within each topological group there are many isomers having a more significant variation in their binding energies. Thus, one shall always keep in mind that the difference in the binding energy of isomers within the same topological family is not less than the difference between two topological families. Therefore, if one looks at one or two conformations within each topological group, it is possible to come to different conclusions based on the conformations one selects. Our methodology is a design to overcome such problems, because our sampling scheme utilizes thermal perturbation in REMD, and thus different statistically important conformations can be properly sampled.
Fig. 8 Influence of zero-point energy on the binding energy (kcal per mol per MeOH) of (MeOH)n, n = 8, 11, and 14 by B3LYP-D3 is shown. ZPE reduces the binding energy by 1.5 kcal per mol per MeOH which is mainly due to the redshift in the O–H as a result of the formation of the hydrogen bond. More topological groups are shown in Fig. 9 than Fig. 7. The more extended lists that we have searched, covering all sizes, can be found in Fig. S1–S8 (ESI†). |
There are a few general properties that we can draw by analyzing the whole dataset. First, the binding energy is reduced by 1.5–1.6 kcal per mol per MeOH by adding ZPE. This reduction is mainly due to the formation of a hydrogen bond. In methanol clusters, the number of hydrogen bonds per molecule is almost one. With the formation of O–H⋯O hydrogen bonds, the free O–H stretch mode shifts from about 3700 cm−1 (in monomer) to 3200 cm−1 in O–H⋯O. Thus in methanol clusters, the ZPE is reduced by ∼500 cm−1–1.4 kcal mol−1. Since the coordination number of methanol does not change much, this reduction in binding energy by ZPE is not sensitive to the change in the cluster size. From previous works on protonated water and methanol clusters,22,48,49 it is often found that the PES and the ZPE have opposite contributions to the binding energy, the former tends to favor more compact structures with a great number of hydrogen bonds. In neutral methanol clusters, since the coordination number does not change much one may anticipate a less significant effect on ZPE. In Fig. 9, we show the correlation between the binding energies with and without ZPE correction. The nice correlation confirms that ZPE plays a secondary role. But due to the small energetic difference in the PES, a difference in ZPE is still sufficient to switch the energetic order. We should point out that within such a small energetic difference, vibrational anharmonicity47 may be another factor that should be taken into consideration.
Fig. 9 Correlation between binding energy (kcal per mol per MeOH) of (MeOH)n with (x-axis) and without (y-axis) zero-point energy correction. In this figure, we use a simple grouping of the topology as the one used in Fig. 5 for n = 8 (a), 11 (b), and 14 (c). Single ring (r, red dots), double ring (r_r, blue dots) and other topologies (green dots) are shown in different colors. Linear regression analysis is presented by red lines and the corresponding linear fitted functions are referred to in the bottom-right corners. |
While only a limited selection of empirical models and DFT methods have been used in this work, because a large set of isomers have been analyzed, we are optimistic that these conclusions will still be valid if higher level ab initio calculations can be applied to examine methanol clusters in this size range.
To bridge the difference in the computational resources required by empirical models and DFT methods, we implemented a standard procedure to select energetically important and structural distinct isomers for building up our universal molecular database. We first utilize the traditional concept based on the O–H⋯O hydrogen bond topology to sort the structures of isomers. Since the relative weaker C–H⋯O bonds can also play a role in determining the structure and the stability of methanol clusters, we go beyond the idea of relying entirely on the O–H⋯O hydrogen bond topology and include both topological and spatial (similarity in shape) indices in the newly proposed TSCA algorithm. It is our hope that our clustering algorithm can become a useful tool in other molecular sampling methods. The newly developed TSCA is published along with this work as open source software50 under GNU Public License.
We use B3LYP and B3LYP-D3 with 6-31+G* to carry out extensive DFT calculations on the large set of isomers created by our TSCA algorithm both because of their computational efficiency and also the fact they are known to be over- and under-estimating methods. Furthermore, these two DFT methods are different only in the dispersion interaction which allows us to directly look into the effect of dispersion in determining the properties of methanol clusters. Within the size (n = 8 to n = 15) we have searched, there are many low-energy minima with a very small energetic difference and the exact global minimum structure is thus sensitive to the choice of method. The role of zero-point energy correction has been analyzed and we hope that the low-energy isomers we found can stimulate further investigations using more sophisticated DFT and/or high-level ab initio calculations.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c6cp07120a |
This journal is © the Owner Societies 2017 |