Marcello
Sega
*a and
György
Hantal
ab
aFaculty of Physics, University of Vienna, Boltzmangasse 5, 1090 Vienna, Austria. E-mail: marcello.sega@univie.ac.at
bDepartment of Chemistry, Eszterházy Károly University, Leányka utca 6, H-3300 Eger, Hungary
First published on 27th June 2017
Partially miscible solutions can represent a challenge from the computer simulation standpoint, especially if the mutual solubility of the components is so large that their concentrations do not change much from one phase to another. In this case, identifying which molecules belong to which phase becomes a complicated task. Here, we propose a density-based clustering approach with self-tuning capabilities and apply it to the case of the mixture of an ionic liquid with benzene. The almost linear scaling of the algorithm makes it suitable for the analysis of long Molecular Dynamics or Monte Carlo trajectories.
Determining the interface between two phases in an atomistic computer simulation, however, often requires a priori knowledge of which molecules belong to which phase. Far enough from critical points (be that the liquid–vapour critical point for single component systems, or the critical mixing point in bicomponent systems), the problem of distinguishing the two phases is less severe: simulating water/vapour coexistence at a temperature of 300 K in a simulation box with an edge of about 10 nm, one can hardly find a single molecule in the vapour phase. If the vapour phase is composed of few molecules, a possible way to separate the two phases consists of defining clusters of molecules that are within a given cutoff distance from each other. If the cutoff distance corresponds to the location of the first minimum of the liquid-phase radial distribution function, the liquid phase can be reasonably defined as the largest cluster in the system.8
This approach fails if the density of the two phases becomes too similar, as this clustering approach will identify a single, large, connected cluster spanning the complete simulation box. Another, more robust approach consists of considering the liquid phase as the largest cluster of molecules, which have at least a given number of neighbours.7,9 Also this criterion has its drawbacks, as we will show, as it does not consider points with smaller local density, which are still reachable, within the cutoff distance, from the denser regions.
The equivalent situation for bicomponent systems is that of partial miscibility, where the two components A and B can give rise to an A-rich phase ΦA and a B-rich phase ΦB, where the respective concentrations are cA(ΦA) > cA(ΦB) and cB(ΦA) < cB(ΦB). The solubility of A in B does not need to be the same as that of B in A, which can lead to extreme situations in which one phase is a single component one (say, cA(ΦB) = 0), while in the other phase the two components have similar concentrations, cA(ΦA) ≃ cB(ΦA).
This condition is exemplified by the case of the ionic liquid 1-butyl-3-methylimidazolium bis(trifluoromethanesulfonyl)imide (BMIM NTf2) in combination with the simplest aromatic solvent, benzene. The concentration of BMIM NTf2 in the benzene-rich phase is negligible, while in the ionic liquid-rich phase, benzene is present in an extremely high concentration that reaches a molar ratio cbenzene/ctotal of 0.82.10 Previously, we investigated the structure of fluid interfaces between ionic liquids and other fluids with little or moderate mutual miscibility.11 In particular, we were interested in how the intrinsic interface structures of two of the most popular ionic liquid compounds, BMIM NTf2 and 1-butyl-3-methylimidazolium hexafluorophosphate (BMIM PF6), change upon varying the polarity of the opposite fluid phase. The interfacial structure of the mixture of BMIM NTf2 and benzene, however, cannot be studied using the approaches used so far, because of the peculiar mixing properties just described.
This is evident in Fig. 1, where we report a simulation snapshot of the mixture of BMIM NTf2 and benzene at room temperature. Benzene molecules are represented using a contour joining the six carbon atoms, while atoms of BMIM NTf2 are represented with a space-filling model using blue spheres (Fig. 1, left panel). The ionic liquid is clearly separating into two phases, one of which does not show the presence of any BMIM+ or NTf2− ions. Benzene, on the other hand, is present throughout the simulation box, as one can appreciate by removing the ionic liquid molecules (Fig. 1, right panel), forming a percolating system that cannot easily be separated into two phases using the simple clustering criterion mentioned before.
However, the peculiar properties of this mixture make it a perfect candidate for testing new algorithms that work closer to critical points. This is because the well separated BMIM NTf2 phase can serve as a clear reference for the other component. One can thus test any algorithm that aims to separate the high benzene concentration phase from the low benzene concentration one by checking that the former is complementary to the ionic liquid rich region.
It is clear that the condition Nc = 1 yields the simple clustering algorithm mentioned in the introduction, which is therefore a special case of DBSCAN. A clustering algorithm which is limited to core points, therefore not including density-reachable ones, has been used by Chacón and Tarazona to separate small vapour clusters from the liquid phase.7,9
From the point of view of computational complexity, DBSCAN can be performed in a time (NlogN),12 where N is the number of points, because the core points and the directly density-reachable points associated with them can be identified using, for example, a kd-tree based search that takes (on average) (logN) iterations.13
In Fig. 2 we show the result of DBSCAN filtering applied to a set of points on the x–y plane, whose coordinates are sampled from two uniform random distributions (upper panel), a more dense one in the interval x from 0 to 1, and a less dense one (1/3 relative density) for x from 1 to 2. Along the y axis, all coordinates are sampled from a uniform random distribution from 0 to 1. Periodic boundary conditions are applied along the y axis. For this example, rc = 0.18 and Nc = 24. The DBSCAN filtering (Fig. 2, lower panel) results in the identification of the largest cluster composed of core points (red points) and density-reachable points (black circles), which almost coincide with the set of points sampled at higher density (x < 1). Note that although the algorithm does fairly well in recovering the two initial distributions, it is not expected (and in our case, not meant) to do so: at the interface, x ≃ 1, the juxtaposition of the two distributions creates regions of high local density for x > 1, which are reachable from the main cluster. Conversely, regions of low local density for x < 1 can occur just because of the random nature of the distribution and, if they occur close to x = 1, they can become part of some of the smaller clusters. The importance of considering not only core points (that is, those with a local density higher than the chosen threshold) but also the density-reachable ones can be appreciated by noticing, in Fig. 2, the number of non-core points lying well within the high-density region.
We tested the performances of the algorithm by applying it to similar distributions in the three-dimensional space, by varying the number of points Npoints from 500 to 5 × 106. The time it takes to perform the filtering, shown in Fig. 3 as blue circles, is virtually indistinguishable from a linear function of Npoints (dashed line). The prefactors involved in the calculation are also relatively low, as indicated by the timing of about 1 s for 1 million points.
Fig. 3 Computing time needed to apply the DBSCAN filter to systems with a different number of points, Npoints. |
A common approach in the simple neighbour clustering algorithm is to use as a cutoff radius the distance at which the first minimum of the radial distribution function (of the dense phase) is located. In this way, two molecules are considered to be in the same cluster if they are located at a distance, which would make them (by definition) belong to each other's shell of first neighbours. In the case of the BMIM NTf2 benzene mixture, benzene separates in two regions, where its concentration changes from about 7 to 3 molecules per nm3 (see Fig. 4). The intermolecular radial distribution function for the carbons of benzene rings in the high and low benzene concentration regions are shown in Fig. 5. In the high concentration region, a clear, deep minimum is present at around 0.8 nm, but a less pronounced one appears not far from the close contact distance, at about 0.55 nm. Note that the first peak of the radial distribution function in the high benzene concentration region is lower than the corresponding one in the low benzene concentration region: this could seem counterintuitive at first, but one should not forget that the functions are normalized to their respective bulk concentrations, which have a ratio of 3:7. Therefore, the local concentration is always smaller, in absolute terms, in the low benzene concentration region than in the high one.
Fig. 5 Intermolecular radial distribution function of the carbon atoms (blue curve: high benzene concentration region; orange curve: low benzene concentration region). |
To understand which approach could be best suited for choosing the cutoff density ρc, we calculated the distribution of the number of neighbouring molecules for different cutoff radii, rc, calculated over the whole trajectory and over all benzene molecules in the simulation box. The results are presented in Fig. 6. When using a small cutoff radius, rc = 0.45 nm, the distribution of the number of neighbours Nn shows a shoulder at about Nn = 5, hinting at the presence of an underlying multimodal distribution, which cannot be properly resolved. By increasing the cutoff to rc = 0.6 and 0.75 nm, the distribution stretches to a higher number of neighbours, as the available volume increases, and, at the same time, the presence of two peaks is gradually revealed. The two peaks, corresponding to the density distributions in the two concentration regions, are best resolved when a large cutoff is being used. We have chosen to characterize the distribution using a k-means clustering algorithm14 with two modes, which provides two centroids (represented by two circles in Fig. 6 for rc = 0.75 nm). From the average of the positions of the two centroids (denoted by a diamond symbol in Fig. 6) one can obtain a possible estimate of the threshold density ρc, as below/above the corresponding number of neighbours most of the distribution is contributed to by atoms in the low/high benzene concentration region, respectively.
Due to the non-miscibility of the ionic liquid in benzene, no ionic liquid should be found in the benzene-rich phase determined by the algorithm. This sets a fairly restrictive criterion to be met by the algorithm. However, by using the midpoint of the two centroids, about 5% of the total number of ionic liquid molecules (51 ions out of about 1024 in the system) can be found within the high benzene concentration region, showing that this choice of ρc is not adequate. The reason for this behaviour stems from the fact that by placing the cutoff at the midpoint of the two centroids, some benzene molecules in the right tail of the distribution centred at the lower concentration are identified as belonging to the high concentration region. By using a cutoff concentration corresponding to the upper centroid, instead of the midpoint, the proportion of ions found in the high benzene concentration region drops to about 0.5%. As there are 512 ion pairs in the system, this means that on average about one ion pair (per side) is found across the benzene interface. The choice of the upper centroid concentration as a threshold one does, therefore, improve noticeably the quality of the phase identification. This can be appreciated by inspecting the intrinsic density profile,9 calculated by constructing the histogram of the distance of molecules from the local position of the interface ξ(x, y), as
(1) |
Another important requirement for the algorithm is its stability against small (or large) variation of rc, the only free parameter left (as ρc can be determined automatically). Fig. 8 reports, for the same configuration, but for different values of rc, the molecules belonging to the largest cluster (grey) and to the other, smaller clusters (blue). In addition, we have run an interfacial analysis using the Identification of Truly Interfacial Molecules algorithm (ITIM),15 to inspect also the features of the interface between the high and low benzene concentration regions. The atoms belonging to the interface (but still part of the high benzene concentration region) are represented in red. Visual inspection shows that the DBSCAN-based algorithm succeeds in identifying consistently the two regions for all cutoff values, even when the distributions of neighbours are strongly overlapping (see Fig. 6). For cutoff values lower than 0.5 nm, however, the algorithm seems to identify a larger number of benzene molecules, with some protrusion into the ionic liquid-rich phase noticeable in the left interface.
To quantify the difference between the phases obtained with different cutoff values, as well as the differences between the corresponding interfaces, we calculated the number of molecules belonging to the largest cluster (i.e., the benzene-rich phase) and the interface, as a function of rc, and presented this information in Fig. 9. Two main trends can be seen, namely, a sharp decrease in the number of surface molecules and an increase in the number of molecules in the largest cluster when the cutoff increases from 0.35 to about 0.6 nm. After this point, the number of surface molecules and the size of the biggest cluster change only slightly, and the dependence on the cutoff value suggests the presence of a minimum value for the number of surface molecules. Intuitively, one would consider as an optimal choice for rc the one that minimizes the surface area7 or, in this case, the number of surface molecules, which is roughly proportional to it. In principle, one could point out that from rc = 1 nm up to the largest value rc = 1.45 nm, the number of surface molecules differs, on average, only by about 5 surface molecules (roughly 1% of the total). These small variations are probably not consequential for many practical applications, and any cutoff radius above 1 nm would be an equivalently good choice. One should bear in mind, however, that the larger the cutoff radius, the more computationally intensive the DBSCAN filtering will be.
Fig. 9 Number of surface molecules (orange squares) and molecules belonging to the largest cluster (blue circles) as a function of the cutoff radius rc. |
The solubility of benzene, defined as the ratio of the benzene molar concentration and total molar concentration in the ionic-liquid rich phase, can be accessed either by counting the number of benzene molecules not in the largest cluster, or through a fit in the bulk region of the density profile. The two values (0.70 ± 0.01 and 0.69 ± 0.01, respectively) compare reasonably well with the experimental solubility of benzene in BMIM NTf2, 0.82 measured under the same thermodynamic conditions.10
It is worth mentioning that the algorithm is not restricted to planar phase separations, but can be applied without any modification in the presence of corrugated interfaces, or when one of the phases is dispersed into the other one, such as droplets of oil dispersed in a water/oil mixture, as long as the cutoff radius rc is smaller than that of the droplets.
One might wonder how other similar algorithms compare to the present one in determining the phases of a partially miscible system, or, conversely whether the present approach is appropriate also in the fully demixed case. To test similar algorithms we investigated the BMIM NTf2/benzene system using the ITIM algorithm with simple clustering, and the Generalized ITIM (GITIM) algorithm.16 Simulation snapshots resulting from these tests are shown in the ESI.† In particular, it clearly emerged that the use of a simple clustering scheme does not allow us to recognize properly the two phases, to the extent that it is impossible for the ITIM algorithm to distinguish upper and lower sides of the interface. With the simple clustering scheme, all molecules in the simulation box are assigned to the high-density phase, and those tagged as interfacial are not representative at all of the real surface molecules. The GITIM algorithm performs better, as it is able to reveal the small empty pockets, which are abundant in the low benzene concentration phase. The large majority of benzene molecules in the latter phase, however, are identified as surface molecules, and some additional kind of filtering would be required to properly separate the two phases.
The opposite limit of a well-defined phase separation with vanishing vapour pressure is also instructive. We have considered a water/vapour interface formed by 1000 water molecules at 300 K, simulated using the SPC/E water model,17 and assigned molecules to the two phases using both the simple clustering scheme and the present method. At a temperature of 300 K, in most frames not a single water molecule is clearly seen in the vapour phase, and in these cases the simple clustering approach assigns all water molecules to the liquid phase. The distribution of the number of neighbours, however, is in this case no longer bimodal, which was a prerequisite for using the automatic determination of the threshold density ρc. Setting by hand ρc to, for example, one water molecule every 35 Å3 (therefore slightly lower than the bulk number density), the density based clustering algorithm also assigns all molecules to the liquid phase. If, instead, the upper centroid is used to set ρc, the following is observed: for a cutoff value of 0.8 nm, some molecules in the outer part of the first molecular layers are identified as vapour ones; by increasing the cutoff value the number of molecules recognized as vapour ones decreases, and for cutoff values larger than 1.5 nm, they completely disappear (simulation snapshots of the water/vapour interface are also shown in the ESI†).
We can therefore conclude that in the extreme case when no or very few molecules are present in the low-concentration phase, the automatic threshold determination should be avoided, in favour of either choosing manually the threshold density, or using the simple clustering method. Either way, the same result is obtained.
To model BMIM NTf2, the force field published by Logotheti et al.22 was applied, while benzene was described by the general OPLS force field, including all hydrogen atoms.23 The Lennard-Jones interaction parameters between the ionic liquid and benzene were determined using Lorentz–Berthelot mixing rules. All bond lengths were kept fixed using the LINCS algorithm24 to make sure we can safely use a time step of 2 fs for the integration of the equations of motion. Non-bonded interactions were truncated beyond 1.2 nm, while the electrostatic interaction was calculated using the PME algorithm25 in its smooth variant.26
Analytical tail corrections for the energy and pressure of the dispersion interactions were also applied. Periodic boundary conditions were applied in all directions. The simulated system contained 512 ion pairs as well as 2500 benzene molecules. Starting from totally unmixed phases, an equilibration was performed for 600 ns. This long equilibration time was needed because of the peculiar mixing behaviour of the system (which requires transport of a large amount of mass through the simulation box), combined with the high viscosity of the mixture, which sets the relaxation time. During the 600 ns equilibration, quantities like the total energy and the concentrations in the two phases (that is, also the chemical potential) relaxed to their equilibrium values. After the equilibration, we sampled configurations from an additional 30 ns long run, saving them to disk for further analysis.
The algorithm for the phase determination was implemented in python/Cython,27 only slightly modifying the DBSCAN class of the scikit-learn machine learning package28 to provide also the sizes of clusters and to take into account the presence of periodic boundary conditions. The latter requirement has been fulfilled by using the periodic boundary condition-aware kd-tree neighbour search implemented in recent versions (≥0.18) of scipy. The configurations from the stored trajectories were made accessible to the python script using the MDAnalysis library.29,30
This choice has been dictated by two factors, namely (a) the high miscibility of benzene, which yields equilibrium concentrations in a ratio of about 3:7 and (b) the virtually zero miscibility of the ionic liquid. On the one hand, the high miscibility of benzene provided a challenging test case, as the equilibrium concentrations are very close to each other, and the interface between the high and low benzene concentration regions is difficult to locate. On the other hand, the low miscibility of the ionic liquid has, as a consequence, the presence of a very well defined interface between the ionic liquid-rich and poor regions. This provides a remarkable benchmark for the identification of the interface, as the phases as defined using the ionic liquid or benzene are not supposed to interpenetrate in a consistent scheme. In fact, only about 1 ion pair is found, on average, to cross the interface and be found in the high benzene concentration region. This shows that the algorithm is in fact able to identify a meaningful interface also when the miscibility of the components is very high. The robustness against the only remaining free parameter of the method, and the quasi linear scaling as a function of the number of atoms in the system, therefore make it a viable approach for the identification of phases in multicomponent systems with high miscibilities. The code is available free of charge as part of the pytim package at http://https://github.com/Marcello-Sega/pytim.
Footnote |
† Electronic supplementary information (ESI) available: Phase determination using different algorithms and low vapour pressure. See DOI: 10.1039/c7cp02918g |
This journal is © the Owner Societies 2017 |