Nucleation rate analysis of methane hydrate from molecular dynamics simulations

Clathrate hydrates are solid crystalline structures most commonly formed from solutions that have nucleated to form a mixed solid composed of water and gas. Understanding the mechanism of clathrate hydrate nucleation is essential to grasp the fundamental chemistry of these complex structures and their applications. Molecular dynamics (MD) simulation is an ideal method to study nucleation at the molecular level because the size of the critical nucleus and formation rate occur on the nano scale. Various analysis methods for nucleation have been developed through MD to analyze nucleation. In particular, the mean ﬁ rst-passage time (MFPT) and survival probability (SP) methods have proven to be e ﬀ ective in procuring the nucleation rate and critical nucleus size for monatomic systems. This study assesses the MFPT and SP methods, previously used for monatomic systems, when applied to analyzing clathrate hydrate nucleation. Because clathrate hydrate nucleation is relatively di ﬃ cult to observe in MD simulations (due to its high free energy barrier), these methods have yet to be applied to clathrate hydrate systems. In this study, we have analyzed the nucleation rate and critical nucleus size of methane hydrate using MFPT and SP methods from data generated by MD simulations at 255 K and 50 MPa. MFPT was modi ﬁ ed for clathrate hydrate from the original version by adding the maximum likelihood estimate and growth e ﬀ ect term. The nucleation rates calculated by MFPT and SP methods are within 5%, and the critical nucleus size estimated by the MFPT method was 50% higher, than values obtained through other more rigorous but computationally expensive estimates. These methods can also be extended to the analysis of other clathrate hydrates.


Introduction
Clathrate hydrates are ice-like structures in which guest molecules are trapped inside water cages connected by a hydrogen-bonded network. 1 The formation of clathrate hydrates of natural gases, also called gas hydrates, is a serious problem in the ow assurance of oil/gas ow lines. Inhibiting and mitigating hydrate formation in ow lines are crucial for the safety and the reduction of operating costs of maintaining ow lines. 1,2 Gas hydrates are also abundant in the seaoor and have attracted attention as a potential energy resource. 3 Depressurization of these hydrate deposits is projected to be an efficient method to produce natural gas from the hydrates in the sea. 4 To depressurize the hydrate reservoir, it is important to induce dissociation to release the gas from the hydrates, which will consequently generate thermodynamically favorable conditions for hydrate reformation. Therefore, understanding hydrate formation is required to develop efficient energy production strategies. At the most fundamental level, the mechanism of hydrate formation must be understood, since the incipient hydrate crystallization phenomenon can be controlled in ow lines during the production of oil/gas and for the gathering of gas from hydrate reservoirs. Moreover, a thorough understanding of the hydrate formation process will improve the efficiency of other challenging hydrate applications such as gas transport, 2,3 hydrogen storage, 5 and capture/sequestration of carbon dioxide. 6 Nucleation is the rst stage in hydrate formation, which is generally an activated process where small clusters of the new phase are formed from a supersaturated phase. A free energy barrier exists between the two phases, and small clusters which exceed the critical size located at the peak of the barrier become the nucleus of the new phase. 7 Hydrate growth can be observed experimentally, 8 however, hydrate nucleation cannot be observed, as it is a molecular level process that involves nuclei size and time scales that are on the nano scale. Molecular dynamics (MD) simulation has proven to be an invaluable tool to observe hydrate nucleation. [9][10][11] There have been numerous studies on hydrate nucleation, [12][13][14] growth, [15][16][17] and stability 18-20 by simulations. Previous studies have thoroughly examined the formation mechanisms of various hydrate structures. [21][22][23][24][25][26] Order parameters that characterize the hydrate structures formed in the nucleation process have also been developed. [27][28][29][30][31] Nonetheless, the analysis of hydrate nucleation is still in the early stages, and there are remarkably scarce reports in which the nucleation rate and the critical nucleus size are discussed. 32,33 Nucleation is a stochastic process and commonly considered to be a "rare event" in molecular simulations. 34 Direct molecular simulations typically require long calculation times (100s nanoseconds to microseconds) to observe hydrate nucleation. But more importantly, the critical nucleus size of hydrate formation, which is essential in calculating the nucleation rate, cannot be found beforehand. Although there are many difficulties in the simulations of hydrate nucleation, Walsh et al. succeeded in observing hydrate nucleation in microsecond simulations. 12 Recently, Barnes et al. performed a number of similar simulations using high-performance computing and analyzed the nucleation rate and critical nucleus size. 35 Nucleation rate analyses using MD simulations have actually been well documented for monatomic systems forming droplets or bubbles through homogeneous and heterogeneous nucleation. [36][37][38][39][40][41][42][43][44][45] This study is initiated from the assumption that the various methods to analyze the nucleation rate and critical nucleus size may be applicable for other nucleation processes, and therefore we have used these methods to analyze hydrate nucleation. [46][47][48][49] In this study, we analyzed the simulation results from Barnes et al. 35 and calculated the nucleation rate and critical nucleus size of the methane hydrates by implementing methods originally applied to analyze vapor-to-liquid nucleation. This work introduces a methodology for analyzing methane hydrate nucleation but the ndings are easily applicable to other complex clathrate hydrate structures.

Simulation details
Barnes et al. performed 200 MD simulations of methane hydrate nucleation at T ¼ 255 K, P ¼ 50 MPa. 35 Simulation cells included 2944 water and 512 methane molecules with a cylindrical water/methane interface. The initial conguration was created by melting 64 unit cells of structure I hydrate at T ¼ 550 K. 12 Water and methane models were TIP4P/Ice 50 and the unied atom model, 51 respectively, and the Lorentz-Berthelot combining rules were used to calculate water-methane interactions. GROMACS 4.5 and 4.6 were used to perform the simulations, using the Verlet leapfrog algorithm for time integration. 52 The isobaric-isothermal ensemble was applied, where the pressure was controlled by the Parrinello-Rahman barostat 53 with a time constant of 4 ps, and the temperature was controlled by the Nosé-Hoover thermostat 54,55 with a time constant of 2 ps. The SETTLE algorithm was used to constrain the bond lengths and angles of water molecules. 56 A time step of 2 fs was used, with short-range interactions truncated at 1 nm, and the long-range electrostatic interactions were calculated by the particle mesh Ewald algorithm 57,58 with a Fourier spacing of 0.12 nm. Over 90% of the simulations were performed for a minimum of 3 ms.

Analysis of nucleation
An activated process typied by nucleation is the formation of small embryos of a new phase from an existing metastable phase by overcoming a free energy barrier. 7 The rate of which the critical-sized embryos are formed during the nucleation process is the nucleation rate. The nucleation process is considered to be a diffusion process over a barrier in an internal space, which can be described by the Fokker-Planck equation in terms of a variable X that is an internal coordinate or degree of freedom, where r is the density in the internal space, and D and A are the diffusion and dri coefficients in this space, respectively. In the nucleation process, eqn (1) can be expressed as a function of a variable n, which is the number of molecules constituting a cluster, instead of X. In this case, r indicates the cluster size distribution, and eqn (1) can be expressed in the form of a continuity equation in the internal space, v vt rðn; tÞ ¼ À v vn jðn; tÞ; where r(n,t) is the number density of clusters containing n monomers at time t and j(n,t) is the formation rate of size n clusters in the system. At steady state, vr(n,t)/vt ¼ 0 and j(n,t) is constant, so the nucleation rate at steady state can be dened as J ¼ j/V, where V is the volume of the system. The nucleation rate can also be described by means of integrating eqn (2) with respect to n, vN(n t ,t)/vt ¼ j(n t ,t)$ N(n t ,t) is the total number of clusters larger than a threshold size of n t . The nucleation rate obtained by this expression is In this equation, the nucleation rate is described by a time derivative of the number of clusters greater than n t per unit volume. This rate should be constant at steady state and independent of n t , so n t must be greater than the critical size. In this work, the nucleation rate was evaluated by this denition based on eqn (3).
2.2.1 Denition of clusters. The denition of cluster size is essential in the analysis of nucleation. Recent hydrate nucleation studies indicate that amorphous structures are initially formed and these may anneal and crystallize. 59,60 Nucleation and growth of hydrates are usually characterized by order parameters that distinguish the phase of water molecules, structure of cages, and coordinates of guest molecules. [27][28][29][30][31]61 However, it is difficult to identify clusters that have formed initially using these order parameters due to the complex molecular geometries. Most recently, Barnes et al. developed an order parameter, called the Mutually Coordinated Guest (MCG) order parameter, that identies guest molecules separated by water clusters consisting of ve or six-member rings. 62 This order parameter (OP) can estimate the cluster size of methane hydrates sufficiently, so the MCG-1 OP from the MCG algorithm was used as the cluster size n in this work.

Mean rst-passage time method (MFPT).
This method is suitable to analyze nucleation in MD simulations. The method makes the simplifying assumption that the free energy barrier is high, and only the height and curvature of the barrier control the nucleation kinetics, and that growth is much faster than nucleation. 48,49 MFPT does not require large systems for the simulation, but rather demands numerous replications of small nucleating systems to analyze the statistics of the phenomenon. Full reaction coordinate analysis of methane hydrate nucleation requires enormous calculation time, so MFPT is advantageous in analyzing this event because it only uses the trajectories of direct MD simulations of nucleating systems.
The mean rst-passage time in the case of nucleation is dened as the mean time s(n) that the largest cluster in each system requires to reach or exceed a threshold size n for the rst time. If the free energy barrier is high enough, MFPT as a function of the cluster size n can be described by a specic sigmoidal curve where s J is the nucleation time, erf(x) is the error function, and Z is the Zeldovich factor. A tting of the simulation results to eqn (4) directly yields the nucleation time s J , critical nucleus size n*, and the Zeldovich factor. In the case of a high free energy barrier, the MFPT curve has a clear plateau at the end of the sigmoidal shape, which indicates the nucleation time s J . The critical nucleus size is considered to be the size at time s J /2 in the MFPT curve because the probability of the transition at the top of the barrier is 50%. The nucleation rate J is calculated from the nucleation time s J as 2.2.3 Survival probability. The formation of a large enough post-critical cluster in the presence of a high free energy barrier is a random event, which follows the Poisson distribution where P k (t) is the probability that k clusters larger than n appear at time t, and s n is the average time when these clusters appear. In the case of k ¼ 0, eqn (6) shows the survival probability (SP) P 0 (t), 8,47,49 i.e., the probability that there are no clusters larger than n aer a time t in the system, to be P 0 (t) ¼ e Àt/s n .
On the other hand, the nucleation probability P nuc (t) is given by where N nuc is the number of systems in which nucleation is observed and N all is the number of all simulation systems. From eqn (7) and (8), the SP P surv (t) becomes where t 0 is the fastest time a cluster takes to reach the threshold size n among all simulations. If the free energy barrier is high, these probabilities do not depend on the threshold size of cluster n. A tting of the simulation results to eqn (9) yields the nucleation time s J , and the nucleation rate can be obtained with eqn (5).

Results and discussion
Nucleation and growth of methane hydrate was observed in 46 out of the 200 replications of the MD simulation trajectories. We analyze the nucleation rate from this set using the MFPT and SP, and the critical nucleus size is also calculated from MFPT.

Mean rst-passage time
The MFPT of each cluster size n (up to n ¼ 400, corresponding to near complete solidication of the system) was calculated and plotted in Fig. 1. The statistics for small n are better than those for large n. This is mainly due to the observation time for small clusters being relatively short (even though this "short" time consumed massive computational resources) compared to those needed to fully observe complete nucleation. The ratio of nucleated to non-nucleated systems most likely inuences the calculation of MFPT, so s is calculated by a supplemental equation, which is based on the maximum likelihood estimate, 32,63 where N R (¼ 46) is the number of reacted (clusters reaching or exceeding a particular size n) trajectories and N N R (¼ 154) is the number of remaining trajectories. s i is the nucleation time, and s k is the total simulation time for nonnucleating trajectories. The nucleation time s J and critical nucleus size n* are obtained by tting the data to eqn (4). The nucleation rate J is calculated by plugging the obtained values into eqn (5) with V ¼ 84.9 nm 3 (estimated by considering the volume of the aqueous phase in a non-nucleating trajectory À volume of bulk methane phase subtracted from total volume). The nucleation rate from Fig. 1 is J ¼ 8.61 Â 10 23 cm À3 s À1 , and the critical nucleus size is n* ¼ 25.9. The tting of the MFPT curve from eqn (4) is poor in the range of 40 # n # 120. Under the thermodynamic conditions considered, the free energy barrier is relatively high, so it is difficult to observe nucleation (only about a quarter of replications nucleated). In systems where nucleation solely occurs, the plateau is expected to appear right aer the sigmoidal curve. 48 Compared to nucleation, the time scale of growth is commonly much shorter, but in our case, there seems to be an overlap of the time scales in the nucleation and growth processes, which can affect the MFPT results. Therefore, a modication to the MFPT curve is applied as suggested by Yi et al. by adding an additional term to eqn (4) to account for nite growth rates of post-critical clusters, with 64  sðnÞ ¼ 0:5s J Â 1 þ erf À Z ffiffiffi ffi p p ðn À n*Þ ÁÃ þ G À1 ðn À n*ÞHðn À n*Þ; (11) where G is the growth rate and H(X) is the Heaviside function. The Heaviside function becomes effective when a cluster size exceeds the critical nucleus size, i.e., when growth occurs aer nucleation. The smooth approximation of the Heaviside function can also be presented by an error function, transforming eqn (11) into ðn À n*Þ ÁÃ þ 0:5G À1 ðn À n*Þ½1 þ erfðCðn À n*ÞÞ; (12) where C is required to be a large positive number. Fig. 2 shows the tting given by eqn (12). The plateau in Fig. 1 becomes a positive slope that incorporates the growth contribution that is absent in Fig. 1. The newly calculated nucleation rate is J ¼ 9.43 Â 10 23 cm À3 s À1 , and the critical nucleus size becomes n* ¼ 23.8. The variation in the nucleation rate is greater than that of the critical nucleus size without growth.
To analyze how the number of replications inuences the results when using the MFPT method, a sensitivity analysis is conducted. Subsets of replications are randomly taken from the total of 200 and the nucleation times are estimated by the maximum likelihood method. The MFPT curve is generated from the averages of the subsets. Table 1 shows the averaged results of MFPT for different numbers of subsets. The results show that the differences in both the nucleation rate and critical nucleus are within 20%, which is an insignicant variation for typical nucleation studies. 46,48,49 Despite the variation, a greater number of replications are expected to produce more accurate results.
As shown in Table 1, from all 200 replications, the nucleation rate and critical nucleus size are estimated as 9.43 Â 10 23 cm À3 s À1 and n* ¼ 23.8, respectively. Nucleation in the monatomic systems can be observed relatively quickly, so the MFPT curve has a clear plateau right aer the sigmoidal shape. 48,49 On the other hand, hydrate nucleation required a long calculation time due to a high free energy barrier, so nucleation may not be observed for many systems within the limited calculation time. Furthermore, hydrate formation is a complex process and growth just aer the nucleation process tend to occur at the same time scale of nucleation. Therefore, MFPT for methane hydrate was modied from its original function using the maximum likelihood estimate by adding the growth term. The nucleation rate result is within 5% of that obtained by Barnes et al., 35 though the critical nucleus size is larger by 50%. The critical nucleus size obtained by Barnes et al. was estimated from a p b histogram test and this seems to have a higher variance. 35,[65][66][67] However, MFPT is a function of the nucleus size n only, so there are limitations in fully capturing the complexities of hydrate nucleation. The source of the discrepancy in the critical nucleus size also may be from the aforementioned comparable nucleation and growth time scales as well as the diffusion kinetics depending on the non-classical shape of the free energy curve. 35 Although further studies are required to clarify the discrepancy in the nucleus size, the agreement in the nucleation rate conrms the effectiveness of the MFPT analysis method for a qualitative estimate without further free energy computations.

Survival probability
The survival probability is calculated by eqn (9) and the results are shown in Fig. 3. Unlike completely nucleated systems (n t ¼ 400) that had 46 out of 200 occurrences, n t ¼ 40 had 49, and the range of the survival probability is within 0.76 # P surv (t) # 1.00. The systems that nucleated are found to grow and not dissociate. If the threshold size is increased to n t ¼ 80, the number of nuclei converges to 46. In other words, three replications did not have a nucleus that reached 80 in size within the simulation time. The variation in the threshold size is analyzed and is found to have little effect on the results of SP. The tting line in Fig. 3 illustrates SP at a longer time scale, and presents the asymptotic tendency at innite time. Based on Fig. 3, one can see that if the simulations of this study are performed to around 70 ms, the SP will become close to zero, meaning nearly all systems will have likely nucleated. The inset in Fig. 3 portrays a single logarithmic transformation from the original SP plot, and the nucleation rate is calculated from the slope of this graph. The nucleation rate is J ¼ 9.31 Â 10 23 cm À3 s À1 , which is close to that obtained by MFPT. SP is commonly independent of the threshold cluster size in the case that the free energy barrier is high. We veried this by changing the threshold cluster size, and Table 2 contains the results. The difference among the values is around 5%, which is remarkable considering the deviation in typical nucleation studies. The nucleation rate will slightly decrease with an increasing threshold size because the number of nucleated systems decreases. The rate at n t ¼ 20, however, was the smallest. The reason for this is from the fact that 20 is around the critical nucleus size obtained by Barnes et al. and MFPT, hence the inconsistency. 49 Therefore, when using the SP method, the threshold size should be set by an adequately larger number than the critical nucleus size (n t $ 40 was sufficient in this study). On the other hand, if the threshold size is too large, the analysis range may spill over to the growth stage rather than staying purely in the nucleation stage, which is the primary target.
The nucleation rate calculated by Walsh et al. is from the maximum likelihood estimate. 31 They performed the six replications at T ¼ 250 K, P ¼ 50 MPa and the induction times were veried by analyzing the evolution of the global F 4 61 order parameter as well as the appearance of cages larger than seven through the MD trajectories. The nucleation rate from Walsh et al. is around J sim ¼ 5.00 Â 10 24 cm À3 s À1 . Their results are one order of magnitude greater than our results from MFPT and SP. This difference may be attributed from the small number of samples and the denition of induction time. On the other hand, the nucleation rate calculated by Barnes et al. is from the maximum likelihood estimate based on prior knowledge of the critical nucleus size. 32,35 The nucleation rate from Barnes et al. is J sim ¼ 9.07 Â 10 23 cm À3 s À1 , which is very close to our results from MFPT and SP. The biggest advantage of this study compared to previous studies is that Table 2 The SP of the various threshold cluster sizes. n t is the threshold size, N nuc is the number of nucleated systems, and J is the nucleation rate the MFPT and SP methods can generate the nucleation rate directly from the MD trajectories.

Conclusion
The nucleation of methane hydrate is a phenomenon that is complicated to analyze at the molecular level due to a high free energy barrier. Recently, this process has been observed by molecular dynamics simulations using high performance computing. In this study, we analyzed the nucleation rate and critical nucleus size of methane hydrate using MFPT and SP, and assessed the applicability of these methods for methane hydrate nucleation analysis. In this study, MFPT was modied from its original function using the maximum likelihood estimate and adding the growth term. The nucleation rates obtained by MFPT and SP are in good agreement (within 20%) and these results are also close to the rate Barnes et al. estimated using direct calculations based on the maximum likelihood estimate. MFPT and SP are convenient methods to calculate the methane hydrate nucleation rate since they only require simulation trajectories. The critical nucleus size was also calculated by MFPT, which was found to be larger than that of Barnes et al. based on the p b histogram test. This difference is likely to come from the non-classical shape of the free energy surface and the methane hydrate system having comparable nucleation and growth time scales, which will inuence the critical nucleus size generated from MFPT. However, a modication for growth for MFPT slightly improves the t of the simple tting function. Though the MFPT and SP methods are only applied for the estimation of the nucleation rate and critical nucleus size of methane hydrate, the methods can easily be extended to the analysis of other clathrate hydrates.