Learning reduced kinetic Monte Carlo models of complex chemistry from molecular dynamics
We propose a novel statistical learning framework for automatically and efficiently building reduced kinetic Monte Carlo (KMC) models of large-scale elementary reaction networks from data generated by a single or few molecular dynamics simulations (MD). Existing approaches for identifying species and reactions from molecular dynamics typically use bond length and duration criteria, where bond duration is a fixed parameter motivated by an understanding of bond vibrational frequencies. In contrast, we show that for highly reactive systems, bond duration should be a model parameter that is chosen to maximize the predictive power of the resulting statistical model. We demonstrate our method on a high temperature, high pressure system of reacting liquid methane, and show that the learned KMC model is able to extrapolate more than an order of magnitude in time for key molecules. Additionally, our KMC model of elementary reactions enables us to isolate the most important set of reactions governing the behavior of key molecules found in the MD simulation. We develop a new data-driven algorithm to reduce the chemical reaction network which can be solved either as an integer program or efficiently using L1 regularization, and compare our results with simple count-based reduction. For our liquid methane system, we discover that rare reactions do not play a significant role in the system, and find that less than 7% of the approximately 2000 reactions observed from molecular dynamics are necessary to reproduce the molecular concentration over time of methane. The framework described in this work paves the way towards a genomic approach to studying complex chemical systems, where expensive MD simulation data can be reused to contribute to an increasingly large and accurate genome of elementary reactions and rates.