DeepMech: A Machine Learning Framework for Chemical Reaction Mechanism Prediction
Abstract
Predicting the complete, step-by-step mechanism of a chemical reaction from first principles remains a grand challenge in science. Chemical reaction mechanisms (CRMs) are important in almost all domains, including prebiotic chemistry, drug discovery, and materials science. Uncovering CRMs remains a complex task, traditionally reliant on expert-driven experiments or expensive quantum chemical computations. While deep learning (DL) studies have shown promise in predicting reaction outcomes, they have largely ignored the important intermediates and mechanistic steps en route to the product of immediate interest. Since sequence-to-sequence models that generate products character by character are prone to hallucination, we consider it important to develop DL models that prioritize reactivity over learning the syntax and semantics of sequences. Progress in reaction mechanism prediction is further limited by the lack of large-scale, mass-balanced datasets with mechanistic annotations. Motivated by these gaps, we introduce DeepMech, an interpretable graph-based DL framework that employs attention mechanisms at both the atom and bond levels. Instead of learning end to end from reactants to products, our model incorporates templates of mechanistic operation (TMOps) to generate the intermediates in elementary mechanistic steps. It leverages TMOps to predict mechanisms step by step, building up full CRMs for a multitude of reaction classes of high contemporary significance. To train DeepMech, we first construct ReactMech, a meticulously curated dataset of about 30K full reaction mechanisms, each comprising several atom-mapped elementary steps (100K steps in total). DeepMech achieves state-of-the-art accuracy of 98.98±0.12% in predicting elementary steps and 95.94±0.21% in predicting complete CRMs. The model maintains high fidelity even in out-of-distribution scenarios involving unseen catalysts, ligands, and/or mechanistic classes.
In terms of generalizability, DeepMech effectively reconstructs multistep CRMs relevant to prebiotic chemistry, leading from simple primordial substrates such as nitrogen, ammonia, methane, water, and hydrogen cyanide to complex biomolecules like serine and aldopentose. Further, attention-based interpretability analysis reveals that DeepMech correctly identifies reactive atoms and bonds, in line with chemical intuition. Collectively, DeepMech offers a promising step toward data-driven prediction of CRMs, with the potential to expedite mechanistic understanding and reaction design across domains.
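The abstract does not specify how a TMOp is encoded, so the following is only an illustrative sketch and not DeepMech's implementation. It assumes an elementary step can be written as a list of bond-order changes and formal-charge changes on an atom-mapped graph, applies one such step (hydroxide displacing chloride from chloromethane), and checks that the step is mass- and charge-balanced. The function names `apply_step` and `is_balanced` and the data layout are hypothetical.

```python
from collections import Counter


def apply_step(state, bond_changes, charge_changes):
    """Apply one elementary step to an atom-mapped molecular graph.

    state is a tuple (atoms, charges, bonds), where
      atoms:   {map_number: element symbol}
      charges: {map_number: formal charge}
      bonds:   {frozenset({i, j}): bond order}
    bond_changes is a list of (i, j, delta_order);
    charge_changes is a list of (i, delta_charge).
    This encoding is an assumption made for illustration only.
    """
    atoms, charges, bonds = state
    new_bonds = dict(bonds)
    for i, j, delta in bond_changes:
        key = frozenset((i, j))
        order = new_bonds.get(key, 0) + delta
        if order < 0:
            raise ValueError(f"negative bond order between atoms {i} and {j}")
        if order == 0:
            new_bonds.pop(key, None)  # bond fully broken
        else:
            new_bonds[key] = order
    new_charges = dict(charges)
    for i, delta in charge_changes:
        new_charges[i] = new_charges.get(i, 0) + delta
    return dict(atoms), new_charges, new_bonds


def is_balanced(before, after):
    """True if element counts and total charge are unchanged by the step."""
    same_atoms = Counter(before[0].values()) == Counter(after[0].values())
    same_charge = sum(before[1].values()) == sum(after[1].values())
    return same_atoms and same_charge


# Hydroxide (atoms 1-2) plus chloromethane (atoms 3-7), atom-mapped.
atoms = {1: "O", 2: "H", 3: "C", 4: "H", 5: "H", 6: "H", 7: "Cl"}
charges = {1: -1, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0}
bonds = {
    frozenset((1, 2)): 1,
    frozenset((3, 4)): 1,
    frozenset((3, 5)): 1,
    frozenset((3, 6)): 1,
    frozenset((3, 7)): 1,
}
reactant = (atoms, charges, bonds)

# Hypothetical template: form O-C, break C-Cl, move the negative charge to Cl.
product = apply_step(
    reactant,
    bond_changes=[(1, 3, +1), (3, 7, -1)],
    charge_changes=[(1, +1), (7, -1)],
)
balanced = is_balanced(reactant, product)
```

Because every atom keeps its map number across the step, intermediates produced this way can be fed into the next step, which is the sense in which elementary steps compose into a full mechanism.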