Active learning meets metadynamics: automated workflow for reactive machine learning interatomic potentials
Abstract
Atomistic simulations driven by machine-learned interatomic potentials (MLIPs) are a cost-effective alternative to ab initio molecular dynamics (AIMD). Yet, their broad applicability in reaction modelling remains hindered, in part, by the need for large training datasets that adequately sample the relevant potential energy surface, including high-energy transition state (TS) regions. To optimise dataset generation and extend the use of MLIPs for reaction modelling, we present a data-efficient and fully automated workflow for MLIP training that requires only a small number (typically five to ten) of initial configurations and no prior knowledge of the TS. The approach combines automated active learning with well-tempered metadynamics to iteratively and selectively explore chemically relevant regions of configuration space. Using data-efficient architectures, such as the linear Atomic Cluster Expansion, we illustrate the performance of this strategy in various organic reactions where the environment is described at different levels, including the SN2 reaction between fluoride and chloromethane in implicit water, the methyl shift of 2,2-dimethylisoindene in the gas phase, and a glycosylation reaction in explicit dichloromethane solution, where competitive pathways exist. The proposed training strategy yields accurate and stable MLIPs for all three cases, highlighting its versatility for modelling reactive processes.
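As an illustration of the well-tempered metadynamics component named above, the following minimal sketch shows the standard hill-height rescaling on a single collective variable: each new Gaussian is scaled by exp(-V(s)/(k_B ΔT)), with ΔT = (γ - 1)T for bias factor γ. It is a toy under stated assumptions, not the workflow's implementation. The class name, parameter names, units (kJ/mol, K) and numerical values are all hypothetical, and the abstract does not state the settings actually used.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


class WellTemperedBias:
    """Toy 1D well-tempered metadynamics bias built from Gaussian hills.

    Hypothetical helper for illustration only; not taken from the article.
    """

    def __init__(self, initial_height, width, temperature, bias_factor):
        if bias_factor <= 1.0:
            raise ValueError("bias_factor must be greater than 1")
        self.initial_height = initial_height  # height of the first hill, kJ/mol
        self.width = width                    # Gaussian width in CV units
        # k_B * delta_T, with delta_T = (bias_factor - 1) * T
        self.kb_delta_t = KB * temperature * (bias_factor - 1.0)
        self.hills = []                       # list of (centre, height) pairs

    def value(self, s):
        """Return the accumulated bias V(s) at collective-variable value s."""
        return sum(
            h * math.exp(-((s - c) ** 2) / (2.0 * self.width ** 2))
            for c, h in self.hills
        )

    def deposit(self, s):
        """Add a hill at s, scaled down by the bias already present there."""
        height = self.initial_height * math.exp(-self.value(s) / self.kb_delta_t)
        self.hills.append((s, height))
        return height


# Usage: repeated deposition at the same point yields progressively smaller hills.
bias = WellTemperedBias(initial_height=1.0, width=0.1,
                        temperature=300.0, bias_factor=10.0)
first = bias.deposit(0.0)
second = bias.deposit(0.0)
```

Because hills shrink wherever bias has already accumulated, the sampling is pushed gradually out of well-visited minima towards higher-energy regions, which is the property that makes this kind of bias a natural companion to an active learning loop that needs configurations near the TS.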
- This article is part of the themed collection: 2025 Digital Discovery Emerging Investigators
