In silico active learning for small molecule properties
Abstract
Machine learning (ML) has emerged as a promising technology to accelerate materials discovery. While systematic screening of vast chemical spaces is computationally expensive, ML algorithms offer a directed approach to identifying and testing promising molecular candidates for specific applications. Two significant hurdles to developing robust ML models are the quality and quantity of the experimental and ab initio data available for training in new chemical spaces. Here we present a reliable, reproducible, and fully automated simulation pipeline that enables users with varied backgrounds to easily generate thermodynamic data. Our atomistic simulation pipeline is GPU accelerated and suitable for high-performance computing (HPC) environments. We validate our pipeline results against dedicated experimental work and existing literature data, then demonstrate how ML may be employed via an active learning approach to further drive chemical exploration. First, ML models are trained to predict thermodynamic properties for a large set of small molecule candidates. Second, new molecules are selected and simulated based on the model uncertainty and expected model improvement. These additional simulations enhance the predictive capabilities of the model, and the procedure is repeated until the overall prediction capability is satisfactory. We simulate 410 molecules using active learning within the automated simulation pipeline to enumerate properties of interest. Across our set of over 6000 small molecule candidates, our active learning procedure predicts monomer properties with error rates substantially lower than those of a random selection baseline. We demonstrate that this approach reduces the number of simulations that must be completed while still generating a reliable final model to predict thermodynamic properties for a large number of small molecules.
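The two-step loop described in the abstract (train, then select new molecules by model uncertainty) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the model is a bootstrap ensemble of ridge regressions, uncertainty is taken as the ensemble's standard deviation, the expected-model-improvement criterion is omitted, and `run_simulation` is a hypothetical stand-in for the atomistic simulation pipeline that simply evaluates a synthetic linear property.

```python
import numpy as np


def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression weights (illustrative model choice)."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def ensemble_predict(X_train, y_train, X_pool, n_models=20, rng=None):
    """Mean and standard deviation of predictions from a bootstrap ensemble."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap resample of the labeled set
        w = fit_ridge(X_train[idx], y_train[idx])
        preds.append(X_pool @ w)
    preds = np.asarray(preds)
    return preds.mean(axis=0), preds.std(axis=0)


def run_simulation(x, true_w):
    """Hypothetical stand-in for one simulation: returns a synthetic property."""
    return float(x @ true_w)


def active_learning(X_pool, true_w, n_init=5, batch_size=5, n_rounds=4, seed=0):
    """Iteratively label the candidates on which the ensemble disagrees most."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X_pool), size=n_init, replace=False)]
    y = {i: run_simulation(X_pool[i], true_w) for i in labeled}
    for _ in range(n_rounds):
        X_train = X_pool[labeled]
        y_train = np.array([y[i] for i in labeled])
        _, std = ensemble_predict(X_train, y_train, X_pool, rng=rng)
        std[labeled] = -np.inf  # never re-select an already simulated candidate
        for i in np.argsort(std)[-batch_size:]:  # highest-uncertainty candidates
            i = int(i)
            y[i] = run_simulation(X_pool[i], true_w)
            labeled.append(i)
    return labeled, y
```

With a pool of 200 synthetic candidates and the default settings, the loop labels 5 initial candidates plus 5 per round over 4 rounds. A random selection baseline, as mentioned in the abstract, would replace the `np.argsort(std)` step with a random draw from the unlabeled candidates.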
- This article is part of the themed collections: Celebrating Latin American Chemistry and MSDE Recent HOT Articles