Issue 12, 2022

In silico active learning for small molecule properties

Abstract

Machine learning (ML) has emerged as a promising technology to accelerate materials discovery. While systematic screening of vast chemical spaces is computationally expensive, ML algorithms offer a directed approach to identifying and testing promising molecular candidates for specific applications. Two significant hurdles to developing robust ML models are the quality and quantity of the experimental and ab initio data available for training in new chemical spaces. Here we present a reliable, reproducible, and fully automated simulation pipeline that enables users with varied backgrounds to easily generate thermodynamic data. Our atomistic simulation pipeline is GPU accelerated and suitable for high-performance computing (HPC) environments. We validate the pipeline against dedicated experimental work and existing literature data, then demonstrate how ML may be employed in an active learning approach to drive further chemical exploration. First, ML models are trained to predict thermodynamic properties for a large set of small molecule candidates. Second, new molecules are selected and simulated based on the model uncertainty and the expected model improvement. These additional simulations enhance the predictive capability of the model, and the cycle repeats until the overall prediction quality is satisfactory. We simulate 410 molecules using active learning within the automated simulation pipeline to enumerate properties of interest. Across our set of over 6000 small molecule candidates, the active learning procedure predicts monomer properties with substantially lower error than a random selection baseline. We demonstrate that this approach can reduce the number of simulations required while still producing a reliable final model that predicts thermodynamic properties for a large number of small molecules.
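The train, select, simulate, retrain cycle described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. Everything in it is an assumption for illustration, not the authors' implementation: `simulate` is a hypothetical stand-in for the GPU-accelerated simulation pipeline, molecules are represented by hypothetical descriptor tuples, the surrogate model is a bootstrap ensemble of k-nearest-neighbour regressors, and the acquisition rule is plain maximum-uncertainty sampling (a simplification that ignores the expected-model-improvement criterion mentioned above).

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]           # hypothetical molecular descriptor vector
Labelled = List[Tuple[Vector, float]]


def knn_predict(train: Labelled, x: Vector, k: int = 3) -> float:
    """Mean target of the k labelled points closest to x (squared Euclidean distance)."""
    ranked = sorted((sum((a - b) ** 2 for a, b in zip(xi, x)), yi) for xi, yi in train)
    return statistics.fmean(y for _, y in ranked[:k])


def fit_bootstrap(train: Labelled, n_models: int, rng: random.Random) -> List[Labelled]:
    """Each ensemble member is a k-NN model on a bootstrap resample of the data."""
    return [[rng.choice(train) for _ in train] for _ in range(n_models)]


def predict(models: List[Labelled], x: Vector) -> Tuple[float, float]:
    """Ensemble mean (prediction) and spread (uncertainty) at x."""
    preds = [knn_predict(m, x) for m in models]
    return statistics.fmean(preds), statistics.pstdev(preds)


def active_learning(
    pool: Sequence[Vector],
    simulate: Callable[[Vector], float],
    n_init: int = 5,
    budget: int = 15,
    n_models: int = 10,
    seed: int = 0,
) -> Tuple[Labelled, List[Labelled]]:
    rng = random.Random(seed)
    candidates = list(pool)
    rng.shuffle(candidates)
    # Seed the training set with a few randomly chosen, simulated molecules.
    labelled = [(x, simulate(x)) for x in candidates[:n_init]]
    unlabelled = candidates[n_init:]
    for _ in range(budget):
        if not unlabelled:
            break
        models = fit_bootstrap(labelled, n_models, rng)
        # Acquisition: pick the candidate on which the ensemble disagrees most.
        best = max(range(len(unlabelled)), key=lambda i: predict(models, unlabelled[i])[1])
        x = unlabelled.pop(best)
        labelled.append((x, simulate(x)))  # only the chosen molecule is simulated
    return labelled, fit_bootstrap(labelled, n_models, rng)


if __name__ == "__main__":
    import math

    pool = [(i / 100,) for i in range(100)]  # toy 1-D "descriptor" space

    def simulate(x: Vector) -> float:
        # Hypothetical cheap stand-in for an expensive simulation.
        return math.sin(6 * x[0])

    labelled, models = active_learning(pool, simulate)
    mae = statistics.fmean(abs(predict(models, x)[0] - simulate(x)) for x in pool)
    print(f"simulated {len(labelled)} of {len(pool)} candidates, pool MAE = {mae:.3f}")
```

A random-selection baseline of the kind the abstract compares against can be obtained from the same sketch by replacing the acquisition line with `best = rng.randrange(len(unlabelled))`; comparing the two MAE values for a fixed simulation budget mirrors, in miniature, the comparison described above.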


Article information

Article type: Paper
Submitted: 06 July 2022
Accepted: 15 August 2022
First published: 02 September 2022

Mol. Syst. Des. Eng., 2022, 7, 1611-1621


L. Schneider, M. Schwarting, J. Mysona, H. Liang, M. Han, P. M. Rauscher, J. M. Ting, S. Venkatram, R. B. Ross, K. J. Schmidt, B. Blaiszik, I. Foster and J. J. de Pablo, Mol. Syst. Des. Eng., 2022, 7, 1611 DOI: 10.1039/D2ME00137C
