Accelerating high-throughput virtual screening through molecular pool-based active learning

David E. Graff; Eugene I. Shakhnovich; Connor W. Coley

doi:10.1039/D0SC06805E


David E. Graff,^a Eugene I. Shakhnovich^a and Connor W. Coley*^b

Author affiliations

* Corresponding authors

^a Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA

^b Department of Chemical Engineering, MIT, Cambridge, MA, USA
E-mail: ccoley@mit.edu

Abstract

Structure-based virtual screening is an important tool in early-stage drug discovery that scores the interactions between a target protein and candidate ligands. As virtual libraries continue to grow (in excess of 10⁸ molecules), so too do the resources necessary to conduct exhaustive virtual screening campaigns on these libraries. However, Bayesian optimization techniques, previously employed in other scientific discovery problems, can aid in the exploration of these libraries: a surrogate structure–property relationship model trained on the predicted affinities of a subset of the library can be applied to the remaining library members, allowing the least promising compounds to be excluded from evaluation. In this study, we explore the application of these techniques to computational docking datasets and assess the impact of surrogate model architecture, acquisition function, and acquisition batch size on optimization performance. We observe significant reductions in computational costs; for example, using a directed message-passing neural network we can identify 94.8% or 89.3% of the top-50 000 ligands in a 100M-member library after testing only 2.4% of candidate ligands using an upper confidence bound or greedy acquisition strategy, respectively. Such model-guided searches mitigate the increasing computational costs of screening increasingly large virtual libraries and can accelerate high-throughput virtual screening campaigns with applications beyond docking.
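The pool-based loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the library is a synthetic set of 2-D points with a hidden objective (standing in for a negated docking score), and the surrogate is a toy k-nearest-neighbour model whose neighbour spread serves as a crude uncertainty estimate in place of the neural network named above. All function names and parameter values (`make_pool`, `knn_predict`, `active_search`, `beta`, pool and batch sizes) are hypothetical and chosen only for illustration.

```python
import heapq
import math
import random


def make_pool(n, seed=0):
    """Synthetic stand-in for a ligand library: (features, hidden objective)."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n):
        x = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        # Assumed objective, higher is better (e.g. a negated docking score).
        y = math.exp(-((x[0] - 1) ** 2 + (x[1] + 1) ** 2)) + 0.01 * rng.gauss(0, 1)
        pool.append((x, y))
    return pool


def knn_predict(x, labeled, k=5):
    """Toy surrogate: mean and spread of the k nearest evaluated points."""
    nearest = heapq.nsmallest(
        k, labeled, key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2
    )
    ys = [y for _, y in nearest]
    mean = sum(ys) / len(ys)
    std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys))
    return mean, std


def acquisition(mean, std, kind, beta=2.0):
    """Greedy uses the predicted mean; UCB adds an uncertainty bonus."""
    if kind == "greedy":
        return mean
    if kind == "ucb":
        return mean + beta * std
    raise ValueError(f"unknown acquisition function: {kind}")


def active_search(pool, kind, init_size=20, batch_size=20, iterations=5,
                  top_k=20, seed=1):
    """Return (fraction of true top-k found, fraction of the pool evaluated)."""
    rng = random.Random(seed)
    indices = list(range(len(pool)))
    # Start from a random subset whose objective values are "computed".
    evaluated = set(rng.sample(indices, init_size))
    for _ in range(iterations):
        labeled = [pool[i] for i in evaluated]
        scored = []
        for i in indices:
            if i in evaluated:
                continue
            mean, std = knn_predict(pool[i][0], labeled)
            scored.append((acquisition(mean, std, kind), i))
        # Evaluate the batch with the highest acquisition values.
        scored.sort(reverse=True)
        evaluated.update(i for _, i in scored[:batch_size])
    true_top = set(sorted(indices, key=lambda i: pool[i][1], reverse=True)[:top_k])
    return len(true_top & evaluated) / top_k, len(evaluated) / len(pool)


if __name__ == "__main__":
    library = make_pool(1000)
    for kind in ("greedy", "ucb"):
        found, explored = active_search(library, kind)
        print(f"{kind}: found {found:.0%} of the top 20 after evaluating {explored:.0%}")
```

The two returned fractions mirror the quantities reported in the abstract: the share of top-scoring library members recovered and the share of the library actually evaluated. On this toy problem the relative ranking of greedy and upper confidence bound acquisition need not match the paper's results.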


Article information

DOI: https://doi.org/10.1039/D0SC06805E
Article type: Edge Article
Submitted: 13 Dec 2020
Accepted: 26 Apr 2021
First published: 29 Apr 2021
This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry


Chem. Sci., 2021, 12, 7866–7881


D. E. Graff, E. I. Shakhnovich and C. W. Coley, Chem. Sci., 2021, 12, 7866 DOI: 10.1039/D0SC06805E

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
