MoleculeNet: a benchmark for molecular machine learning

Zhenqin Wu; Bharath Ramsundar; Evan N. Feinberg; Joseph Gomes; Caleb Geniesse; Aneesh S. Pappu; Karl Leswing; Vijay Pande

doi:10.1039/C7SC02664A

Zhenqin Wu,‡^a Bharath Ramsundar,‡^b Evan N. Feinberg,§^c Joseph Gomes,§^a Caleb Geniesse,^c Aneesh S. Pappu,^b Karl Leswing^d and Vijay Pande*^a

Author affiliations

* Corresponding authors

^a Department of Chemistry, Stanford University, Stanford, CA 94305, USA
E-mail: pande@stanford.edu

^b Department of Computer Science, Stanford University, Stanford, CA 94305, USA

^c Program in Biophysics, Stanford School of Medicine, Stanford, CA 94305, USA

^d Schrodinger Inc., USA

Abstract

Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the availability of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited by the lack of a standard benchmark for comparing the efficacy of proposed methods; most new algorithms are benchmarked on different datasets, making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high-quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open-source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle with complex tasks under data scarcity and with highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than the choice of a particular learning algorithm.
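The abstract's caveat about highly imbalanced classification is partly a question of which evaluation metric is used. As an illustration only, the minimal sketch below uses made-up toy labels (not MoleculeNet data) and hypothetical helpers `accuracy` and `roc_auc` written in plain Python; it is not DeepChem's metrics API. It shows that a classifier which always predicts "inactive" on a set with 2% positives reaches 0.98 accuracy yet only 0.5 ROC-AUC, i.e. no better than chance at ranking actives above inactives.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of hard predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the rank-sum (Mann-Whitney U) formulation.

    Equals the probability that a randomly chosen positive is scored
    above a randomly chosen negative; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels: 2 actives among 100 compounds (2% positives).
y_true = [1] * 2 + [0] * 98

# A trivial "model" that gives every compound the same score and
# therefore predicts the majority class (inactive) for all of them.
constant_scores = [0.0] * 100
constant_preds = [0] * 100

print(accuracy(y_true, constant_preds))  # 0.98
print(roc_auc(y_true, constant_scores))  # 0.5
```

The quadratic pairwise loop keeps the definition transparent; for real datasets a sort-based implementation, or a library routine such as scikit-learn's `roc_auc_score`, would be the practical choice.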

This article is part of the themed collections: Most popular 2018-2019 physical and theoretical chemistry articles, 2019 International Open Access Week Collection and 2018 International Open Access Week Collection

Article information

DOI: https://doi.org/10.1039/C7SC02664A
Article type: Edge Article
Submitted: 15 Jun 2017
Accepted: 30 Oct 2017
First published: 31 Oct 2017
This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry

Chem. Sci., 2018, 9, 513-530

Permissions

Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing and V. Pande, MoleculeNet: a benchmark for molecular machine learning, Chem. Sci., 2018, 9, 513, DOI: 10.1039/C7SC02664A

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.

To request permission to reproduce material from this article in a commercial publication, please go to the Copyright Clearance Center request page.

If you are an author contributing to an RSC publication, you do not need to request permission provided correct acknowledgement is given.

If you are the author of this article, you do not need to request permission to reproduce figures and diagrams provided correct acknowledgement is given. If you want to reproduce the whole article in a third-party commercial publication (excluding your thesis/dissertation for which permission is not required) please go to the Copyright Clearance Center request page.

Chemical Science
