Analysis of uncertainty of neural fingerprint-based models

Christian W. Feldmann; Jochen Sieg; Miriam Mathea

doi:10.1039/D4FD00095A


Author affiliations

^a BASF SE, Ludwigshafen, Germany (Christian W. Feldmann, Jochen Sieg, Miriam Mathea)
* Corresponding author: Miriam Mathea, E-mail: miriam.mathea@basf.com

Abstract

Machine learning has gained popularity for predicting molecular properties from molecular structure. This study explores the uncertainty estimates of neural fingerprint-based models by comparing pure graph neural networks (GNNs) with classical machine learning algorithms combined with neural fingerprints. We investigate the advantage of extracting the neural fingerprint from the GNN and integrating it into a method known for producing better-calibrated probability estimates. Comparisons are made using three classical machine learning methods and the Chemprop model, considering different molecular representations and calibration techniques. We use 19 datasets from ToxCast, reflecting real-world scenarios with balanced accuracies ranging from 0.6 to 0.8. Results demonstrate that neural fingerprints combined with classical machine learning methods show a slight decrease in prediction performance compared with the native Chemprop model. However, these models provide significantly improved uncertainty estimates. Notably, the uncertainty estimates of neural fingerprint-based methods remain relatively robust for molecules dissimilar to the training set. This suggests that methods such as random forest with neural fingerprints can deliver strong prediction performance and reliable uncertainty estimates. When both performance and uncertainty are considered, the calibrated Chemprop model and the combination of neural fingerprints with random forest or a support vector classifier (SVC) yield comparable results. Surprisingly, the SVC shows promising performance when combined with neural or count fingerprints. These findings are particularly relevant for real-world industrial projects, where accurate predictions and reliable uncertainty estimates are crucial.
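The abstract judges models on two axes: prediction performance (balanced accuracy) and the quality of their probability estimates, with and without calibration. The sketch below illustrates what such an evaluation can look like in plain Python. It is not the authors' code. The choice of the Brier score and a binned expected calibration error (ECE) as calibration metrics, and of Platt scaling as the calibration technique, are assumptions made for illustration, since the abstract does not say which metrics or calibration methods were used. All function names are hypothetical, and the inputs are treated as raw, uncalibrated decision scores from any binary classifier.

```python
import math
from typing import List, Sequence, Tuple


def balanced_accuracy(y_true: Sequence[int], y_prob: Sequence[float],
                      threshold: float = 0.5) -> float:
    """Mean of sensitivity and specificity for binary labels (0/1)."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if y == 1:
            tp += pred
            fn += 1 - pred
        else:
            tn += 1 - pred
            fp += pred
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(y_true: Sequence[int], y_prob: Sequence[float],
                               n_bins: int = 10) -> float:
    """Equal-width binned calibration error on the positive-class probability:
    size-weighted mean of |observed positive rate - mean predicted probability|.
    (One of several ECE variants; assumed here for illustration.)"""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    ece = 0.0
    for b in bins:
        if b:
            pos_rate = sum(y for y, _ in b) / len(b)
            mean_prob = sum(p for _, p in b) / len(b)
            ece += len(b) / len(y_true) * abs(pos_rate - mean_prob)
    return ece


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores: Sequence[float], y_true: Sequence[int],
              lr: float = 0.1, n_iter: int = 5000) -> Tuple[float, float]:
    """Platt scaling: fit p = sigmoid(a * score + b) by minimizing log loss
    with plain gradient descent (no label smoothing, unlike Platt's original)."""
    a, b = 1.0, 0.0  # start from treating the scores as logits unchanged
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, y_true):
            err = _sigmoid(a * s + b) - y  # derivative of log loss w.r.t. logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def apply_platt(scores: Sequence[float], a: float, b: float) -> List[float]:
    """Map raw scores to calibrated probabilities."""
    return [_sigmoid(a * s + b) for s in scores]


if __name__ == "__main__":
    # Toy, overconfident model: scores of +/-4 (p ~ 0.98 / 0.02),
    # while only 3 of 4 molecules in each group carry the predicted label.
    scores = [-4.0] * 4 + [4.0] * 4
    labels = [0, 0, 0, 1, 1, 1, 1, 0]
    a, b = fit_platt(scores, labels)
    for name, probs in [("raw", apply_platt(scores, 1.0, 0.0)),
                        ("Platt", apply_platt(scores, a, b))]:
        print(f"{name:>5}: BA={balanced_accuracy(labels, probs):.3f} "
              f"Brier={brier_score(labels, probs):.3f} "
              f"ECE={expected_calibration_error(labels, probs):.3f}")
```

On this toy, deliberately overconfident model, balanced accuracy is 0.750 before and after calibration, while the Brier score and ECE both drop: the fitted map (a ≈ ln 3 / 4 ≈ 0.27, b ≈ 0) softens the probabilities toward the observed rates of 0.75 and 0.25 without moving any prediction across the 0.5 threshold. In general, a monotone calibration map preserves the ranking of predictions but can shift which side of the threshold they fall on. In a real evaluation the calibration map would be fit on held-out data rather than on the evaluation set, as is done here only for brevity.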

This article is part of the themed collection: Data-driven discovery in the chemical sciences


Article information

DOI: https://doi.org/10.1039/D4FD00095A
Article type: Paper
Submitted: 08 May 2024
Accepted: 29 Jul 2024
First published: 25 Sep 2024


Faraday Discuss., 2024, Advance Article


Citation: C. W. Feldmann, J. Sieg and M. Mathea, Faraday Discuss., 2024, Advance Article, DOI: 10.1039/D4FD00095A
