Limitations of machine learning models when predicting compounds with completely new chemistries: possible improvements applied to the discovery of new non-fullerene acceptors

Zhi-Wen Zhao; Marcos del Cueto; Alessandro Troisi

doi:10.1039/D2DD00004K

Zhi-Wen Zhao,‡^ab Marcos del Cueto‡*^a and Alessandro Troisi

Author affiliations

* Corresponding authors

^a Department of Chemistry, University of Liverpool, Liverpool, UK
E-mail: m.del-cueto@liverpool.ac.uk

^b Institute of Functional Material Chemistry, Faculty of Chemistry, Northeast Normal University, Changchun, Jilin, P. R. China

Abstract

We try to determine if machine learning (ML) methods, applied to the discovery of new materials on the basis of existing data sets, have the power to predict completely new classes of compounds (extrapolating) or perform well only when interpolating between known materials. We introduce the leave-one-group-out cross-validation, in which the ML model is trained to explicitly perform extrapolations of unseen chemical families. This approach can be used across materials science and chemistry problems to improve the added value of ML predictions, instead of using extrapolative ML models that were trained with a regular cross-validation. We consider as a case study the problem of the discovery of non-fullerene acceptors because novel classes of acceptors are naturally classified into distinct chemical families. We show that conventional ML methods are not useful in practice when attempting to predict the efficiency of a completely novel class of materials. The approach proposed in this work increases the accuracy of the predictions to enable at least the categorization of materials with a performance above and below the median value.
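The leave-one-group-out idea described in the abstract can be illustrated with a minimal sketch. It does not reproduce the authors' models, descriptors or data: the family labels "A", "B" and "C", the single descriptor x, the target y and the least-squares line used as a stand-in model are all hypothetical. Each split holds out one entire chemical family, so the score measures extrapolation to a family the model never saw.

```python
from collections import defaultdict
from math import sqrt


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), one split per distinct group."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g in sorted(by_group):
        test_idx = by_group[g]
        train_idx = [i for i in range(len(groups)) if groups[i] != g]
        yield g, train_idx, test_idx


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (stand-in for a real ML model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx


def logo_rmse(x, y, groups):
    """Return {group: RMSE obtained when that whole group is held out}."""
    scores = {}
    for g, train_idx, test_idx in leave_one_group_out(groups):
        a, b = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        sq_err = [(a * x[i] + b - y[i]) ** 2 for i in test_idx]
        scores[g] = sqrt(sum(sq_err) / len(sq_err))
    return scores


# Hypothetical toy data: three "chemical families" of four compounds each.
# Families A and B follow y = 2x; family C follows a different trend (y = 30 - x).
groups = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
x = [float(v) for v in range(1, 13)]
y = [2.0 * v for v in x[:8]] + [30.0 - v for v in x[8:]]

scores = logo_rmse(x, y, groups)
for g in sorted(scores):
    print(f"held-out family {g}: RMSE = {scores[g]:.2f}")
```

In this toy setting the error for family C reflects how poorly a model trained only on A and B transfers to a family with a different trend, which a random split that mixes C into the training set would hide. In practice, scikit-learn offers the same kind of splitting through `sklearn.model_selection.LeaveOneGroupOut`.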

This article is part of the themed collection: Celebrating our 2024 Prizewinners

Supplementary files

Supplementary information PDF (1284K)

Article information

DOI: https://doi.org/10.1039/D2DD00004K
Article type: Paper
Submitted: 17 Feb 2022
Accepted: 23 Mar 2022
First published: 25 Mar 2022
This article is Open Access

Digital Discovery, 2022, 1, 266-276

Z. Zhao, M. del Cueto and A. Troisi, Digital Discovery, 2022, 1, 266 DOI: 10.1039/D2DD00004K

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.

