Determining usefulness of machine learning in materials discovery using simulated research landscapes

Marcos del Cueto; Alessandro Troisi

doi:10.1039/D1CP01761F


Marcos del Cueto*^a and Alessandro Troisi*^a

Author affiliations

* Corresponding authors

^a Department of Chemistry, University of Liverpool, Liverpool, UK
E-mail: m.del-cueto@liverpool.ac.uk, a.troisi@liverpool.ac.uk

Abstract

When existing experimental data are combined with machine learning (ML) to predict the performance of new materials, the bias in how those data were acquired determines both the usefulness of ML and its prediction accuracy. In this context, two conditions are very common: (i) constructing new unbiased datasets is too expensive, and a limited number of new measurements does not effectively change the global knowledge; (ii) the performance of a material depends on a limited number of physical parameters, far fewer than the number of variables that can be changed, although these parameters are unknown or cannot be measured. To determine the usefulness of ML under these conditions, we introduce the concept of simulated research landscapes, which describe how datasets of arbitrary complexity evolve over time. Simulated research landscapes allow us to apply different discovery strategies and compare standard materials exploration with ML-guided exploration, i.e. to measure quantitatively the benefit of using a specific ML model. We show that there is a window of opportunity in which ML-guided strategies give a significant benefit. ML can be adopted too soon (not enough information to find patterns) or too late (dense datasets leave only a negligible benefit), and in some cases adopting ML can even slow down the discovery process. We offer a qualitative guide on when ML can accelerate the discovery of new best-performing materials in a field under specific conditions. The answer in each case depends on factors such as data dimensionality, corrugation and data collection strategy. We consider how these factors may affect ML prediction capabilities and discuss some general trends.
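The abstract describes simulated research landscapes only at a high level. The following is a minimal, hypothetical sketch of the idea under our own assumptions, not the authors' implementation. The landscape is a sum of randomly placed Gaussians on a unit hypercube: `dim` sets the dimensionality, while `n_gaussians` and `width` set the corrugation. The historical dataset is biased toward one half of the space. "Standard exploration" is stood in for by random selection, and the ML model is a simple k-nearest-neighbour regressor. The benefit is the ratio of measurements each strategy needs to reach the best candidate in the pool. All function names and parameters are illustrative.

```python
import heapq
import math
import random


def make_landscape(dim, n_gaussians, width, rng):
    """Hypothetical simulated landscape: a sum of randomly placed Gaussians on
    the unit hypercube. More or narrower Gaussians give a more corrugated surface."""
    centres = [[rng.random() for _ in range(dim)] for _ in range(n_gaussians)]
    heights = [rng.uniform(0.5, 1.0) for _ in range(n_gaussians)]

    def performance(x):
        total = 0.0
        for centre, height in zip(centres, heights):
            d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
            total += height * math.exp(-d2 / (2 * width ** 2))
        return total

    return performance


def knn_predict(x, known, k=3):
    """Stand-in ML model: mean performance of the k nearest measured points."""
    if not known:
        return 0.0
    nearest = heapq.nsmallest(
        k, known, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p[0])))
    return sum(y for _, y in nearest) / len(nearest)


def steps_to_best(candidates, f, initial, strategy, rng, max_steps):
    """Number of new measurements needed until the best candidate in the pool
    is measured, starting from the (possibly biased) initial dataset."""
    target = max(candidates, key=f)
    if target in initial:
        return 0
    known = [(x, f(x)) for x in initial]
    untested = [x for x in candidates if x not in initial]
    for step in range(1, max_steps + 1):
        if strategy == "random":
            # Stand-in for standard (non-ML) exploration.
            choice = rng.choice(untested)
        else:
            # ML-guided: measure the untested candidate with the best prediction.
            choice = max(untested, key=lambda x: knn_predict(x, known))
        untested.remove(choice)
        known.append((choice, f(choice)))
        if choice == target:
            return step
    return max_steps


def ml_benefit(dim=2, n_gaussians=5, width=0.15, pool=100, n_init=10,
               trials=5, seed=0):
    """Average ratio of random-search steps to ML-guided steps (>1: ML helps)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        f = make_landscape(dim, n_gaussians, width, rng)
        candidates = [tuple(rng.random() for _ in range(dim)) for _ in range(pool)]
        # Acquisition bias: the historical data only cover half of the space.
        region = [x for x in candidates if x[0] < 0.5]
        initial = rng.sample(region, min(n_init, len(region)))
        steps_random = steps_to_best(candidates, f, initial, "random", rng, pool)
        steps_ml = steps_to_best(candidates, f, initial, "ml", rng, pool)
        # +1 avoids division by zero when the best candidate is already known.
        ratios.append((steps_random + 1) / (steps_ml + 1))
    return sum(ratios) / len(ratios)
```

As a usage example, comparing `ml_benefit(n_init=5)` with `ml_benefit(n_init=40)`, or varying `dim` and `width`, probes the factors named above: how much prior data exists, the dimensionality, and the corrugation. The numbers this toy model produces are not the paper's results.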

This article is part of the themed collections: 2021 PCCP HOT Articles and Emerging AI Approaches in Physical Chemistry


Article information

DOI: https://doi.org/10.1039/D1CP01761F
Article type: Paper
Submitted: 22 Apr 2021
Accepted: 27 May 2021
First published: 28 May 2021


Phys. Chem. Chem. Phys., 2021,23, 14156-14163


Physical Chemistry Chemical Physics
