Issue 26, 2021

Determining usefulness of machine learning in materials discovery using simulated research landscapes

Abstract

When existing experimental data are combined with machine learning (ML) to predict the performance of new materials, the data acquisition bias determines ML usefulness and the prediction accuracy. In this context, the following two conditions are highly common: (i) constructing new unbiased data sets is too expensive and the global knowledge effectively does not change by performing a limited number of novel measurements; (ii) the performance of the material depends on a limited number of physical parameters, much smaller than the range of variables that can be changed, albeit such parameters are unknown or not measurable. To determine the usefulness of ML under these conditions, we introduce the concept of simulated research landscapes, which describe how datasets of arbitrary complexity evolve over time. Simulated research landscapes allow us to use different discovery strategies to compare standard materials exploration with ML-guided explorations, i.e. we can measure quantitatively the benefit of using a specific ML model. We show that there is a window of opportunity to obtain a significant benefit from ML-guided strategies. The adoption of ML can take place too soon (not enough information to find patterns) or too late (dense datasets only allow for negligible ML benefit), and the adoption of ML can even slow down the discovery process in some cases. We offer a qualitative guide on when ML can accelerate the discovery of new best-performing materials in a field under specific conditions. The answer in each case depends on factors like data dimensionality, corrugation and data collection strategy. We consider how these factors may affect the ML prediction capabilities and discuss some general trends.

Graphical abstract: Determining usefulness of machine learning in materials discovery using simulated research landscapes

Supplementary files

Article information

Article type
Paper
Submitted
22 ⵉⴱⵔ 2021
Accepted
27 ⵎⴰⵢ 2021
First published
28 ⵎⴰⵢ 2021

Phys. Chem. Chem. Phys., 2021,23, 14156-14163

Determining usefulness of machine learning in materials discovery using simulated research landscapes

M. del Cueto and A. Troisi, Phys. Chem. Chem. Phys., 2021, 23, 14156 DOI: 10.1039/D1CP01761F

To request permission to reproduce material from this article, please go to the Copyright Clearance Center request page.

If you are an author contributing to an RSC publication, you do not need to request permission provided correct acknowledgement is given.

If you are the author of this article, you do not need to request permission to reproduce figures and diagrams provided correct acknowledgement is given. If you want to reproduce the whole article in a third-party publication (excluding your thesis/dissertation for which permission is not required) please go to the Copyright Clearance Center request page.

Read more about how to correctly acknowledge RSC content.

Social activity

Spotlight

Advertisements