Investigating the error imbalance of large-scale machine learning potentials in catalysis

Abstract

Machine learning potentials (MLPs) have greatly accelerated atomistic simulations for material discovery. The Open Catalyst 2020 (OC20) dataset is one of the largest datasets for training MLPs for heterogeneous catalysis. The mean absolute errors (MAE) of the MLPs on the energy target of the dataset have asymptotically approached about 0.2 eV over the past two years with increasingly sophisticated models. The errors were found to be imbalanced between the different material classes with non-metals having the highest errors. In this work, we investigate several potential sources for the imbalanced distribution of errors. We examined material class-specific convergence errors in the density functional theory (DFT) calculations including k-point sampling, plane wave cutoff and smearing width. Significant DFT convergence errors with a mean absolute value of ∼0.15 eV were found on the total energies of non-metals, higher than the other material classes. However, as a result of cancellation of errors, convergence errors on adsorption energies have a mean absolute value of ∼0.05 eV across all material classes. Moreover, we found that the MAEs of the MLPs are not affected by these convergence errors. Second, we show that calculations with surface reconstruction can introduce inconsistencies to the adsorption energy referencing scheme that cannot be fit by the MLPs. Nonmetals and halides were found to have the highest fraction of calculations with surface reconstructions. Removing calculations with surface reconstructions from the validation sets, without re-training, significantly lowers the MAEs by ∼35% and reduces the imbalance of the MAEs. Alternatively, MLPs trained on total energies provide a solution to the surface reconstruction inconsistencies since they eliminate the referencing issue, and have comparable MAEs to MLPs trained on adsorption energies.

Graphical abstract: Investigating the error imbalance of large-scale machine learning potentials in catalysis

Supplementary files

Article information

Article type
Paper
Submitted
13 May 2024
Accepted
29 Jul 2024
First published
14 Aug 2024
This article is Open Access
Creative Commons BY license

Catal. Sci. Technol., 2024, Advance Article

Investigating the error imbalance of large-scale machine learning potentials in catalysis

K. Abdelmaqsoud, M. Shuaibi, A. Kolluru, R. Cheula and J. R. Kitchin, Catal. Sci. Technol., 2024, Advance Article , DOI: 10.1039/D4CY00615A

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.

Read more about how to correctly acknowledge RSC content.

Social activity

Spotlight

Advertisements