Generalizable classification of crystal structure error types using graph attention networks

Abstract

Modern chemical applications of machine learning rely on massive training datasets collected through computational simulations or data mining. The quality of such datasets is increasingly called into question by the discovery of errors in the most popular crystal structure databases. While methods exist to detect the presence of an error, determining its cause is not straightforward. We propose a graph neural network-based approach to classify the presence of specific crystal structure error types, including proton omissions, charge balancing errors, and crystallographic disorder. A training dataset comprising >11k metal–organic frameworks (MOFs) labelled by error type was generated through domain expert inspection. Models using chemically intuitive features, such as atomic number and oxidation state, achieved high classification accuracies ranging from 85 to 95%. Despite being trained only on MOFs, the models generalized to unseen databases of molecules and metal complexes, with accuracies exceeding 96% for proton and disorder error classification in random samples of drug molecules and metal complexes. Further, graph explainability analysis indicated that these models frequently identify chemically problematic subgraph structures, analogous to those a chemist would flag, as important to the error label prediction.
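As a rough illustration of the kind of model named in the title, the following is a minimal sketch of a single-head graph attention layer applied to a small atom graph. It is not the authors' implementation: the toy Zn–O–H fragment, the feature scaling, the weight matrix `W`, and the attention vector `a` are all hypothetical values chosen for illustration, and a real model would learn these parameters and stack several layers before a graph-level classifier. Only the use of atomic number and oxidation state as node features comes from the abstract.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(features, neighbors, W, a):
    """One single-head graph attention layer (illustrative sketch).

    features  : list of per-atom feature vectors
    neighbors : dict mapping atom index -> list of bonded atom indices
    W         : weight matrix, shape (out_dim, in_dim)   [hypothetical values]
    a         : attention vector, length 2 * out_dim     [hypothetical values]
    Returns the updated atom features and the attention weights per atom.
    """
    # Linear projection of every atom's features: h_i = W x_i
    h = [[sum(w * x for w, x in zip(row, f)) for row in W] for f in features]
    out, attn = [], {}
    for i in range(len(features)):
        nbrs = [i] + list(neighbors[i])  # include a self-loop
        # Unnormalized score for each neighbor j: LeakyReLU(a . [h_i || h_j])
        scores = [
            leaky_relu(sum(ak * v for ak, v in zip(a, h[i] + h[j])))
            for j in nbrs
        ]
        alpha = softmax(scores)  # attention weights sum to 1 per atom
        attn[i] = dict(zip(nbrs, alpha))
        # New feature of atom i: attention-weighted sum over its neighborhood
        out.append(
            [sum(al * h[j][k] for al, j in zip(alpha, nbrs)) for k in range(len(W))]
        )
    return out, attn


# Toy fragment (assumed for illustration): Zn bonded to O, O bonded to H.
# Node features: [atomic number / 100, oxidation state].
features = [[0.30, 2.0], [0.08, -2.0], [0.01, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
W = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical, untrained weights
a = [0.5, -0.5, 0.5, -0.5]     # hypothetical, untrained attention vector

out, attn = gat_layer(features, neighbors, W, a)
for atom, weights in attn.items():
    print(atom, {j: round(w, 3) for j, w in weights.items()})
```

The per-atom attention weights returned here are the kind of quantity that explainability analyses can inspect to see which neighboring atoms influenced a prediction; with untrained weights, as in this sketch, they carry no chemical meaning.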

Article information

Article type: Paper
Submitted: 05 Jul 2025
Accepted: 26 Aug 2025
First published: 03 Sep 2025
This article is Open Access (Creative Commons BY-NC license)

J. Mater. Chem. A, 2025, Advance Article

M. Gibaldi, J. Luo, A. J. White, R. A. Mayo, C. Pereira and T. K. Woo, J. Mater. Chem. A, 2025, Advance Article, DOI: 10.1039/D5TA05426E

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.

