Oxidation states in solids from data-driven paradigms
Abstract
The oxidation state (OS) is an essential chemical concept that embodies chemical intuition but cannot be computed from well-defined physical laws. We establish a data-driven paradigm, implemented as Tsinghua Oxidation States in Solids (TOSS), that explicitly computes OSs in crystal structures as emergent properties of large datasets, based on Bayesian maximum a posteriori probability (MAP). TOSS runs two loops over a large dataset of crystal structures. The first builds an emergent library of distance distributions, which serves as the foundation for chemically intuitive understanding. The second determines the OSs of each structure by minimizing a loss function based on MAP and on the distance distributions of the whole dataset. We apply TOSS to a dataset of over one million crystal structures, achieving a superior success rate, and use the resulting OS dataset to train a graph convolutional network (GCN) model as an alternative. Both TOSS and the GCN model are benchmarked against a curated ICSD dataset of structures with human-assigned OSs, yielding high accuracies of 96.09% and 97.24%, respectively. We expect TOSS and the ML-model-based alternative to find a wide spectrum of applications. This work also provides an encouraging example of how data-driven paradigms can make chemical intuition explicitly computable for tackling complex problems in chemistry.
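The idea of choosing OSs by minimizing a MAP-based loss over distance distributions can be illustrated with a toy sketch. The code below is not the TOSS implementation: it assumes Gaussian distance distributions, uses placeholder means, widths, and priors, enforces charge neutrality, and finds the minimum by brute-force enumeration. All names (`DISTANCE_LIBRARY`, `PRIOR`, `CANDIDATES`, `site_loss`, `assign_oxidation_states`) are hypothetical.

```python
import itertools
import math

# Hypothetical distance library: (element, oxidation state) -> (mean, std) of the
# cation-oxygen bond length in angstroms. Placeholder values, not TOSS output.
DISTANCE_LIBRARY = {
    ("Fe", 2): (2.15, 0.08),
    ("Fe", 3): (2.02, 0.08),
}

# Hypothetical prior probability of each oxidation state per element.
PRIOR = {("Fe", 2): 0.5, ("Fe", 3): 0.5, ("O", -2): 1.0}

# Candidate oxidation states considered for each element (assumed).
CANDIDATES = {"Fe": (2, 3), "O": (-2,)}


def site_loss(element, state, distances):
    """Negative log posterior (up to a constant) for one site."""
    loss = -math.log(PRIOR[(element, state)])
    if (element, state) in DISTANCE_LIBRARY:
        mean, std = DISTANCE_LIBRARY[(element, state)]
        for d in distances:
            # Negative log of an assumed Gaussian likelihood for one bond length.
            loss += 0.5 * ((d - mean) / std) ** 2
            loss += math.log(std * math.sqrt(2 * math.pi))
    return loss


def assign_oxidation_states(sites):
    """Return the charge-neutral assignment with the lowest total loss."""
    options = [CANDIDATES[element] for element, _ in sites]
    best, best_loss = None, math.inf
    for states in itertools.product(*options):
        if sum(states) != 0:  # enforce charge neutrality
            continue
        total = sum(site_loss(el, s, ds) for (el, ds), s in zip(sites, states))
        if total < best_loss:
            best, best_loss = states, total
    return best


# Toy Fe3O4-like cell: three Fe sites with Fe-O distances, four O sites.
sites = [
    ("Fe", [2.14, 2.16]),
    ("Fe", [2.01, 2.03]),
    ("Fe", [2.02, 2.00]),
    ("O", []), ("O", []), ("O", []), ("O", []),
]
best = assign_oxidation_states(sites)
print(best)
```

In this toy cell, charge neutrality requires one Fe(II) and two Fe(III), and the distance term decides which site carries which state. Exhaustive enumeration is only practical for very small cells, so it stands in here for whatever minimization strategy is used at scale.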
