Open Access Article
This Open Access Article is licensed under a
Creative Commons Attribution 3.0 Unported Licence

Continued challenges in high-throughput materials predictions: MatterGen predicts compounds from the training dataset

Mikkel Juelsholt*ab
aDepartment of Chemistry, Humboldt Universität zu Berlin, Brook-Taylor-Str.2, 12489, Berlin, Germany. E-mail: mikkel.juelsholt@hu-berlin.de
bDepartment of Heterogeneous Catalysis, Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470, Mülheim an der Ruhr, Germany

Received 12th February 2026 , Accepted 8th April 2026

First published on 20th April 2026


Abstract

High-throughput computational tools and generative AI models aim to revolutionise materials discovery by enabling the rapid prediction of novel inorganic compounds. However, these tools face persistent challenges with modelling compounds where multiple elements occupy the same crystallographic site, often leading to misclassification of known disordered phases as new ordered compounds. Recently, Microsoft revealed MatterGen as a tool for predicting new materials. As a proof of concept, MatterGen was used to predict the novel compound TaCr2O6, which was subsequently synthesised in a disordered form as Ta1/3Cr2/3O2. However, detailed crystallographic analysis presented in this paper reveals that this is not a novel compound but is identical to the previously reported Ta1/2Cr1/2O2, first described in 1971 and included in MatterGen's training dataset. These findings underscore the necessity of rigorous human verification in AI-assisted materials research, which limits the use of such tools for rapid and large-scale prediction of new materials. While generative models hold great promise, their effectiveness is currently limited by unresolved issues with disorder prediction and dataset validation. Improved integration with crystallographic expertise is essential to realise their full potential.



New concepts

High-throughput computational tools and generative artificial intelligence (AI) models aim to revolutionise materials discovery by enabling the rapid prediction of novel inorganic compounds with targeted properties. One recent example of such tools is the MatterGen algorithm, which has been hailed as a major breakthrough in rapid and targeted materials prediction. In this study, it is shown that MatterGen cannot distinguish between its training data and the compounds it predicts. Therefore, human inspection of each predicted compound is needed to ensure novelty, which hinders the rapid materials prediction offered by MatterGen. MatterGen joins other generative AI tools that struggle with key materials science concepts such as crystallography and atomic disorder, and these problems must be solved before the potential of computational tools and generative AI materials prediction can be realised. In this communication, it is highlighted that these issues can cause known compounds to remain undetected when the model checks whether predicted structures are novel or already appear in its training dataset. This highlights ongoing issues with the use of AI for computational materials prediction and the limitations of the training datasets available to the AI models.

Introduction

Materials lie at the heart of modern technology and infrastructure, enabling everything from energy storage to semiconductors and medical devices.1–3 The discovery of new functional materials has traditionally relied on a slow and often labour-intensive process based on chemical intuition, trial-and-error experimentation, and exhaustive parameter space searches.1,3–5 As the demand for novel materials with tailored properties grows, particularly in the context of renewable energy, computing, and sustainability, there is an urgent need to accelerate the materials discovery pipeline.3,5–9

High-throughput computational methods have emerged as a promising approach to this challenge.7,10,11 By combining density functional theory (DFT) calculations, materials databases, and automation, researchers can rapidly screen large numbers of hypothetical compounds.7 However, even these methods are limited by the time and computational resources required to evaluate each candidate.7,12 To overcome these limitations, artificial intelligence (AI) and machine learning (ML) techniques have been introduced, offering the potential to bypass expensive calculations and directly predict materials with desired characteristics.12–14 Yet, these tools are not without their flaws. One of the most persistent challenges is the accurate computational modelling of disorder (in this context: disorder = multiple elements sharing the same crystallographic site), which can lead AI models to misinterpret ordered or low-symmetry variations of known materials as entirely new compounds.7,9 This is an artefact of the training data for the AI models, because DFT and other computational tools usually describe disorder by performing the calculations on ordered supercells of the smaller disordered structure.7,9 Given the major promises of these AI models and their potentially high impact on the future direction of materials science, it is critical to evaluate their capabilities before they are deployed on a large scale.

Among recent advances, Zeni et al.12 have presented MatterGen, a diffusion-based generative model designed to produce stable and chemically diverse inorganic materials. MatterGen is capable of generating crystal structures across the entire periodic table and can be fine-tuned for various inverse design tasks. This represents a significant step forward in computational materials science, offering the possibility of discovering new materials with specific user-defined properties. Earlier AI-based models13,14 suffered from problems with modelling disorder7,9 in materials, and Zeni et al.12 explicitly acknowledge the issues associated with disorder and symmetry in AI-generated materials. However, MatterGen is also misled by a predicted ordered material which, in reality, is disordered. As an example of a synthesisable compound from MatterGen, Zeni et al.12 predicted the existence of TaCr2O6 and reported synthesising a disordered version with the overall stoichiometry Ta1/3Cr2/3O2. However, this compound is isostructural with Ta1/2Cr1/2O2 or TaCrO4, whose structure was reported in 1972 by Astrov et al.15 Upon closer inspection, Ta1/2Cr1/2O2 provides a good match to the experimental powder X-ray diffraction (PXRD) data used by Zeni et al.12 to support the discovery of the novel compound, Ta1/3Cr2/3O2. This misidentification highlights the ongoing issue of materials-prediction models reporting already known disordered materials as novel ordered compounds.7,9

Results and discussion

In January 2025, Zeni et al.12 revealed MatterGen, a diffusion-based generative model trained on both theoretical and experimental inorganic crystal structures from the Materials Project16 and Alexandria datasets.17,18 Structures predicted by MatterGen were checked for novelty using a reference dataset comprising structures from the Materials Project, the Alexandria datasets, and the Inorganic Crystal Structure Database (ICSD), which contains most experimentally reported inorganic crystal structures. To showcase the capabilities of MatterGen, the authors predicted a range of new compounds and successfully synthesised one, TaCr2O6 (Fig. 1A and B), using a solid-state synthesis between Cr metal and Ta2O5.12 Based on PXRD and X-ray photoelectron spectroscopy (XPS), this sample was claimed to be a disordered version of TaCr2O6 with the stoichiometry Ta1/3Cr2/3O2 (Fig. 1C and D).
Fig. 1 The crystal structures, viewed along a and c of the 3 compounds, Cr2TaO6 (A and B), Cr2/3Ta1/3O2 (C and D) and Ta1/2Cr1/2O2 (E and F). (G) Illustration of the symmetry operation needed to transform Ta1/2Cr1/2O2 into Ta1/3Cr2/3O2 and the resulting structure viewed along the crystallographic c-axis. Cr is shown in blue, Ta in gold, and O in red.

The crystal structures of TaCr2O6 and Ta1/3Cr2/3O2 are shown in Fig. 1. Both compounds have the same tetragonal rutile structure with space group P42/mnm, where the crystallographic c-axis of TaCr2O6 is triple that of Ta1/3Cr2/3O2. Hence, TaCr2O6 can be obtained from Ta1/3Cr2/3O2 by stacking three unit cells along c and ordering the Ta and Cr atoms.
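The relation between the ordered and disordered formulas is simple bookkeeping: rescaling TaCr2O6 to two oxygen atoms gives the per-MO2 formula Ta1/3Cr2/3O2. A minimal Python sketch of that normalisation follows. The helper name `per_mo2` and the dictionary representation of a formula are illustrative assumptions and are not part of MatterGen or of any crystallographic library.

```python
from fractions import Fraction

def per_mo2(formula):
    """Rescale a metal-oxide formula so that it contains exactly two O atoms.

    `formula` maps element symbols to integer atom counts, e.g. TaCr2O6 is
    {"Ta": 1, "Cr": 2, "O": 6}. Hypothetical helper, for illustration only.
    """
    scale = Fraction(2, formula["O"])
    return {element: Fraction(count) * scale for element, count in formula.items()}

ordered = per_mo2({"Ta": 1, "Cr": 2, "O": 6})  # TaCr2O6, the tripled cell
known = per_mo2({"Ta": 1, "Cr": 1, "O": 4})    # TaCrO4

print(ordered["Ta"], ordered["Cr"])  # metal fractions per MO2 unit
print(known["Ta"], known["Cr"])
```

Written this way, the predicted and the previously reported compounds differ only in the Ta:Cr ratio on the shared metal site of the same MO2 rutile framework.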

Ta1/3Cr2/3O2 is supposed to be a new compound predicted by MatterGen. In reality, Ta1/3Cr2/3O2 is isostructural with Ta1/2Cr1/2O2 (Fig. 1E and F), which was first reported by Massard et al.19 in 1971 and had its structure solved by Astrov et al.15 in 1972. The two structures have slightly different refined structural parameters, with the structure from Zeni et al.12 being 0.03% larger along a and 0.6% larger along c than the structure reported by Astrov et al.15 which can be explained by minor differences in measurement conditions and compositions. On first inspection, the two structures are visually similar but not identical, as the elongation of the [(Cr/Ta)O6]-octahedra is not pointing in the same direction (Ta1/3Cr2/3O2 left facing (Fig. 1D), Ta1/2Cr1/2O2 right facing (Fig. 1F)). However, upon a 90° rotation around the crystallographic c-axis, the two structures become identical as illustrated in Fig. 1G. Zeni et al.12 placed the O position in the asymmetric unit at (0.197, 0.197, 0.5), whereas in the rotated structure of Astrov et al.15 the O position is at (0.295, 0.705, 0.0). However, these coordinates correspond to the same symmetry-generated site in the P42/mnm space group. By expanding the two structures into a P1 space group, the four oxygen positions generated by the P42/mnm space group can be compared as done in Table 1. The refined O x and y positions differ by 0.008, which is smaller than the uncertainty on the refined positions (see Table S1), which means that within error the oxygens have the same positions in the two structures. Therefore, as shown in Table 1 the O position (0.197, 0.197, 0.5) chosen by Zeni et al.12 corresponds to the generated O position at (0.205, 0.205, 0.5) in the structure setting chosen by Astrov et al.15

Table 1 Comparison between the atomic positions of Ta1/3Cr2/3O2 and Ta1/2Cr1/2O2 (after it has been rotated 90° around the crystallographic c-axis) expanded into P1
           Ta1/3Cr2/3O2 in P1                    90° rotated Ta1/2Cr1/2O2 in P1
           a = b = 4.63975 Å                     a = b = 4.63800 Å
           c = 3.03639 Å                         c = 3.01800 Å
Atom       x      y      z    Occ                x      y      z    Occ
Cr1/Ta1    0      0      0    0.666/0.333        0      0      0    0.5/0.5
Cr2/Ta2    0.5    0.5    0.5  0.666/0.333        0.5    0.5    0.5  0.5/0.5
O1         0.303  0.697  0    1                  0.295  0.705  0    1
O2         0.697  0.303  0    1                  0.705  0.295  0    1
O3         0.803  0.803  0.5  1                  0.795  0.795  0.5  1
O4         0.197  0.197  0.5  1                  0.205  0.205  0.5  1
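The comparison in Table 1 can be reproduced with a few lines of Python. The sketch below generates the four O positions of Wyckoff site 4f of P42/mnm in the standard setting, rotates them by 90° around c, and compares them with the left-hand column of Table 1. Using x = 0.295 in the standard 4f setting for the structure of Astrov et al. is an assumption made here because it reproduces the right-hand column of the table; all function names are illustrative.

```python
def wrap(value):
    """Map a fractional coordinate into [0, 1), rounded to 6 decimals."""
    wrapped = round(value % 1.0, 6)
    return 0.0 if wrapped == 1.0 else wrapped

def rutile_4f(x):
    """O positions of Wyckoff site 4f in P4_2/mnm (standard setting)."""
    return [(x, x, 0.0), (-x, -x, 0.0),
            (0.5 - x, 0.5 + x, 0.5), (0.5 + x, 0.5 - x, 0.5)]

def rotate_90_about_c(point):
    """Rotate a fractional coordinate by 90 degrees about c: (x, y, z) -> (-y, x, z)."""
    x, y, z = point
    return (-y, x, z)

def periodic_delta(p, q):
    """Largest per-axis separation of two points, respecting lattice periodicity."""
    return max(min(abs(u - v), 1.0 - abs(u - v)) for u, v in zip(p, q))

# Left-hand column of Table 1 (structure of Zeni et al. expanded into P1).
zeni_o = [(0.303, 0.697, 0.0), (0.697, 0.303, 0.0),
          (0.803, 0.803, 0.5), (0.197, 0.197, 0.5)]

# Assumption: x = 0.295 in the standard 4f setting, then rotated by 90 degrees.
astrov_o = [tuple(wrap(c) for c in rotate_90_about_c(p)) for p in rutile_4f(0.295)]

# For every O of one structure, find the closest O of the other structure.
deviations = [min(periodic_delta(p, q) for q in astrov_o) for p in zeni_o]
max_deviation = max(deviations)

print(sorted(astrov_o))
print(round(max_deviation, 3))
```

Every oxygen in one structure has a partner in the other that differs by 0.008 in x and y, the value quoted in the text, so the two sets describe the same site within the stated uncertainty.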


Ta1/2Cr1/2O2, reported in 1972 by Astrov et al.,15 is already listed in the ICSD and the Crystallographic Open Database (COD) under ICSD Collection Code 9516 and COD ID 9011856. ICSD collection code 9516 is listed in the reference data set used by MatterGen as entry 956681.

Furthermore, the training data used for MatterGen contains multiple other Ta Cr oxide structures that are rutile structures expanded to P1. Three examples are shown in Fig. 2, where the exact positions of Ta and Cr are ignored for clarity. Two of the structures are from the Materials Project, mp-753467 in Fig. 2A and mp-756340 in Fig. 2B, while the last is from the Alexandria datasets, agm003743920 in Fig. 2C. It is well known that the rutile structure consists of rows of edge-sharing [MO6]-octahedra as shown in Fig. 2D, and each row is corner-sharing its octahedra with the four neighbouring rows. As shown in Fig. 2, this is the case for mp-753467, mp-756340 and agm003743920. These 3 structures all have the composition Ta1/2Cr1/2O2 (listed as TaCrO4 in the training data) and are ordered compounds, with Ta and Cr occupying distinct crystallographic sites in a P1 structure. However, in reality, this is only a result of trying to perform quantum chemical calculations on disordered structures,7,9 and all experimental evidence suggests that Cr and Ta will form a disordered structure.15,19–21


Fig. 2 Comparison between the 3 P1 rutile TaCrO4 structures in MatterGen's training dataset (A–C) and a conventional P42/mnm rutile structure (D). O is shown in red; the exact distribution of Ta and Cr has been omitted for clarity, and the metal sites are shown in blue.

Fig. 3 (A) Rietveld refinement of the PXRD pattern from Zeni et al.12 using Ta1/2Cr1/2O2 as the starting model. The fitting parameters are shown in Table S1. (B) and (C) show the resulting structure, Ta0.43Cr0.57O2. Cr is shown in blue, Ta in gold and O in red.

Interestingly, the Materials Project, and thus the training dataset for MatterGen and other similar models, lists a rutile Ta Cr oxide (mp-21512 as of 24th of March 2025), Ta2CrO6 or Ta2/3Cr1/3O2, as experimentally verified. However, this is an error because, for compositions with Ta > 0.65 (i.e., Ta0.65Cr0.35O2), the rutile structure undergoes a monoclinic distortion.19 This error is also present in the Cambridge Crystallographic Data Centre's (CCDC) database, to which the Materials Project links, but not in the ICSD. The frequency of such errors in the structural databases is not known, but this example underscores the danger of using any database or part of the literature as training data for an AI without first extensively vetting for potential misindexed or mischaracterised structures.22,23

The similarity between the predicted Ta1/3Cr2/3O2 and Ta1/2Cr1/2O2 can be further verified by refining the PXRD pattern published by Zeni et al.12 with Ta1/2Cr1/2O2 and a Cr2O3 impurity. This results in a good fit, as shown in Fig. 3A with the refined structure shown in Fig. 3B and C.

The fit in Fig. 3A achieves a χ2 of 1.508 (compared to 3.596 obtained by Zeni et al.12), an Rwp of 10.295% and RBragg for Ta1/2Cr1/2O2 of 1.622%, showing good agreement between the data and the model. The composition refines to Ta0.43(2)Cr0.57(2)O2, which is lower in Cr than the target Ta1/3Cr2/3O2 but is reasonable, as the sample also contains a 10 wt% Cr2O3 impurity. Zeni et al.12 refined the composition to Ta1/3Cr2/3O2, and the discrepancy between the refined composition reported here and that reported by Zeni et al.12 raises an important question: How is it possible to obtain two different compositions from the same dataset?
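For readers less familiar with these agreement factors, the sketch below computes Rwp, Rexp and χ2 from a list of observed and calculated intensities using the common textbook definitions with weights 1/σ². It is not the Topas implementation, the toy intensities are invented for illustration, and conventions for the goodness of fit differ between refinement programs.

```python
import math

def agreement_factors(y_obs, y_calc, sigma, n_params):
    """Return (R_wp, R_exp, chi2) for a powder pattern, using weights 1/sigma^2."""
    weights = [1.0 / s**2 for s in sigma]
    residual = sum(w * (yo - yc) ** 2 for w, yo, yc in zip(weights, y_obs, y_calc))
    denominator = sum(w * yo**2 for w, yo in zip(weights, y_obs))
    dof = len(y_obs) - n_params          # data points minus refined parameters
    r_wp = math.sqrt(residual / denominator)
    r_exp = math.sqrt(dof / denominator)
    chi2 = residual / dof                # identical to (R_wp / R_exp) ** 2
    return r_wp, r_exp, chi2

# Toy data, invented purely for illustration (not the pattern of Zeni et al.).
y_obs = [100.0, 400.0, 900.0, 400.0, 100.0]
y_calc = [110.0, 380.0, 930.0, 420.0, 90.0]
sigma = [math.sqrt(y) for y in y_obs]    # counting statistics assumed
r_wp, r_exp, chi2 = agreement_factors(y_obs, y_calc, sigma, n_params=1)
print(f"Rwp = {100 * r_wp:.2f}%, Rexp = {100 * r_exp:.2f}%, chi2 = {chi2:.3f}")
```

Because all three numbers only measure how well the calculated pattern reproduces the data, they can be low for a model whose parameters are physically unreasonable.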

There are multiple possibilities, but in this case, the error originates in the unphysical atomic displacement parameters (ADPs) used by Zeni et al.12 In their reported structure (and in the PCR file for the Rietveld refinement in FullProf provided on the MatterGen GitHub repository, accessed 24th of March 2025), the refined ADPs are 0 Å2. This means the modelled atoms do not move at all, which is impossible. Because ADPs and occupancies are inversely correlated, the refined occupancies are different. This error is also the reason for the difference in refined weight-% of Cr2O3. Zeni et al.12 only find 2 weight-% Cr2O3, while in the refinement presented here, the Cr2O3 weight-% refines to 10%. Interestingly, for the PXRD dataset provided by Zeni et al.,12 obtaining physical ADPs that are neither unphysically large nor unphysically small (or even negative) requires correction factors for instrumental or sample-related effects. The best physical fit was obtained using a variable divergence slit correction; however, the details of the instrument used by Zeni et al.12 are not described, so this is not necessarily the correct model. It may be best to remeasure the sample using a well-described and aligned instrument to resolve the issues with intensity corrections. There is, therefore, a possibility that the composition reported here is also wrong. However, to quantify the precision of the refined occupancies, a series of refinements with the ADPs fixed at physically reasonable values24,25 was carried out, and the results are shown in Table S2. As can be seen, even if the ADP is an order of magnitude lower (0.1 Å2) than the ADP refined for the fit in Fig. 3A, the overall composition refines to Ta0.41(2)Cr0.59(2)O2. The fit quality is also much worse, as the RBragg increases by more than threefold, but this indicates that the refined composition is likely close to the correct value.
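The coupling between ADPs and occupancies can be illustrated with the isotropic Debye-Waller factor, which attenuates the diffracted intensity by exp(−2B sin²θ/λ²). The sketch below evaluates this factor for B = 0 Å2 and for a non-zero B. The wavelength (Cu Kα1) and the value B = 0.5 Å2 are assumptions chosen for illustration only and are not taken from either refinement.

```python
import math

def debye_waller_intensity(b_iso, two_theta_deg, wavelength):
    """Intensity attenuation exp(-2 B sin^2(theta) / lambda^2) for an isotropic ADP B (A^2)."""
    s = math.sin(math.radians(two_theta_deg / 2.0)) / wavelength
    return math.exp(-2.0 * b_iso * s * s)

WAVELENGTH = 1.5406  # A; Cu K-alpha1, assumed here for illustration only

for two_theta in (20, 60, 100):
    frozen = debye_waller_intensity(0.0, two_theta, WAVELENGTH)  # B = 0: no attenuation
    moving = debye_waller_intensity(0.5, two_theta, WAVELENGTH)  # illustrative B = 0.5 A^2
    print(two_theta, frozen, round(moving, 3))
```

With B fixed at 0 Å2 the model has no angle-dependent attenuation, so any real fall-off in the measured intensities has to be absorbed by other parameters, such as the scale factors and site occupancies.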

Based on the above discussion, it is clear that rutile Ta Cr oxides are common in MatterGen's training data and that Ta1/3Cr2/3O2 has not been successfully synthesised by Zeni et al.12 The fundamental question remains whether the prediction of a new rutile Ta Cr oxide can be considered a new material. At present, whether the prediction and synthesis of a new point in a solid-solution series constitutes a discovery of a new material is a contentious issue.7,9,26,27

Most certainly, predicting a new composition in a well-known solid solution is the easiest way to predict a material with a composition that has never been synthesised before.7,9,26,27 Therefore, it is possible to predict an essentially infinite number of new compounds within the same solid-solution system, where most are going to be of no scientific or commercial interest.7,9,26,27 Rarely can a classification system cover all possibilities, but generally, predicting a new solid-solution phase can only be considered impactful if it can be argued to fall into one of two cases.7,9,26,27 In one case, the reported or predicted solid solution should never have been considered before. This would be the case, for example, if the specific elemental combination or the predicted crystal structure has never been reported or predicted to form a solid solution before. A less extreme example would be the prediction or synthesis of a new area of a phase diagram that has not been investigated before. The other case acknowledges that the exact composition can be critical to synthesising materials with optimised properties. Therefore, predicting the exact composition to target is very valuable, and such prediction or synthesis results can be highly impactful.

In the first case, it has already been established that Ta Cr oxides are known to exist. It is well known that Ta, alongside other early 4d and 5d transition metals, such as Nb, Mo, and W, forms metal oxides with the rutile structure when combined with a 3d transition metal, such as Cr, Fe, Mn, or Ni.15,19,21,28,29 In the Cr–Ta–O system, the rutile structure undergoes a monoclinic distortion to a so-called trirutile structure when the Ta content exceeds 0.65, corresponding to Ta0.65Cr0.35O2.19,21 Below this Ta content, the system adopts a rutile structure which persists all the way to CrO2.15,19,21,28–33 While the exact composition of Ta1/3Cr2/3O2 has never been reported, it is just a point in an already established solid-solution series of Ta Cr oxides. Thus, even as a never-reported composition in a solid-solution series, the Ta Cr oxide predicted by MatterGen cannot be considered a “true” new compound.7,9,26,27
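The argument amounts to a range check on the Ta fraction x in TaxCr1−xO2. The sketch below encodes only the boundary quoted above (rutile up to x = 0.65, monoclinically distorted beyond it). The function name and return labels are hypothetical, and the code is an illustration of the reasoning rather than a phase-diagram tool.

```python
from fractions import Fraction

RUTILE_MAX_TA = Fraction(65, 100)  # upper Ta fraction of the rutile region quoted in the text

def classify_ta_cr_oxide(x_ta):
    """Classify Ta(x)Cr(1-x)O2 by the reported phase regions (illustrative only)."""
    if not 0 <= x_ta <= 1:
        raise ValueError("Ta fraction must lie between 0 and 1")
    if x_ta <= RUTILE_MAX_TA:
        return "known rutile solid solution"
    return "monoclinically distorted region"

print(classify_ta_cr_oxide(Fraction(1, 3)))  # predicted Ta1/3Cr2/3O2
print(classify_ta_cr_oxide(Fraction(1, 2)))  # reported Ta1/2Cr1/2O2
print(classify_ta_cr_oxide(Fraction(2, 3)))  # Ta2/3Cr1/3O2
```

Both the predicted composition and the previously reported one fall inside the same rutile region, which is the sense in which the prediction is a point on an established solid-solution line.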

On the other hand, Ta1/3Cr2/3O2 could be considered novel because of its properties, as Ta1/3Cr2/3O2 was predicted to be a superhard material. Zeni et al.12 measured the bulk modulus to be 169 GPa. Therefore, Ta1/3Cr2/3O2 could be an optimal composition for a superhard material in the Ta Cr rutile oxide phase diagram. However, Ta1/2Cr1/2O2 is an already known superhard material with a bulk modulus of 181 GPa34 and is therefore harder than the compound measured by Zeni et al.12 It can be argued that since Ta1/2Cr1/2O2 is an already known material, MatterGen predicts the second-best, but unknown, superhard material in the Ta–Cr–O phase diagram. However, this argument fails for two reasons. First of all, it is hard to imagine that compositions close to Ta1/2Cr1/2O2, e.g. Ta0.52Cr0.48O2 or Ta0.49Cr0.51O2, will not have hardness very close to that of Ta1/2Cr1/2O2, and MatterGen should likely then have predicted some of these compositions. Secondly, the MatterGen training dataset did not include the hardness of Ta1/2Cr1/2O2, only the material itself. Therefore, if MatterGen could accurately predict the hardest material in the Ta–Cr–O phase diagram, it should at least predict Ta1/2Cr1/2O2 over Ta1/3Cr2/3O2. MatterGen, therefore, appears to fail to correctly account for the effect of composition on hardness in this particular class of materials. Understanding MatterGen's challenges in predicting hardness and potentially other material properties, however, is beyond the scope of this paper.

In any case, it is clear that no matter which definition of novelty is applied to a solid solution, Ta1/3Cr2/3O2 fails to be novel. Instead, it represents another point in an already-known solid-solution series, which is prominently featured in MatterGen's training data. Furthermore, the actually synthesised compound, Ta0.43(2)Cr0.57(2)O2, turned out to be even closer to an already known material than the predicted compound.

In summary, MatterGen represents a promising step forward for computational materials prediction, but like similar models, it fails to properly account for elements sharing the same crystallographic sites. Furthermore, MatterGen seems unable to fully handle the arbitrary selection of basis sets possible when choosing a crystallographic unit cell, leading to the classification of similar structures as different. This also means that MatterGen struggles with separating novel predictions from materials present in its training dataset, which means caution must be taken when using MatterGen. In this study it was shown that the TaCr2O6 compound reported as novel by Zeni et al.12 is not novel. It has also been shown that TaCr2O6 was not successfully synthesised and that the experimental characterisation work was impacted by the inclusion of ADPs set to 0 Å2. In reality, the composition is Ta0.43Cr0.57O2. This underscores the critical importance of understanding crystallography for work in materials science, from computational structure prediction to the experimental characterisation of synthesised compounds.

Most critically, ADPs with a value of 0 Å2 in a published paper raise questions about the peer-review process for the paper published by Zeni et al.12 For any publication in any peer-reviewed scientific journal that includes a crystal structure, an absolutely essential part of the peer-review process is a thorough inspection of the refined parameters. Regardless of the origin of the compound (synthesised by a human or a robot, or even found in nature), a good fit and good agreement factors do not guarantee that the presented crystal structure is correct.35 It is critical to ensure that the refined structure is also chemically and physically plausible. Such errors must be caught during peer review. We cannot lower our criteria for novelty and scientific rigour because the scientific work is (partially) done by an AI.

Besides TaCr2O6, it has not been possible to identify any of the publicly available compounds predicted by MatterGen that are previously reported compounds, except for the disordered iron-based oxide compounds. Zeni et al.12 themselves acknowledge that these are likely ordered versions of disordered rocksalts, and this assessment is likely correct. A straightforward example is shown in Fig. 4, where a version of the predicted ordered compound Mn5Fe3O8 is compared to the disordered rocksalt analogue. The ordered version is simply a tetragonal unit cell created by approximately doubling the lattice parameter of the cubic disordered rocksalt in one direction. This can be further verified by comparing the diffraction patterns of the 3 structures in Fig. 4 as done in Fig. 5. The original Mn5Fe3O8 structure (Fig. 4A) is tetragonal with a = b = 4.4989 Å and c = 8.8392 Å or c = 1.9647a. This slight mismatch between the c and a lattice parameters introduces a set of peaks not observed in the diffraction pattern from a Mn5Fe3O8 Fm3̄m rocksalt (Fig. 4C) with a = b = c = 4.4989 Å. However, changing the tetragonal Mn5Fe3O8 such that c = 2a = 8.9978 Å results in a diffraction pattern identical to the one obtained from a Mn5Fe3O8 Fm3̄m rocksalt (Fig. 4C). Of course, it is impossible to completely rule out the existence of the tetragonal Mn5Fe3O8 predicted by MatterGen with or without ordered Mn and Fe sites. However, our understanding of Mn's and Fe's chemical behaviour would suggest that they will form a solid solution with a disordered rock salt structure.36


Fig. 4 (A) A hypothetical structure of a Mn5Fe3O8 disordered rock salt. (B) 2 unit cells of the structure in A. (C) The tetragonal ordered Mn5Fe3O8 predicted by MatterGen. The unit cells are outlined in white. Mn (and Fe if on the same site) is shown in purple, Fe in brown and O in red.

Fig. 5 Comparison between the calculated diffraction patterns of the structures in Fig. 4. With a slight offset between the a and c lattice parameters, a tetragonal Mn5Fe3O8 gives a diffraction pattern with additional peaks compared to a Mn5Fe3O8 rock salt structure. However, if c = 2a, the XRD patterns of the tetragonal and cubic structures become identical.
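The origin of the additional peaks can be checked with the d-spacing formula for a tetragonal cell, 1/d² = (h² + k²)/a² + l²/c². The sketch below compares the (200) and (004) reflections for the predicted cell (c = 1.9647a) and for the idealised cell (c = 2a), using the lattice parameters quoted in the text. The choice of this reflection pair and the function name are illustrative.

```python
import math

def d_tetragonal(h, k, l, a, c):
    """Interplanar spacing for a tetragonal cell: 1/d^2 = (h^2 + k^2)/a^2 + l^2/c^2."""
    return 1.0 / math.sqrt((h * h + k * k) / a**2 + (l * l) / c**2)

A = 4.4989            # A, from the text
C_PREDICTED = 8.8392  # A, c = 1.9647a
C_IDEAL = 2 * A       # A, c = 2a = 8.9978

for label, c in (("predicted cell", C_PREDICTED), ("c = 2a", C_IDEAL)):
    d200 = d_tetragonal(2, 0, 0, A, c)
    d004 = d_tetragonal(0, 0, 4, A, c)
    print(f"{label}: d(200) = {d200:.4f} A, d(004) = {d004:.4f} A")
```

For c = 2a the two reflections have the same d-spacing and overlap into the single cubic (200) peak, whereas for the predicted cell they separate, which gives rise to the extra peaks in Fig. 5.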

As occupational disorder continues to challenge computational materials science, it is critical that better approaches be developed.37–40 Luckily, AI and other machine learning approaches appear to be prime candidates for bringing down the computational cost of modelling disorder and bridging the gap between theory and experiment in materials science.37–40 AI-based materials prediction offers an exciting opportunity to accelerate materials discovery. However, at the moment, the need for careful human inspection of each entry limits the usefulness of these models, especially for generating large numbers of potentially stable compounds.

Conclusion

MatterGen represents a promising advancement in the field of materials discovery. Its ability to generate stable, diverse inorganic structures at a large scale based on their properties is a significant achievement. The open acknowledgement of disorder-related challenges is commendable, as is the authors’ commitment to open science by making every part of MatterGen and their publication12 readily available. However, as this paper shows, the challenges of predicting and recognising disordered crystal structures persist. Zeni et al.,12 who were not contacted before the publication of this paper, used MatterGen to predict the existence of the novel compound TaCr2O6 and claimed to have synthesised it as a disordered version with the stoichiometry Ta1/3Cr2/3O2. Unfortunately, this compound is not new; instead, it is an ordered, slightly Cr-rich version of the compound Ta1/2Cr1/2O2,15 which is part of MatterGen's training data. As shown in this paper, the synthesised compound is an already known rutile structure with the overall composition of Ta0.43Cr0.57O2. MatterGen also has trouble with predicting some material properties and fails to predict that Ta1/2Cr1/2O2 is actually a harder material than the predicted Ta1/3Cr2/3O2. This result shows that careful human inspection of individual structures and general caution are required for the results provided by MatterGen and similar algorithms, which limits their ability to significantly increase the speed of structure prediction. This paper also highlights the often underappreciated challenge in materials discovery of proper characterisation to determine what has actually been synthesised. This challenge is only growing and becoming increasingly persistent as the complexity of synthesised materials continues to increase.

Studies using computational chemistry methods and AI models to predict new materials continue to struggle with disordered materials, which leads to the falsely reported discovery of novel compounds. Therefore, to end on a cheeky note, it might be a good idea to question the usefulness of integrating AI and other high-throughput methods with the current generation of computational and theoretical materials chemistry methods. Instead, it might be better to first develop methods that do not struggle to the same degree with material prediction before using their outputs for training AI models and similar tools.

Nonetheless, as AI continues to integrate into the materials design workflow, we must balance its speed and generative power with the caution and critical thinking of materials science. Only then can we fully realise the promise of accelerated, intelligent materials discovery.

Methods

Rietveld refinement

The powder X-ray diffraction pattern of the chromium tantalum oxide synthesised by Zeni et al.12 was obtained from: https://github.com/microsoft/mattergen/tree/main (Accessed 25th of December, 2025).

The Rietveld refinement was carried out using Topas V7,41 by refining a scale factor, zero error, unit cell, atomic displacement parameters (ADPs), atomic positions, and profile parameters for each phase. For Cr2O3, only the Cr atomic positions were refined, while for Ta1/2Cr1/2O2, the O atomic positions and the Cr and Ta occupancies were refined. The sum of the Ta and Cr occupancies was assumed to be 1, while the ADPs for all Cr and Ta atoms were constrained to the same value. Similarly, the ADPs of all O atoms were constrained to be the same. The peak profile was modelled using the Thompson–Cox–Hastings pseudo-Voigt peak shape, where U, V, W, and X were refined and constrained to be the same for both phases. The peak asymmetry was modelled using the simple axial model macro in Topas. The background was described using a 10th-degree Chebyshev polynomial. The sample- or instrument-related intensity effects were corrected using the variable divergence slit intensity correction macro in Topas. The monochromator angle was assumed to be 0 as the diffraction pattern contained both Kα1 and Kα2 Bragg peaks.

Conflicts of interest

There are no competing interests to declare.

Data availability

The data in this communication are a reanalysis of the data published by Zeni et al. in Nature, 2025, 639, 624–632. According to Zeni et al. the data is available at https://github.com/microsoft/mattergen. Structures obtained from the Materials Project and the Inorganic Crystal Structure Database are available at https://next-gen.materialsproject.org/ and https://icsd.fiz-karlsruhe.de/index.xhtml, respectively.

Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d6mh00268d.

Acknowledgements

I would like to acknowledge the Carlsberg Foundation grant CF22-0367 and the Alexander von Humboldt Foundation, through an Alexander von Humboldt Fellowship, for funding my research positions.

References

  1. J. Yeo, G. S. Jung, F. J. Martín-Martínez, S. Ling, G. X. Gu, Z. Qin and M. J. Buehler, Phys. Scr., 2018, 93, 053003.
  2. S. V. Kalinin, B. G. Sumpter and R. K. Archibald, Nat. Mater., 2015, 14, 973–980.
  3. M. Jansen, Adv. Mater., 2015, 27, 3229–3242.
  4. M. Jansen, Angew. Chem., Int. Ed., 2002, 41, 3746–3766.
  5. L. Soderholm and J. F. Mitchell, APL Mater., 2016, 4, 053212.
  6. A. S. Anker, J. H. Jensen, M. González-Duque, R. Moreno, A. Smolska, M. Juelsholt, V. Hardion, M. R. V. Jørgensen, A. Faíña, J. Quinson, K. Støy and T. Vegge, ACS Nano, 2026, 20, 6767–6782.
  7. A. K. Cheetham and R. Seshadri, Chem. Mater., 2024, 36, 3490–3495.
  8. F. T. Szczypiński, S. Bennett and K. E. Jelfs, Chem. Sci., 2021, 12, 830–840.
  9. J. Leeman, Y. Liu, J. Stiles, S. B. Lee, P. Bhatt, L. M. Schoop and R. G. Palgrave, PRX Energy, 2024, 3, 11002.
  10. S. Kim, J. Noh, G. H. Gu, A. Aspuru-Guzik and Y. Jung, ACS Cent. Sci., 2020, 6, 1412–1420.
  11. A. Jain, Y. Shin and K. A. Persson, Nat. Rev. Mater., 2016, 1, 15004.
  12. C. Zeni, R. Pinsler, D. Zügner, A. Fowler, M. Horton, X. Fu, Z. Wang, A. Shysheya, J. Crabbé, S. Ueda, R. Sordillo, L. Sun, J. Smith, B. Nguyen, H. Schulz, S. Lewis, C.-W. Huang, Z. Lu, Y. Zhou, H. Yang, H. Hao, J. Li, C. Yang, W. Li, R. Tomioka and T. Xie, Nature, 2025, 639, 624–632.
  13. N. J. Szymanski, B. Rendy, Y. Fei, R. E. Kumar, T. He, D. Milsted, M. J. McDermott, M. Gallant, E. D. Cubuk, A. Merchant, H. Kim, A. Jain, C. J. Bartel, K. Persson, Y. Zeng and G. Ceder, Nature, 2023, 624, 86–91.
  14. A. Merchant, S. Batzner, S. S. Schoenholz, M. Aykol, G. Cheon and E. D. Cubuk, Nature, 2023, 624, 80–85.
  15. D. N. Astrov, N. A. Kryukova, R. B. Zorin, V. A. Makarov, R. P. Ozerov, F. A. Rozhdestvenskii, V. P. Smirnov, A. M. Turchaninov and N. V. Fadeeva, Sov. Phys. Crystallogr., 1972, 17, 1017–1023.
  16. A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder and K. A. Persson, APL Mater., 2013, 1, 011002.
  17. J. Schmidt, N. Hoffmann, H.-C. Wang, P. Borlido, P. J. M. A. Carriço, T. F. T. Cerqueira, S. Botti and M. A. L. Marques, Adv. Mater., 2023, 35, 2210788.
  18. J. Schmidt, H.-C. Wang, T. F. T. Cerqueira, S. Botti and M. A. L. Marques, Sci. Data, 2022, 9, 64.
  19. P. Massard, J. C. Bernier and A. Michel, J. Solid State Chem., 1972, 4, 269–274.
  20. A. N. Christensen, T. Johanssen and B. Lebech, J. Phys. C: Solid State Phys., 1976, 9, 2601.
  21. R. Mani, S. N. Achary, K. R. Chakraborty, S. K. Deshpande, J. E. Joy, A. Nag, J. Gopalakrishnan and A. K. Tyagi, J. Solid State Chem., 2010, 183, 1380–1387.
  22. D. E. Widdowson and V. A. Kurlin, arXiv, 2025, preprint, arXiv:2509.15088, DOI: 10.48550/arXiv.2509.15088.
  23. A. Wlodawer, Z. Dauter, P. Rubach, W. Minor, M. Jaskolski, Z. Jiang, W. Jeffcott, O. Anosova and V. Kurlin, Acta Crystallogr., Sect. D: Biol. Crystallogr., 2025, 81, 170–180.
  24. H. X. Gao, L.-M. Peng and J. M. Zuo, Acta Crystallogr., Sect. A, 1999, 55, 1014–1025.
  25. M. J. Cooper and R. I. Taylor, Acta Crystallogr., Sect. A, 1969, 25, 714–715.
  26. D. S. Chawla, Chem. Eng. News, 2026, https://cen.acs.org/research-integrity/Nature-robot-chemist-paper-corrected/104/web/2026/01 (accessed 14/04/2026).
  27. K. S. Jakob, A. Walsh, K. Reuter and J. T. Margraf, Adv. Mater., 2026, 38, e14226.
  28. W. H. Baur, Crystallogr. Rev., 2007, 13, 65–113.
  29. H. K. Müller-Buschbaum and R. Wichmann, Z. Anorg. Allg. Chem., 1986, 536, 15–23.
  30. S. Schellert, M. Weber, H. J. Christ, C. Wiktor, B. Butz, M. C. Galetz, S. Laube, A. Kauffmann, M. Heilmaier and B. Gorr, Corros. Sci., 2023, 211, 110885.
  31. S. Schellert, B. Gorr, S. Laube, A. Kauffmann, M. Heilmaier and H. J. Christ, Corros. Sci., 2021, 192, 109861.
  32. W. H. Cloud, D. S. Schreiber and K. R. Babcock, J. Appl. Phys., 1962, 33, 1193–1194.
  33. O. Glemser, U. Hauschild and F. Trüpel, Z. Anorg. Allg. Chem., 1954, 277, 113–126.
  34. S. Zhang, X. Wang, C. Zhang, H. Xiang, Y. Li, C. Fang, M. Li, H. Wang and Y. Zhou, J. Adv. Ceram., 2024, 13, 373–387.
  35. B. H. Toby, Powder Diffr., 2006, 21, 67–70.
  36. D. A. O. Hope, A. K. Cheetham and G. J. Long, Inorg. Chem., 1982, 21, 2804–2809.
  37. H. Metni, L. Ruple, L. N. Walters, L. Torresi, J. Teufel, H. Schopmans, J. Östreicher, Y. Zhang, M. Neubert, Y. Koide, K. Steiner, P. Link, L. Bär, M. Petrova, G. Ceder and P. Friederich, Adv. Mater., 2026, e23620.
  38. L. An, H. Ma, J. Liu, W. Guo and X. Wen, NPJ Comput. Mater., 2025, 11, 226.
  39. K. S. Jakob, A. Walsh, K. Reuter and J. T. Margraf, Adv. Mater., 2026, 38, e14226.
  40. K. S. Jakob, K. Reuter and J. T. Margraf, Adv. Intell. Discovery, 2025, 202500031.
  41. A. A. Coelho, J. Appl. Crystallogr., 2018, 51, 210–218.

This journal is © The Royal Society of Chemistry 2026