Wenhao Sun†* and Nicholas David†
Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI, USA. E-mail: whsun@umich.edu
First published on 13th August 2024
Synthesis of predicted materials is the key and final step needed to realize a vision of computationally accelerated materials discovery. Because so many materials have been previously synthesized, one would anticipate that text-mining synthesis recipes from the literature would yield a valuable dataset to train machine-learning models that can predict synthesis recipes for new materials. Between 2016 and 2019, the corresponding author (Wenhao Sun) participated in efforts to text-mine 31782 solid-state synthesis recipes and 35675 solution-based synthesis recipes from the literature. Here, we characterize these datasets and show that they do not satisfy the “4 Vs” of data-science—that is: volume, variety, veracity and velocity. For this reason, we believe that machine-learned regression or classification models built from these datasets will have limited utility in guiding the predictive synthesis of novel materials. On the other hand, these large datasets provided an opportunity to identify anomalous synthesis recipes—which in fact did inspire new hypotheses on how materials form, which we later validated by experiment. Our case study here urges a re-evaluation on how to extract the most value from large historical materials-science datasets.
Predictive retrosynthesis has been a long-standing goal of organic chemistry,13–16 with notable recent advances made using deep neural networks.17,18 These machine-learning successes were enabled by the availability of large commercial databases of organic reactions, such as SciFinder19 and Reaxys.20 Similar commercial databases for inorganic materials synthesis reactions do not currently exist. However, because there have been thousands of successful materials synthesis reports in the literature, text-mining synthesis recipes from published papers could provide a vast source of expert knowledge to train machine-learning models for predictive inorganic materials synthesis.
Between 2016 and 2019, I‡ was a postdoctoral fellow in Gerbrand Ceder's research group at Lawrence Berkeley National Laboratory and participated in the text-mining of 31782 solid-state synthesis recipes21 and 35675 solution-based synthesis recipes22 from the literature. Here, I offer a retrospective account on attempts to build machine-learning (ML) models for predictive materials synthesis from this dataset. Incidentally, this story follows the Gartner ‘hype cycle’,23 which proceeds via (1) technology trigger, (2) peak of inflated expectations, (3) valley of disillusionment, (4) slope of enlightenment, and (5) plateau of productivity. The perspectives here are my own, and are not necessarily shared by my co-authors from the text-mining publications.
Here, we begin by reviewing the natural language processing strategies used to build the text-mined recipe database. Then, we evaluate the dataset against the “4 Vs” of data science, and show that the dataset suffers limitations in volume, variety, veracity, and velocity. While some of these limitations stem from technical issues in text-mining, we argue that these limitations primarily arise from the social, cultural, and anthropogenic biases in how chemists have explored and synthesized materials in the past.24 We show that machine-learning models trained on this text-mined dataset are successful in capturing how chemists think about materials synthesis, but do not offer substantially new guiding insights on how to best synthesize a novel material.
On the other hand, we found that the most interesting recipes in this dataset are actually the anomalous recipes—the ones that defy conventional intuition in solid-state synthesis. These anomalous recipes are also relatively rare, meaning they would not significantly influence regression or classification models. By manually examining some anomalous recipes, we arrived at a new mechanistic hypothesis on how solid-state reactions proceed, and how to select precursors that enhance the reaction kinetics and selectivity of target materials. This hypothesis drove a series of high-visibility follow-up studies,25–28 which experimentally validated our hypothesized mechanism, gleaned from the text-mined literature dataset.
As natural language processing algorithms continue to improve, many other text-mining studies are emerging in materials science, with domains ranging from nanomaterials29 to alloys,30,31 catalysts,32,33 and more.34–36 Although our retrospective is focused on materials synthesis, we believe that our case study here offers broad and general lessons on how to best leverage large historical datasets for data-driven chemical discovery and design.
Instead of enumerating all possible representations of target and precursor materials, in He et al.38 we replaced all chemical compounds with <MAT>, and used sentence context clues to label each one as a target, a precursor, or other (such as atmospheres, reaction media, etc.). For example, from the sentence “a spinel-type cathode material <MAT> was prepared from high-purity precursors <MAT>, <MAT> and <MAT>, at 700 °C for 24 h in <MAT>”, it is immediately apparent that the first <MAT> represents a target, the next three are precursors, and the final one corresponds to reaction media. We used a bidirectional long short-term memory neural network with a conditional random field layer (BiLSTM-CRF) to identify these sentence context clues. To train the BiLSTM-CRF, we manually annotated targets, precursors and other reaction media in 834 solid-state synthesis paragraphs.
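The masking step described above can be illustrated with a minimal sketch. It assumes a deliberately crude regular expression for formula-like tokens, a hypothetical helper name `mask_materials`, and a made-up example sentence; the actual material-entity recognition and the BiLSTM-CRF role labeling are not reproduced here.

```python
import re

# Hypothetical, deliberately crude pattern (an assumption, not the published method):
# either two or more element-like tokens in a row (e.g. LiMn2O4), or a single
# element followed by a count (e.g. O2). It will also catch acronyms such as XRD.
FORMULA = re.compile(
    r"\b(?:(?:[A-Z][a-z]?\d*(?:\.\d+)?){2,}|[A-Z][a-z]?\d+)\b"
)

def mask_materials(sentence):
    """Replace each formula-like token with <MAT>; return the masked text and the formulas."""
    found = FORMULA.findall(sentence)
    return FORMULA.sub("<MAT>", sentence), found

# Made-up sentence, for illustration only.
masked, formulas = mask_materials(
    "LiMn2O4 was prepared from Li2CO3 and MnO2 at 700 C for 24 h in O2"
)
print(masked)
print(formulas)
```

In a pipeline of this kind, the masked sentence would then be passed to a sequence labeler that assigns a role to each <MAT> from its context, while the list of original formulas is kept so that the roles can be mapped back to compositions.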
We classified sentence tokens into 6 categories: mixing, heating, drying, shaping, quenching, or not operation, corresponding to the main operations in solid-state synthesis. We manually assigned token labels for an annotated set consisting of 100 solid-state synthesis paragraphs (664 sentences). After doing so, for each type of operation, we were able to associate and extract the relevant parameter values (or range of values); for example, the times, temperatures, and atmospheres associated with heating. A Markov chain representation of the experimental operations then allowed us to reconstruct a flowchart of the synthesis procedures.
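As a rough illustration of what a Markov chain representation of operations can look like, the sketch below estimates first-order transition probabilities from ordered operation sequences. The text does not specify how the chain was constructed, so the first-order assumption, the function name `transition_probabilities`, and the toy sequences are all illustrative.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Estimate first-order transition probabilities between consecutive operations."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        op: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for op, c in counts.items()
    }

# Made-up operation sequences, for illustration only.
recipes = [
    ["mixing", "heating", "quenching"],
    ["mixing", "shaping", "heating"],
    ["mixing", "heating", "mixing", "heating"],
]
probs = transition_probabilities(recipes)
print(probs["mixing"])
```

Following the most probable transitions from a starting operation gives one simple way to draw a flowchart of a typical procedure.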
Altogether, we scraped a total of 4204170 papers, which contained 6218136 paragraphs in the experimental sections. After classification, 188198 paragraphs were found to describe inorganic synthesis, such as solid-state, hydrothermal, sol–gel, and co-precipitation syntheses, with 53538 corresponding to solid-state synthesis. The overall extraction yield of the pipeline is 28%, meaning that out of 53538 solid-state paragraphs, only 15144 produce a balanced chemical reaction. As a test of the full extraction pipeline, 100 paragraphs were randomly pulled from the set of paragraphs classified as solid-state synthesis and checked for completeness of the extracted data. Out of the 100 paragraphs, we found 30 that did not contain a complete set of starting materials and final products, meaning that even a human expert would not be able to reconstruct a reaction from these paragraphs. To build the final database, we prioritized providing accurate recipes, so if a paragraph failed the extraction pipeline, we excluded it from the dataset rather than include an incomplete recipe. The final dataset contained 19488 paragraphs, with 13009 unique targets, 1845 unique precursors and 16290 unique reactions. In 2020, the dataset was expanded to 31782 chemical reactions retrieved from 95283 solid-state synthesis paragraphs.
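The attrition through the pipeline can be made explicit with a few lines of arithmetic. The counts below are those quoted in the text; the stage labels and variable names are illustrative.

```python
# Counts quoted in the text; stage labels are illustrative.
funnel = [
    ("experimental paragraphs", 6218136),
    ("inorganic synthesis paragraphs", 188198),
    ("solid-state paragraphs", 53538),
    ("balanced reactions", 15144),
]

# Fraction of each stage that survives from the stage before it.
retention = {
    later: n_later / n_earlier
    for (_, n_earlier), (later, n_later) in zip(funnel, funnel[1:])
}

for stage, fraction in retention.items():
    print(f"{stage}: {fraction:.1%} of previous stage")
```

The last entry reproduces the 28% extraction yield quoted for solid-state paragraphs.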
Volume refers to the number of datapoints in a dataset. Although each dataset seemingly contains over 30000 entries, many of these entries are redundant in their target chemical systems, meaning that there are fewer unique targets than anticipated.
Variety refers to the diversity of data. Here, we evaluate variety in two ways: first over chemical space, where we show that there is often limited coverage of text-mined recipes, even in materials systems that are experimentally known and well-represented in the Materials Project. Second, we show that even for target materials with numerous recipes, there is little variety in the reported reaction parameters or precursors.
Veracity is a measure of the quality and reliability of the data. We show that many synthesis paragraphs are missing essential information, even for a human chemist to synthesize the target material. Although there are some technical issues, broadly speaking there are many ‘unknown unknowns’ in materials synthesis that, if unpublished, can confound machine-learned synthesis models.
Velocity describes the speed at which data can be generated, collected and processed. Text-mining scientific literature requires data labeling and curation efforts by domain experts, which is a laborious and time-consuming process. Recently developed large language models like GPT-4 may accelerate this process, but human intervention is still necessary to prevent processing errors or potential hallucinations.
In Fig. 1, we plot the cumulative distribution of the number of recipes versus unique chemical systems, revealing a shape reminiscent of the ‘80/20’ Pareto principle, which states that 20% of the population represents 80% of a property distribution.45 In the solution-based dataset, the most common 7% of chemical systems (170 systems) account for 80% of all recipes, whereas in the solid-state dataset the most common 36% of systems (1765 systems) make up 80% of the recipes. In fact, the 2 most common and the 42 most common chemical systems make up 20% of the solution-based and solid-state datasets, respectively. These highly represented systems, listed beneath the curves in Fig. 1, are all oxides, specifically battery and catalyst materials. For the purposes of chemical discovery, redundant recipes in the same chemical space mean reduced coverage of the diverse chemical systems needed to train generalizable machine-learning algorithms.
Fig. 1 A Pareto-principle distribution of materials recipes versus chemical systems appears in both the solid-state and solution-based text-mined synthesis recipes.
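A cumulative-coverage statistic of this kind can be computed as in the sketch below, which counts how many of the most common chemical systems are needed to cover a given fraction of recipes. The function name `systems_needed` and the toy labels are hypothetical; the real analysis would use one chemical-system label per text-mined recipe.

```python
from collections import Counter

def systems_needed(recipe_systems, fraction=0.8):
    """Return (number of most common systems covering `fraction` of recipes, total systems)."""
    counts = sorted(Counter(recipe_systems).values(), reverse=True)
    target = fraction * sum(counts)
    running = 0
    for n_systems, count in enumerate(counts, start=1):
        running += count
        if running >= target:
            return n_systems, len(counts)
    return len(counts), len(counts)

# Made-up toy data: one chemical-system label per recipe.
toy = ["Li-Mn-O"] * 6 + ["Ba-Ti-O"] * 2 + ["Sr-Fe-Mo-O", "Y-Ba-Cu-O"]
needed, total = systems_needed(toy)
print(f"{needed} of {total} systems cover 80% of recipes")
```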
In characterizing the chemical variety of the dataset, we find that oxides comprise 91.5% of the text-mined recipes. For ternary metal oxides, there are 785 M1–M2–O systems with known ternary oxides, and text-mined synthesis recipes exist for 58% of these systems. Notable gaps in the text-mined recipe (TMR) dataset include ternary oxides containing precious metals, such as Au, Pd, Pt, Ir, Os, Ru and Re, as well as alkali metal oxides with Cs or Rb cations. Precious metal delafossites, such as PtCoO2, are emerging as an exciting materials design space as their ultrahigh electrical conductivity exceeds even that of metallic Au.46 However, synthesis recipes for these compounds may not be reliable if predicted from machine-learning algorithms trained on this TMR dataset. We note that one of the more reliable single-crystal growth recipes for PtCoO2 proceeds via the metathesis reaction LiCoO2 + PtCl2 → PtCoO2 + LiCl,47 where the LiCl byproduct is later removed with water. Such metathesis reactions are very effective when thermodynamic driving forces from standard oxide precursors are small.48,49 However, metathesis reactions do not appear anywhere in the text-mined dataset. Other clever reactions that are facilitated by extrinsic chemical species50 may also be unaccounted for by the reaction balancer.
Ternary sulfides are well-studied, with 532 experimentally known M1–M2–S systems, but they are underrepresented in the text-mined dataset, with only 72 spaces (13%) containing recipes, as visualized on the heatmap in Fig. 2 and summarized in Table 1. It is not immediately clear why the coverage of sulfides in the text-mined dataset is so poor. This is especially unfortunate as sulfides are an important design space for Li-ion-battery solid-state electrolytes,54,55 thermoelectrics,56,57 and solar cells.58 Moreover, experimental validation of DFT-predicted sulfides can often be problematic. For example, Narayan et al. attempted to synthesize 24 new ternary sulfides and selenides, where 14 of them were predicted to be convex-hull stable, yet none of these ternary sulfides formed in experiment.59 A machine-learned synthesis predictor in spaces where convex-hull stability predictors are weak would have been especially beneficial to have, but unfortunately the ternary-sulfide training set is not well-sampled by the text-mined recipe dataset.
Fig. 2 Heatmap showing the coverage of text-mined sulfides compared to all known ternary metal sulfides. Green squares indicate known materials with text-mined recipes, blue squares indicate systems that are absent from the text-mined recipes dataset but have entries on the Materials Project51 (and therefore have calculated energetics), and red squares indicate systems missing in the text-mined dataset and the Materials Project but present in the Pauling Files, hosted digitally by the Materials Platform for Data Science (MPDS).52 Ternary spaces with no entries in any database are colored white. Chemical spaces are clustered hierarchically53 to elucidate chemical trends in this space.
Material system | Total coverage (% possible) | In TMR, and MP or MPDS (green) | Not in TMR, but in MP (blue) | Known but not in TMR or MP (red) |
---|---|---|---|---|
M1–M2–O | 785 (76%) | 454 (58%) | 198 (25%) | 133 (17%) |
M1–M2–S | 532 (51%) | 72 (13%) | 317 (60%) | 143 (27%) |
M1–M2–N | 296 (29%) | 19 (6%) | 174 (59%) | 103 (35%) |
A similar lack of text-mined recipes exists for the ternary metal nitrides, likely because much of the chemical exploration in this space was conducted prior to the year 2000,2,60 where there is no coverage in the text-mined recipe database. Of course, one could supplement the TMR dataset with more recipes of ternary sulfides and nitrides, and other missing chemical spaces. Our point is to highlight that if one did not scrutinize the dataset and used predicted recipes for chemical systems where there are gaps in this dataset, the predictions would likely be unreliable.
In the text-mined recipe dataset, there are 237 entries in the Y–Ba–Cu–O-* system. Fig. 3a plots the temperature–time distributions of these recipes, where the average reaction temperature is 950 ± 140 °C, and the average reaction time is 54 ± 81 h. In publications since the year 2000 (where our text-mined recipe dataset begins), the scatter in reaction temperatures and times is relatively uniform—there has not been a convergence on optimal reaction times or temperatures. Fig. 3b plots the most common Ba source and shows that 80% of recipes use BaCO3 as the barium precursor.
In 2001, a short 1-page publication in the Journal of Chemical Education, ‘Superconductor synthesis—an improvement’, claimed that replacement of BaCO3 with a BaO2 precursor could reduce an undergraduate YBCO synthesis lab from 12 hours with regrinding and reannealing to 4 hours in one step.62 (This recipe was not in the text-mined dataset). Inspired by this article, in Miura et al.26 we used in situ synchrotron X-ray diffraction and transmission electron microscopy to observe that a BaO2 precursor in fact yields YBCO in 25 minutes, and that the reaction occurs as fast as the sample can heat (in our case 30 °C min−1). The YBCO product was indeed found to be superconducting. In separate work, this fast YBCO reaction was attributed to the formation of a transient liquid phase,63 and is now being exploited to accelerate the manufacturing of YBCO superconductors.64
As illustrated in Fig. 3a, BaCO3 decomposes around 1360 °C,65 whereas BaO2 decomposes at 550 °C,66 and the target compound YBCO decomposes above 1100 °C.67 It is therefore quite surprising that 80% of literature recipes choose to start from BaCO3, which requires long reaction times and laborious regrinds and reanneals, instead of BaO2. In discussions with solid-state chemists, we learned that BaO2 is not very air-stable, so a small fraction (∼5%) will transform to BaCO3 over time, making it difficult to weigh samples accurately and properly dose the Ba-stoichiometry. However, in return for this inconvenience, BaO2 offers the opportunity for a far more efficient solid-state synthesis reaction than BaCO3. In personal communications,68 we learned that the Cava group at Princeton has long used Ba(NO3)2 as a precursor to cuprate superconductors instead of BaCO3, even though this synthesis ‘trick’ does not seem to have been adopted by the solid-state chemistry community at large. This YBCO example illustrates how chemists rely primarily on recipes published by previous chemists, rather than exploring synthesis parameter space for better recipes.
From a technical perspective, both solid-state and solution text-mined datasets contain missing values. 1090 recipes from the solid-state dataset are either missing operations or only contain one uninformative operation type “StartingSynthesis” with no additional information. While the quality of the recipes at the chemistry level (only considering compositions) is high, with a reported 93% F1-score, further investigation of chemical formulae reveals some missing data. Many recipes represent ‘condensed recipes’, where chemical substitutions (indicated by a placeholder, e.g., ‘M’ for metal) and variable stoichiometries (indicated by a subscript, e.g., ‘x’, ‘y’, or ‘z’) are left unspecified. Roughly 1 in 4 chemical compositions with variable stoichiometries are left unspecified in the solid-state dataset. When considering all attributes of a synthesis recipe, the F1-score drops to 51%.
Reproducibility is also a broader issue in inorganic materials synthesis.69 Even human chemists often encounter difficulty when reproducing the synthesis of published compounds. Anecdotally, many important aspects of a reaction are often not reported or explained in a published paper. In informal conversations with chemists, we have learned that oxynitrides can be easier to synthesize in the winter than in the summer;70 that metal vs. plastic reaction containers can change the polymorph of CaCO3 precipitated from solution,71 or that the grinding patterns in a mortar and pestle can influence the performance of synthesized battery materials.72 From a machine-learning perspective, these unreported aspects of materials synthesis represent ‘unknown unknowns,’ which cannot even be expected to be reproduced by human chemists, let alone be captured by text-mining or machine-learning models.73
It would benefit chemistry if there were a cultural shift in how synthesis protocols are reported in publications. Instead of only publishing the final successful synthesis recipe, authors should also be encouraged to briefly discuss attempted reactions that were sensible but unsuccessful. This could help other chemists avoid pitfalls associated with a tricky reaction. Additionally, if a chemist is aware that environmental considerations or reaction setups were important for successful synthesis of their reported material (such as laboratory humidity and oxygen partial pressure, crucible material, precursors and their associated impurities,74 mortar-and-pestle grinding technique, etc.), these details should definitely be mentioned.
Our text-mining work was performed before the advent of large language models (LLMs) like GPT, which now dominate natural language processing methodology.39,40 With LLMs, it should be possible to parse and process published synthesis recipes with less manual annotation effort. The latest OpenAI model available for fine-tuning, ‘gpt-3.5-turbo’, costs $8.00 per 1 M input tokens. Considering the 53538 synthesis paragraphs and their abstracts (roughly 400 tokens each), it would cost roughly $170 per epoch to train, a small and manageable cost for research groups. However, acquiring the relevant papers to text-mine still requires either agreements with publishers or some upfront time to download and prepare inputs for LLMs. Domain expertise is also needed to confirm the fidelity of LLM-extracted recipes, and to check against hallucinations. Moreover, while LLMs could improve velocity and dataset volume, they still would not enhance the variety of published materials or recipes, which again are confined to narrow domains of chemical and synthesis parameter space due to anthropogenic biases.
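The per-epoch cost estimate follows from simple arithmetic, sketched below. The price and paragraph count are those quoted in the text; treating the 400-token figure as a per-paragraph average is an assumption, chosen because it reproduces the quoted total.

```python
price_per_million_tokens = 8.00   # USD per 1 M input tokens, as quoted in the text
paragraphs = 53538                # solid-state synthesis paragraphs
tokens_per_paragraph = 400        # rough figure from the text, assumed to be per paragraph

total_tokens = paragraphs * tokens_per_paragraph
cost_per_epoch = total_tokens / 1_000_000 * price_per_million_tokens

print(f"{total_tokens:,} tokens, about ${cost_per_epoch:.0f} per epoch")
```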
Using XGBoost algorithms, Huo et al.79 predicted reaction temperatures and times for carbonate and non-carbonate reactions. Predicted reaction temperatures fell within ±100 °C of the reported synthesis temperature in half of the predictions, and ±400 °C within 1.5× the interquartile distribution. The models achieved R² ∼ 0.5–0.6 for heating temperature predictions and R² ∼ 0.3 for log10(reaction times). Karpovich et al. trained a conditional variational autoencoder (CVAE) to predict reaction temperatures and times,80 and obtained similar R² values to Huo et al. During feature analysis, the average precursor melting point was found to contribute most to reaction temperature predictions. Huo et al. noted that this fact is reminiscent of Tammann's rule, which is a common empirical heuristic that solid-state reactions should be conducted above 1/3 to 1/2 of the precursor melting points.
To predict starting precursors, He et al.81 developed “PrecursorSelector”, an algorithm trained to predict the solid-state precursors of quaternary oxides using an encoding scheme for chemical similarity to previously synthesized compounds in the text-mined recipes. For a diversity of high-component oxide materials, including mixed-anion materials, they predict starting precursors that match literature reports with 82% accuracy when up to 5 precursor sets are included. Importantly, the encoding obtains experimentally reported precursors in fewer predictions than a baseline of simply choosing the most common oxide precursors. Using a different literature dataset, E. Kim et al.82 predicted precursor materials using a conditional variational autoencoder (CVAE) trained directly from paragraphs, without any explicit domain knowledge. Given the target material InWO3, which was not included in the training set, their model predicts the following precursor sets for both solution and solid-state reactions: (1) In2S3 + WCl4, (2) In(NO3)3 + WCl4, (3) In2O3 + WO2, (4) In2O3 + WN, and (5) InCl3 + Na2WO4. Reactions 3 and 4 are indeed thermodynamically spontaneous, and the fifth set of precursors was previously used for solution-based synthesis.83
In many ways, the machine-learning models outperform the baseline synthesis prediction, and are successful at predicting solid-state reaction temperatures, times, and precursors in alignment with how experimental chemists have previously synthesized inorganic materials.
However, because the models largely capture how chemists think about materials synthesis, it is arguable how much additional value these predictions bring to the experimental chemist.84,85 The prediction of reaction temperatures in alignment with Tammann's rule might be because Tammann's rule drove the chemist to try those reaction temperatures in the first place, and not because these reaction temperatures are necessarily optimal. Likewise, reaction times may not be so meaningful. Chemists often leave reactions in the oven for 24 hours or overnight, meaning a regression on reaction times may be capturing this factor of human convenience, rather than some fundamental mechanism of materials synthesis. Finally, it is not necessarily erroneous to predict precursors that lack an entry in the text-mined dataset, and a precursor that matches the literature dataset is not guaranteed to be an optimal precursor. As exemplified in the YBCO example (Section 3.3), a machine-learned synthesis predictor is unlikely to suggest the BaO2 precursor given its rarity in the training dataset, even though it is a far superior precursor to BaCO3 for YBCO synthesis.
Overall, our analysis here supports the claims of Jia et al.,24 who argued that anthropogenic biases have narrowly confined exploration of reaction conditions to those similar to previous conditions. Moreover, in the hydrothermal synthesis of amine-templated metal oxides, Jia et al. found that neither the popularity of reactants, nor the choices of reaction conditions, were correlated to the success of the reaction, and that machine-learning models trained on randomized reaction datasets outperformed models trained on larger human-selected reaction datasets. Because anthropogenic biases are largely reflected in the 4 Vs of the text-mined dataset, these historical biases strongly limit the creativity and sophistication of downstream machine-learned synthesis recipe predictions.
Therefore, if a published recipe reports common precursors and round values for reaction times and temperatures (6, 12 or 24 hours; at 700, 800, 900 or 1000 °C), this means that a chemist probably tried some simple initial experiments and the target phase formed easily. On the other hand, if a final reported recipe is complicated, for example using unusual precursors, laborious precursor mixing steps, or precise reaction temperatures (such as 835 °C) or times (3.75 hours), the chemist probably had to refine the synthesis parameters through a laborious trial-and-error optimization process.
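The ‘round number’ heuristic can be expressed as a small filter. This is only a sketch: the thresholds (temperatures that are multiples of 100 °C, times taken from the examples quoted above) and the function name `looks_simple` are assumptions for illustration, and the actual classification procedure may have used other criteria, such as the choice of precursors.

```python
ROUND_TIMES_H = {6, 12, 24}   # example round reaction times quoted in the text (hours)

def looks_simple(temperature_c, time_h):
    """Heuristic (assumed thresholds): both temperature and time are round values."""
    round_temperature = temperature_c % 100 == 0
    round_time = time_h in ROUND_TIMES_H
    return round_temperature and round_time

print(looks_simple(900, 12))     # round values: likely little optimization
print(looks_simple(835, 3.75))   # precise values: likely a refined recipe
```

Recipes that fail such a filter are the candidates worth reading in the original papers.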
Following this hunch, I classified the text-mined dataset into ‘simple’ and ‘complex’ recipes, as illustrated in Fig. 4. Most recipes for quaternary oxides report the three simple binary oxides as precursors. However, a recurring pattern emerged, in which ternary oxides were reported as precursors. Because we had the DOIs for each recipe, I could study the original papers. The following discussion on the synthesis of Sr2FeMoO6 provided fascinating and illuminating insights:
Fig. 4 A flow chart for identifying ‘simple’ text-mined synthesis reactions and inferring the more interesting ‘complex’ reactions from them.
“Sr2FeMoO6 samples were prepared by solid-state reaction. Two elaboration processes have been used in order to improve the purity of the samples. For the first one, which is close to the protocol used by most of groups, stoichiometric amounts of SrCO3, Fe2O3 and MoO3 were mixed, ground and calcined at 900 °C for 2 h in an Ar atmosphere. The calcined mixtures were reground, pressed and reduced for 1 h under current flow of 5% H2/95% Ar at 700 °C. Afterwards the mixtures were sintered at 1200 °C under argon flow during 10 h.
Unfortunately, the last protocol does not allow one to obtain a pure Sr2FeMoO6 compound. Instead, SrMoO4 is thermodynamically favored. Therefore, a segregation occurs which makes it impossible to obtain a pure phase.
To get rid of this difficulty, we have developed a sintering process in which only one reaction is performed at each step in order to avoid the formation of SrMoO4. Therefore, in the first step, stoichiometric amounts of SrCO3, Fe2O3 were mixed, ground and calcined at 1000 °C during 5 h under an Ar flow giving rise to Sr2FeO3.5 compound. Then stoichiometric amounts of Sr2FeO3.5, MoO2 and MoO3 were mixed, ground, pressed and sintered at 1200 °C during 2 h under N2/H2 flow.”86
Although this report was purely phenomenological, we could use Materials Project energies to interpret the observation. First, we found that the SrO–MoO3–Fe2O3 pseudo-ternary convex hull (Fig. 5a) is skewed: the SrMoO4 phase is much deeper along the SrO–MoO3 binary hull than Sr2FeO4 is along the SrO–Fe2O3 binary hull. This is captured by the following reaction energies, which are easily assessed using the Materials Project reaction calculator:
SrO + MoO3 → SrMoO4, ΔH = −212 kJ per Sr
2SrO + ½Fe2O3 + ¼O2 → Sr2FeO4−δ, ΔH = −53 kJ per Sr
Three precursors can only meet in space at a single point. Therefore, reactions probably do not proceed between all three precursors reacting at once. Instead, it is more likely that solid-state reactions initiate at the interfaces between only two precursors at a time. The reaction SrO + MoO3 is much more favorable than SrO + Fe2O3, meaning the large reaction driving force can promote fast reaction kinetics to form the low-energy SrMoO4. Formation of SrMoO4 consumes 93% of the total reaction energy, leaving only ΔH = −16 kJ per Sr for SrMoO4 to react with Fe2O3 to form the target phase, Sr2FeMoO6. On the other hand, by separately synthesizing Sr2FeO3.5, only 25% of the total reaction energy is consumed. This retains 75% of the reaction driving force for the second step of the reaction, where MoOx is added, which enables phase-pure formation of Sr2FeMoO6 in only 2 hours.
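The 93% figure can be checked with a short calculation. The sketch assumes that the two quoted enthalpies share the same per-Sr normalization and simply add to give the total driving force; variable names are illustrative.

```python
dH_SrMoO4 = -212.0     # kJ per Sr, SrO + MoO3 -> SrMoO4 (quoted in the text)
dH_remaining = -16.0   # kJ per Sr left for SrMoO4 + Fe2O3 -> Sr2FeMoO6 (quoted in the text)

# Assumption: both values are on the same basis, so the total is their sum.
dH_total = dH_SrMoO4 + dH_remaining
fraction_consumed = dH_SrMoO4 / dH_total

print(f"Fraction of driving force consumed by SrMoO4 formation: {fraction_consumed:.0%}")
```

Under this assumption, only about 7% of the total driving force remains to convert the SrMoO4 intermediate into the target phase.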
Based on this insight, a synthesis strategy for quaternary oxides should be to first synthesize a high-energy ternary oxide intermediate, like Sr2FeO3.5, and then add the final metal oxide. We examined the text-mined recipe database and found 20 more examples where chemists used a ternary oxide precursor that would not be favored by the initial pairwise reactions among the three precursors, listed in Table 2. This suggests that other chemists may have independently come up with a similar synthesis strategy during their reaction optimization. Such discussion of reaction design is usually buried deep in the ‘results’ section of a paper, which was not extracted by our text-mining algorithms. However, once we knew what patterns to look for, our text-mined dataset provided a valuable data source to find more historical examples of this strategy.
Target | Reported precursors | Unusual ternary oxide precursor | Deepest hull ternary oxide | Reference |
---|---|---|---|---|
Sr2FeMoO6 | SrCO3, Fe2O3, MoO2, Sr2FeO3.5 | Sr2FeO3.5 | SrMoO4 | 86 |
LiCr(MoO4)2 | Li2MoO4, Cr(NO3)3·9H2O, MoO3 | Li2MoO4 | Li2CrO4 | 87 |
Li3Cr(MoO4)3 | Li2MoO4, Cr(NO3)3·9H2O, MoO3 | Li2MoO4 | Li2CrO4 | 87 |
Li2MnSiO4 | Mn(CH3COO)2·4H2O, Li2SiO3 | Li2SiO3 | Li2MnO3 | 88 |
TlNd(MoO4)2 | Tl2MoO4, Nd2(MoO4)3 | Tl2MoO4 | Nd2(MoO4)3 | 89 |
TlPr(MoO4)2 | Pr6O11, MoO3, Tl2MoO3 | Tl2MoO3 | Pr2Mo4O15 | 89 |
Sr2CrTaO6 | SrCO3, CrTaO4 | CrTaO4 | SrCrO4 | 90 |
Ca2CrTaO6 | CaCO3, CrTaO4 | CrTaO4 | CaCrO4 | 90 |
Ca3Al2(SiO4)3 | SiO2, Ca3Al2O6 | Ca3Al2O6 | Ca2SiO4 | 91 |
Pb(Zr0.52Ti0.48)O3 | TiO2, PbZrO3, PbO | PbZrO3 | Ca2SiO4 | 92 |
Ba3NiSb2O9 | BaCO3, NiSb2O6 | NiSb2O6 | Ba(SbO3)2 | 93 |
Pb(Fe0.5Nb0.5)O3 | FeNbO4, PbO | FeNbO4 | Ba3V2O8 | 94 |
Pb(Ni0.33Nb0.67)O3 | Nb2O5, Ni4Nb2O9, PbO | Ni4Nb2O9 | Nb2NiO6 | 95 |
Pb(Zr0.5Ti0.5)O3 | ZrTiO4, PbO | ZrTiO4 | Ti3PbO7 | 96 |
Pb(Co0.33Nb0.67)O3 | CoNb2O6, PbO | CoNb2O6 | TiPbO3 | 96 |
Bi(MgTi)0.5O3 | MgTiO3, Bi2O3, TiO2 | MgTiO3 | Mg(BiO3)2 | 97 |
Tl2Pu(MoO4)3 | Pu(MoO4)2, Tl2MoO4 | Pu(MoO4)2 | MgTiO3 | 98 |
Tl4Pu(MoO4)4 | Pu(MoO4)2, Tl2MoO4 | Pu(MoO4)2 | MgTiO3 | 98 |
CaSnSiO5 | SiO2, CaSnO3 | CaSnO3 | Ca2SiO4 | 99 |
CuInGaO4 | In2O3, CuO, InGaO3, Ga2O3 | InGaO3 | GaCuO2 | 100 |
These examples led to a series of ‘Eureka!’ insights. First, we realized that we should reconsider the thermodynamic boundary conditions when analyzing solid-state reactions. While the overall reaction vessel has a stoichiometry fixed at the composition of the dosed precursors, the initial interfacial reactions between precursor powders should be compositionally unconstrained. In other words, the reactants do not ‘know’ the overall stoichiometry of the reaction vessel; they simply undergo whatever reaction is most favorable at their physical interface with another powder precursor. This led us to distinguish between ‘non-equilibrium’ and ‘metastable’ compounds from a convex-hull perspective. For example, SrMoO4 is a hull-stable compound, but in the reaction to Sr2FeMoO6, SrMoO4 is a non-equilibrium intermediate, which persists kinetically due to small driving forces to complete the reaction to Sr2FeMoO6. This example illustrates how convex-hull stability is a limited and incomplete feature in the context of solid-state reactions. Instead, the topology and ‘skew’ of the convex hull towards low-energy thermodynamic ‘traps’ is a more relevant descriptor.75
Second, three precursors are unlikely to react together at once, because three powder particles meet in space only at a single point, whereas two particles share an interface. It is much more probable that solid-state reactions proceed via interfacial reactions between two precursors at a time. Third, we realized that T = 0 K DFT convex hulls could effectively evaluate the competition between interfacial reactions, as the energy scale of solid-state reactions (∼0.5 eV per atom) is much larger than the energy scale of DFT errors (∼0.02 eV per atom),101 of TΔS (∼0.015 eV per atom), or of kinetics (kBT). This means we do not need time-consuming nudged elastic band calculations for diffusivity rates, or surface energy calculations to evaluate nucleation barriers.102,103
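The pairwise picture can be made concrete with a minimal sketch. It enumerates the three interfaces available to three precursors and ranks them by reaction energy per atom. The phase names (X, Y, Z and their pairwise products), the energies, and the atom counts are hypothetical placeholders chosen for illustration, not DFT values, and the assumption that each pair forms exactly one product is a simplification; in practice the energies would come from a computed convex hull.

```python
from itertools import combinations

# Hypothetical placeholder data (NOT DFT values): formation energy in eV per
# atom and atoms per formula unit for three precursors X, Y, Z and the
# product each pair could form at its interface.
E_FORM = {"X": -2.00, "Y": -2.50, "Z": -1.50, "XY": -2.35, "XZ": -2.10, "YZ": -2.00}
N_ATOMS = {"X": 2, "Y": 3, "Z": 4, "XY": 5, "XZ": 6, "YZ": 7}


def reaction_energy_per_atom(reactants, products):
    """Energy change (eV per atom) of a balanced reaction.

    reactants and products map phase name -> stoichiometric coefficient.
    """
    def total_energy(side):
        return sum(coeff * N_ATOMS[p] * E_FORM[p] for p, coeff in side.items())

    n_atoms = sum(coeff * N_ATOMS[p] for p, coeff in reactants.items())
    return (total_energy(products) - total_energy(reactants)) / n_atoms


def rank_pairwise_interfaces(precursors):
    """Sort pairwise interfaces from most to least negative reaction energy."""
    ranked = []
    for a, b in combinations(precursors, 2):
        # Assumption: each pair forms the single product named a + b.
        energy = reaction_energy_per_atom({a: 1, b: 1}, {a + b: 1})
        ranked.append((f"{a}|{b}", energy))
    return sorted(ranked, key=lambda item: item[1])


ranked = rank_pairwise_interfaces(["X", "Y", "Z"])
for interface, energy in ranked:
    print(f"{interface}: {energy:+.3f} eV per atom")
```

With these placeholder numbers the X|Z interface has the largest driving force and would be expected to react first, regardless of the overall stoichiometry of the vessel.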
Next, in Miura et al.26 we tested whether reactions between three precursors indeed occur two at a time. While synthesizing YBa2Cu3O6+x, we saw that the reaction starting from the common BaCO3 precursor initiated at the Y2O3|CuO interface, and that the reaction progressed slowly because BaCO3 does not decompose until 1100 °C. However, when starting with BaO2, which decomposes at 550 °C, the BaO2|CuO interface reacts first. Using in situ TEM, we directly observed the sequence of pairwise reactions, which finally resulted in superconducting YBa2Cu3O6+x in only 30 minutes.
Finally, we tried to design other analogues to the ‘Sr2FeO3.5’ phase in the Sr2FeMoO6 reaction. In Chen et al.,27 we used a robotic inorganic materials synthesis laboratory to synthesize 35 known quaternary oxides from two sets of precursors: (1) the three common binary oxide precursors versus (2) a high-energy ternary oxide phase + the remaining oxide. We found that the precursor set with a high-energy ternary oxide (i.e., the Sr2FeO3.5 analogue) frequently outperformed the standard set of three binary oxide precursors. The robotic laboratory enabled us to validate this principle over a broad chemical space spanning 27 elements and 28 unique precursors. Examples of high-energy ternary oxide phases include LiPO3, LiBO2, and LiNbO3, which do not appear in the text-mined recipe dataset. It is therefore very unlikely that a machine-learning model trained on the text-mined dataset would ever predict these highly effective precursors.
Our proposed mechanism regarding the selectivity of pairwise interfacial reactions (colloquially referred to as the ‘ΔGmax’ hypothesis) later drove the active learning reaction optimization algorithm ARROWS3,104 which was the decision-making engine behind the A-Lab, the self-driving autonomous robotic synthesis laboratory at Berkeley.105 The ΔGmax hypothesis was also found to be important in hydrothermal synthesis, where a maximized thermodynamic driving force for precipitation also minimizes undesired kinetic byproducts.28 The ΔGmax hypothesis does fail in some systems; in particular, it does not reliably reproduce the first interfacial reaction in sulfides106,107 or in intermetallic reactions.108 Of course, there remains more work to be done in understanding fundamental synthesis science. However, the success of the ΔGmax hypothesis thus far offers an optimistic case study for how fundamental mechanisms can be inferred from large historical datasets of materials data.
As we argued in this retrospective, one critical issue in this machine-learning pipeline arises from limitations in the volume, variety, velocity and veracity of the text-mined recipe dataset. These limitations in the 4 Vs are not failures of natural language processing; in fact, this dataset is probably the best possible product given the practical challenges of text-mining the scientific literature. Rather, many limitations in the volume and variety of the dataset have anthropogenic origins.
From an experimental chemist's perspective, exploratory synthesis of new functional materials is risky, with many potential failure modes.109 For chemists without access to ab initio predictions, it is much easier to arrive at publishable results by characterizing and tweaking known materials, instead of exploring new chemistries and structures. We believe that this human tendency to ‘exploit’ known materials rather than ‘explore’ new materials underlies the limited diversity of unique materials explored in the scientific literature—which of course is reflected in the text-mined recipe database. Furthermore, because synthesis is usually seen as a ‘means to an end’ (the end being materials properties and functionality), published synthesis recipes often do not represent the most efficient or optimized synthesis procedure. Follow-up studies on a given material also tend not to modify or revise published recipes. All in all, this risk aversion leads to a narrow variety of explored materials chemistries, as well as homogeneity in recipe design, which ultimately limits the sophistication of machine-learned synthesis prediction models.
A second critical issue in the typical machine-learning pipeline is the difficulty of featurizing a dataset when the operative physical mechanism is unknown. The current paradigm of machine learning in materials science generally proceeds by collecting a variety of materials descriptors, usually elemental properties and other features that can be conveniently pulled from databases. It is common to include as many features as possible, and hope that the machine-learning algorithm learns something new and interesting—something too complicated for humans to have anticipated. However, if the chosen features are unrelated to the essential underlying physics, the resulting machine-learning algorithms will likely make spurious associations.
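The risk of spurious associations can be illustrated with a small, self-contained sketch. It draws a random target and purely random features, so no feature carries any physical signal, and reports the strongest correlation found. The sample size, feature counts, and seed are arbitrary assumptions for illustration and do not correspond to any dataset discussed here.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_correlation(n_samples, n_features, seed=0):
    """Largest |r| between a random target and purely random features."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, target)))
    return best


few_features = best_spurious_correlation(n_samples=10, n_features=5)
many_features = best_spurious_correlation(n_samples=10, n_features=500)
print(f"best |r| with 5 random features:   {few_features:.2f}")
print(f"best |r| with 500 random features: {many_features:.2f}")
```

Because the same seed is reused, the 500-feature run contains the 5-feature run as a subset, so its best correlation can only be equal or higher. With few samples and many candidate features, a strong correlation appears by chance alone, which is why features disconnected from the underlying physics can mislead a model.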
In our retrospective here, we did derive new mechanistic insights into how pairwise reactions drive solid-state synthesis, and how clever precursor selection could facilitate more efficient synthesis pathways to high-component oxides. However, arriving at these insights required domain knowledge in materials thermodynamics and kinetics, and a visual abstraction of the microstructure of powder precursors in interfacial reactions. Arriving at the ‘Eureka!’ moment also required us to recognize how the convex hull was skewed towards certain low-energy competing phases. The skew and topology of the convex hull are not features one would likely include a priori. If, instead, the dataset had been featurized using only DFT bulk formation energies, a machine-learning algorithm would have been unlikely to pick up on this essential geometric aspect of the convex hull.
It seems unlikely that any existing machine-learning architecture could derive these physical insights in an unsupervised manner. Even large language models like GPT, which operate by predicting the next word in a sentence, do not have a visual representation of the microstructure of a reaction, nor access to the thermochemical data needed to assess competing reaction pathways. Of course, once we have a physical principle, we can featurize later machine-learning models properly. As shown in the ARROWS3 study,104 synthesis optimization guided by the ΔGmax mechanism is more efficient than physics-agnostic statistical approaches. Our point is that it is premature to apply machine-learning techniques before hypothesizing a physical model, and that for the foreseeable future, physical models will probably need to be built by human experts, not machines.
This retrospective is not a criticism of data-driven approaches, as ‘bigger data’ will always add value and enable more analyses. Our text-mined recipe dataset is orders of magnitude larger than any similar dataset. This large database enabled the rapid testing of synthesis hypotheses against prior experimental observations, quantitative surveys of statistical distributions, and visualizations of broad trends across chemical space. Here, this dataset helped classify ‘simple’ and ‘complex’ recipes, from which we found the unusual Sr2FeO3.5 precursor for Sr2FeMoO6. One cannot search Google Scholar for ‘unusual synthesis reaction’, nor would this key insight be readily apparent from the title of the original paper, “Elaboration and characterization of the Sr2FeMoO6 double perovskite”.86
Going forward, we should not be dismayed by the anthropogenic limitations of the 4 Vs in historical datasets. Rather, we should be encouraged by the fact that scientific data is being generated at an exponentially increasing rate. In particular, the advent of robotic materials synthesis laboratories27,105,110,111 offers the opportunity to digitize and store all the synthesis metadata that is usually unreported, including ‘dark reactions’112 that do not lead to successful synthesis. Moreover, robotic labs are more likely to maintain high synthesis precision and reproducibility, meaning we will likely produce more high-quality single-source experimental synthesis data in a few years than all the historical synthesis recipes that have been published thus far. We should be optimistic about this incoming deluge of data, which, if interpreted thoughtfully and physically by human experts (and assisted by machine-learning methods), will surely lead to transformative new science on how to synthesize and manufacture novel materials.
Footnotes
† Equal contribution.
‡ In this manuscript, the first person refers to Wenhao Sun. The “4 Vs” analysis of the text-mining dataset was performed by Nicholas David, a current PhD candidate in the Sun research group, who was not involved in the original text-mining works.
This journal is © The Royal Society of Chemistry 2024