Chenru
Duan
ab,
Shuxin
Chen
ab,
Michael G.
Taylor
a,
Fang
Liu
a and
Heather J.
Kulik
*a
aDepartment of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. E-mail: hjkulik@mit.edu; Tel: +1-617-253-4584
bDepartment of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
First published on 2nd September 2021
Virtual high-throughput screening (VHTS) with density functional theory (DFT) and machine-learning (ML)-acceleration is essential in rapid materials discovery. By necessity, efficient DFT-based workflows are carried out with a single density functional approximation (DFA). Nevertheless, properties evaluated with different DFAs can be expected to disagree for cases with challenging electronic structure (e.g., open-shell transition-metal complexes, TMCs) for which rapid screening is most needed and accurate benchmarks are often unavailable. To quantify the effect of DFA bias, we introduce an approach to rapidly obtain property predictions from 23 representative DFAs spanning multiple families, “rungs” (e.g., semi-local to double hybrid) and basis sets on over 2000 TMCs. Although computed property values (e.g., spin state splitting and frontier orbital gap) differ by DFA, high linear correlations persist across all DFAs. We train independent ML models for each DFA and observe convergent trends in feature importance, providing DFA-invariant, universal design rules. We devise a strategy to train artificial neural network (ANN) models informed by all 23 DFAs and use them to predict properties (e.g., spin-splitting energy) of over 187k TMCs. By requiring consensus of the ANN-predicted DFA properties, we improve correspondence of computational lead compounds with literature-mined, experimental compounds over the typically employed single-DFA approach.
Transition-metal complexes (TMCs) exemplify such challenges in single-DFA-based high-throughput screening. While TMCs are of interest for discovery due their widespread applications in catalysis4,22–28 and energy utilization (e.g., in redox flow batteries,29 solar cells,30 and molecular switches31), their electronic structure is challenging to describe accurately.16 The variable nature of metal, oxidation state, and spin of TMCs introduces combinatorial explosion in design spaces11,32 that cannot be exhaustively explored by either first-principles methods or experiments, motivating ML acceleration.33–39 Open-shell TMCs are particularly difficult to study due to their near-degenerate d orbitals that may introduce significant multireference (MR) character.40–44 Furthermore, many properties are highly sensitive to the choice of DFA and the resulting biases will be passed down to and encoded in ML models trained on this data. For example, ML-accelerated discovery based on semi-local DFT will identify lead TMCs targeted for specific spin state properties (i.e., spin-crossover or SCO) with weaker field ligands than those found by hybrid-DFT-derived ML models.37
When careful studies of smaller data sets have been carried out, they have revealed DFA dependence (e.g., including fraction of Hartree–Fock, HF, exchange) of property evaluations for both organic molecules45–47 and TMCs.48–52 To address this issue, some have tried to optimize a DFA for specific properties with respect to the experimental or correlated wavefunction theory (WFT) reference data53–55 or suggest DFAs in a system- and property-specific manner.56,57 Recently, we have built an ML decision engine58,59 at DFT cost that classifies systems with strong MR character and thus identifies regions of chemical space that are safe to make predictions in with DFT.60 One may expect single-reference WFT and DFAs with high HF exchange fractions to fail when strong MR character is present, but high errors may not be present for all DFAs in these cases.14,15,18 Bayesian inference has been used to analyze errors from different DFAs and design new DFAs for systems that potentially contain strong static correlation (i.e., MR character).61–64 It remains to be thoroughly investigated how the systematic biases in a dataset resulting from the behavior of the chosen DFA influence ML model training and the nature of the lead compounds in chemical discovery.
Here, we carry out the first large-scale study on over 2000 TMCs of 23 DFAs from numerous rungs of “Jacob's ladder” (i.e., from semi-local DFT to double hybrids) for three distinct chemical properties. We show that while absolute properties computed by different DFAs disagree, good linear correlations are generally observed between DFA pairs. We show how design rules obtained from the most important features in feature-selected ML models are invariant to DFA choice or basis set, providing a robust tool for materials screening. We introduce a fine-tuning strategy to train multiple artificial neural networks (ANNs) to approximate predictions of different DFAs while maintaining comparable latent spaces. We show how exploiting the consensus among 23 ANNs to discover complexes (e.g., SCOs) results in improved agreement with experiment over a single-DFA approach.
To evaluate the relative agreement among these DFAs, we focus on three properties that depend either on multiple geometries, charges, or spin states calculated for a large set of transition-metal complexes (TMCs, see Section 4). These include: (i) the adiabatic high-spin (HS) to low-spin (LS) splitting energy, ΔEH–L; (ii) the vertical ionization potential, IP, of the complex; and (iii) the frontier orbital gap from the Δ-SCF87 approach (hereafter, the Δ-SCF gap) obtained as the difference between the vertical IP and vertical electron affinity (EA) of the TMC. We selected ΔEH–L because it is known48,88–96 to be strongly sensitive to DFA choice. We selected the Δ-SCF87 evaluation of the HOMO–LUMO gap and vertical IP evaluated from total energy differences because they are expected to be less sensitive to the lack of piecewise linearity97,98 in a DFA in comparison to the same properties obtained from frontier orbital energies.14 All properties are evaluated with the single parent functional and basis set choice (B3LYP/LACVP*) we typically employ for its efficiency in high-throughput screening (see Section 4). By using a consistent geometry and developing a strategy for preserving the qualitative description of the wavefunction across DFAs (see Section 4), we isolate the role of the DFA parameterization in altering predicted energetic properties.
Although the absolute computed values differ by DFA for each of the three properties, the obtained values from different DFAs generally have high linear correlations, as quantified by Pearson's correlation coefficients (Fig. 1, ESI Fig. S1 and Table S2†). To put this in context, the range of ΔEH–L values can differ strongly by functional (M06-2X: −122.8 to 50.7 kcal mol−1vs. M06-L: −59.2 to 100.4 kcal mol−1) as can, to a lesser extent, vertical IP (M06-L: 0.9 to 28.4 eV vs. M06-2X: 1.2 to 29.6 eV). We observe strong positive correlations between all pairs of DFAs, even for DFAs from different rungs of “Jacob's ladder”, and for properties (i.e., ΔEH–L) that can be expected to be strongly functional-dependent. Across all functionals, the correlations for vertical IP are consistently high (i.e., 0.99–1.00) so we focus further analysis on the Δ-SCF gap and especially on the most DFA-sensitive property ΔEH–L (ESI Fig. S1†).
Despite strong DFA sensitivity for ΔEH–L, three GGAs have near-perfect linear correlations (Pearson's r > 0.99) with each other (Fig. 1). For this property, most of the meta-GGAs also have extremely high (Pearson's r > 0.98) linear correlation with GGAs, with MN15-L being the sole exception (e.g., Pearson's r of 0.89 with BLYP, ESI Fig. S2†). One might expect a functional like the SCAN meta-GGA that has been demonstrated to make more accurate predictions of formation enthalpy99 and reaction energy100 to correlate poorly with less accurate semi-local GGA functionals, but SCAN-computed Δ-SCF gaps and ΔEH–L values correlate just as highly to the GGAs as other few-parameter meta-GGAs and better than the more highly parameterized MN15-L (Fig. 1 and ESI Fig. S1†). Although the family of double hybrids has lower correlations with pure GGAs in comparison to their correlation with hybrids (i.e., GGA or meta-GGA) for both Δ-SCF gap and ΔEH–L, even the low correlations are still relatively high (Pearson's r = 0.8–0.9, Fig. 1 and ESI Fig. S1†). These good correlations likely benefit from our workflow that ensures qualitative correspondence and limited spin contamination of the converged electronic state with change of DFA (see Section 4).
As could be expected, HF exchange influences ΔEH–L correlations significantly: within the LYP correlation family, B3LYP and B2GP-PLYP are more highly correlated with one another than the latter is with BLYP (ESI Fig. S2†). This behavior of ΔEH–L extends to the more highly parameterized Minnesota (e.g., M06) functionals, e.g., Pearson's r coefficients are lowest between the pure meta-GGA M06-L and M06-2X with high HF exchange (ESI Table S3†). Overall, DFA agreement, as quantified by Pearson's r values, is surprisingly strong across our data set with the 23 distinct functionals regardless of property. When deviations occur between functionals (e.g., for Δ-SCF gap or ΔEH–L), they appear to be due most to HF exchange fraction rather than to pure DFA parameter or correlation family choice.
Among all possible DFA pairings in our set, we observe the lowest Pearson's r for ΔEH–L the pure GGA BLYP and the highly parameterized meta-GGA hybrid M06-2X (Pearson's r: 0.64, Fig. 1). This pair of DFAs is also among the most poorly correlated for Δ-SCF gap (Pearson's r ca. 0.8, ESI Fig. S1†). As an example of this disagreement, differences between the two DFAs of over 100 kcal mol−1 are observed for ΔEH–L (BLYP: 98 kcal mol−1, M06-2X: −6 kcal mol−1) for Fe(II)(CS)6. While BLYP predicts a low-spin (LS) ground state expected for the strong-field CS ligand, M06-2X predicts a likely incorrect high-spin (HS) ground state. For another TMC with similarly strong-field ligands, Co(III)(C2H3N)6, BLYP and M06-2X both predict a LS ground state, albeit with large variations in the ΔEH–L (BLYP: 88 kcal mol−1, M06-2X: 35 kcal mol−1) predicted.
For the cases where pairs of DFAs demonstrate low linear correlations for specific properties (i.e., ΔEH–L and Δ-SCF gap), we note large differences in the distributions of the computed property (Fig. 2 and ESI Fig. S3–S5†). For example, two peaks are observed in the M06-2X ΔEH–L distribution in comparison to only one for M06 and M06-L (Fig. 2). In contrast to the two DFA-sensitive properties, vertical IPs computed with different DFAs tend to differ by a small rigid shift in value with a preserved distribution (Fig. 2 and ESI Fig. S4†). A qualitative difference in the ΔEH–L distributions is associated with more disagreement between functionals (e.g., M06-L and M06-2X in Fig. 2). M06-L and M06-2X predict strong-field Fe(II)(HNO)6 to have the same ΔEH–L of 51 kcal mol−1, but they differ strongly in their predictions for the mixed ligand field of Mn(II)(CO)4(I−)2. In this case, different ground states are obtained with M06-L (LS; ΔEH–L: 32 kcal mol−1) and M06-2X (HS; ΔEH–L: −61 kcal mol−1), and the predicted spin-splitting values differ by 93 kcal mol−1. Returning to the vertical IP for the same pair of DFAs, consistent distributions are instead observed, with HF exchange in M06-2X rigidly shifting the IP up by around 5% (i.e., 1–2 eV over a 30 eV range, Fig. 2). The greatest disagreement, which is observed for LS Co(II)(H2O)4(NH3)2, is only twice this amount (i.e., 4 eV) when comparing M06-L (17.1 eV) to M06-2X (21.4 eV, Fig. 2).
The ranking of compounds by ΔEH–L and Δ-SCF properties varies with DFA choice, but little variation is observed for the vertical IP percentile ranks (Fig. 3 and ESI Fig. S6†). For ΔEH–L, the TMCs with the strongest (e.g., Co(III)(CO)6) and weakest ligand fields (e.g., Mn(II)(H2O)5(C5H5N)) have values at the extremes of the distributions and correspondingly their percentile ranks obtained with the 23 DFAs have the smallest standard deviations (ESI Table S4†). Complexes with moderate ligand field strengths instead have the largest standard deviation of percentile ranks, suggesting that the ordering of TMCs with moderate ligand field strengths for ΔEH–L is most strongly functional dependent. For example, Mn(II)(CO)4(H2O)(C5H5N) is an intermediate-rank ΔEH–L complex (average across the 23 DFAs: 49th percentile) but has a percentile rank ranging from the 20th (e.g., 20 for M06-2X or 26 for DSD-BLYP-D3BJ) to the 73rd percentile (e.g., 73 for BP86 and 65 for M06-L) depending on the DFA (ESI Table S4†). Generally, the pure GGAs and pure meta-GGAs predict this compound to have the highest percentile rank while double hybrids predict it to have among the lowest. Thus, DFAs can broadly be expected to agree at extremes but have divergent behavior for these compounds that have intermediate ΔEH–L properties arising from a mixture of strong-field and weak-field axial ligands (Fig. 3 and ESI Table S4†).
Fig. 3 Uniform manifold approximation and projection (UMAP)101 2D visualization of the latent space of a B3LYP/LACVP* ANN (see Section 4) for ΔEH–L (left) and vertical IP (right). Each TMC is shown as a circle colored by the average percentile rank of the property (ΔEH–L (left) and vertical IP (right)) obtained over all 23 DFAs and is scaled by the std. dev. of the percentile rank over the DFAs. Representative TMCs from left to right in the left pane: Mn(II)(H2O)5(C5H5N), Mn(II)(CO)4(H2O)(C5H5N), Co(III)(CO)6, and in the right pane: HS Mn(II)(C4H4O)4(CO)2 and LS Fe(II)(NH3)6. Atoms are colored as follows: orange for Fe, purple for Mn, pink for Co, blue for N, red for O, gray for C, and white for H. |
In contrast to ΔEH–L, the ranking of vertical IPs of TMCs remains nearly constant across all 23 DFAs, including for those complexes that have intermediate values (e.g., LS Fe(II)(NH3)6 in Fig. 3, ESI Table S4†). This result is expected because variation in the functional was qualitatively observed to rigidly shift the vertical IP distribution (Fig. 2). Still, there are a few exceptions with large percentile rank standard deviations among the DFAs for vertical IP. In one extreme example, HS Mn(II)(C4H4O)4(CO)2 has a low percentile rank (i.e., <40) for most pure GGAs (e.g., 36 for BLYP) and meta-GGAs but a higher percentile rank (i.e., >60) for hybrids (e.g., 66 for SCAN0) and double hybrids (ESI Table S4†).
For a given functional, the basis set can also be expected102–105 to influence property predictions. We therefore repeated our analysis with a larger triple-ζ (i.e., def2-TZVP) basis set in addition to the double-ζ LACVP* basis set that is more amenable to high-throughput screening. Over each of the three properties and 23 DFAs, we observe that properties computed using the small and large basis sets display both high linear correlation (Pearson's r > 0.98) and absolute property prediction agreement (ESI Table S5 and Fig. S7†). Although one may expect vertical IP to be more dependent on basis set, e.g., for complexes with strongly negatively charged ligands,35,106 we observe little basis set dependence even for this property (ESI Fig. S8†). Therefore, we conclude that DFA dependence outweighs basis dependence for evaluation of the properties considered here, and subsequent discussions focus on results obtained with the VHTS-relevant LACVP* basis set.
Across this wide set of DFAs, the RF-RFA/KRR-selected features are insensitive to the functional choice for each of the three properties studied (Fig. 4 and ESI Fig. S9–S14†). This observation holds both for the DFA-sensitive ΔEH–L and DFA-insensitive vertical IP (Fig. 4). When quantitative differences are observed, they occur for functionals and properties that had poor correlation, e.g., differences in the fraction of metal-coordinating atom features for M06-2X and ΔEH–L (Fig. 4). Even when small quantitative differences are observed between features selected for each of the DFAs, these differences are significantly smaller than the magnitude of differences among the selected features for each of the three properties within a DFA.
Fig. 4 (Left) pie charts of the RF-RFA/KRR-selected RAC-155 features for ΔEH–L (top), Δ-SCF gap (middle), and vertical IP (bottom) for the DFAs indicated (top). Features are grouped by the most metal-distal atoms: metal in blue, first coordination sphere in red, second coordination sphere in green, third coordination sphere in orange, and more distant, global features in gray. A black outline groups the first three categories (i.e., within two bond paths to the metal) as metal-local features. Within each connectivity distance category, the property (i.e., χ, S, T, Z, or I) is also indicated, with the oxidation/spin state (O) assigned as metal-local and the ligand charge (L) assigned as global. (Right) box plot for the fraction of metal-local features (top) and the fraction of electronic features (bottom) for all 23 DFAs at each property. Following our previous work,60,107 we have categorized χ, Z, O, and L as electronic features, with all remaining features categorized as geometric. |
For example, RF-RFA/KRR on ΔEH–L from either of the poorly correlated pair of GGA BLYP and meta-GGA hybrid M06-2X DFAs selects a feature set with a comparably high fraction of metal-local features (BLYP: 0.72, M06-2X: 0.72) and electronic (i.e., electronegativity, nuclear charge, oxidation state, see Section 4) features (BLYP: 0.78, M06-2X: 0.83, Fig. 4). The observation of the invariance of selected features for ΔEH–L also holds for M06-L and M06-2X (Fig. 4). Thus, the feature-derived design rules are insensitive to the significant differences in the distributions of ΔEH–L obtained with each of the DFAs, despite this difference leading to variations in percentile rank or low correlations among functionals (Fig. 2).
Although higher-rung double hybrids have been shown96 in some cases to yield more accurate property prediction for spin state ordering, feature selection for all three properties on the PBE0-DH double hybrid yields very similar selected features to DFAs from lower rungs (Fig. 4 and ESI Fig. S9–S14†). Notably, semi-local DFAs that often yield unphysically small or closed HOMO–LUMO gaps give nearly the same selected features as range-separated functionals that contain asymptotically correct, non-local (i.e., 1/r) exchange, even for the vertical IP and Δ-SCF gap (ESI Fig. S9–S14†). Consistent with the correlation analysis, we also observe weak dependence of the selected features on basis set for a given property-DFA combination, reinforcing the utility of small basis sets for computational high throughput screening (ESI Fig. S15†). Taken together, these observations provide powerful support for the design rules revealed through RF-RFA/KRR; such design features are robust to DFA choice and basis set to a much greater extent than absolute or even relative (i.e., rank ordering) property prediction across diverse properties.
A related, open question is the extent to which observations of design rules we have made are sensitive to the nature of the compounds in and the size of the data set on which the models are trained. Consistent with prior work using B3LYP on modest data sets,35,107 the ΔEH–L, Δ-SCF gap, and vertical IP demonstrate decreasing dependence on metal-local and electronic features across representative DFAs from our broader 23 DFA set (Fig. 4). Importantly, the RF-RFA/KRR-selected features are quantitatively comparable, even when considering distinct data sets. The current set contains both smaller complexes with considerably more diverse metal-local chemistry (i.e., both P/S/Cl- and C/N/O-coordinating) and a greater number of ligand types and sizes relative to the set in previous work35,107 that had only five unique ligands with a narrow range (i.e., C/N/O) of metal-coordinating atoms (Fig. 5 and ESI Fig. S16†). Thus, not only is the RF-RFA/KRR feature map insensitive to method choice, but the design rules are likely insensitive to data set choice as long as sufficient variation (e.g., metal identity and ligand field strength) is included in the set.108
Fig. 5 (Top, left) kernel density estimation (KDE) of the size distribution of complexes in two subsets of data from prior work (MD1 + OHLDB) used in this study and the small redox set (SRX) of only five small ligand types from previous work.35,107 (Top, right) clustered bar graph for the connecting atom identity (X indicates any halide) in the two sets. (Bottom) pie charts of the features selected by RF-RFA/KRR for ΔEH–L with B3LYP/LACVP* for MD1 + OHLDB (left) and SRX (right). The pie chart labels follow the format of those in Fig. 4. |
Overall, the robustness of RF-RFA/KRR-selected features to data set, basis set, and DFA suggests an efficient approach for materials design. To reveal design rules for new properties in materials spaces that have twin challenges of combinatorial explosion and method accuracy such as open-shell transition-metal chemistry, low-cost DFAs (e.g., GGAs) and small basis sets (i.e., double-ζ) on modest data sets of small, representative complexes can be used to efficiently reveal design principles even when they would be insufficient for individual property predictions.
We next aimed to screen hypothetical compounds with ANNs trained on all 23 DFAs to identify trends of agreement and disagreement among DFAs with ANN models that had comparable confidence (i.e., as judged through latent space distance uncertainty quantification109) and predictive accuracy. To overcome the challenge of inequivalent ANN latent spaces, we use the weights of the ANN trained with B3LYP data as a starting point to train fine-tuned ANNs (FT-ANNs) on properties obtained with each of the 23 DFAs (Fig. 6 and see Section 4 and ESI Table S6†). The FT-ANNs trained through this procedure have comparable latent spaces for different DFAs without sacrificing prediction accuracy in comparison to alternative (i.e., RF-RFA/KRR or standard ANN) models (Fig. 6). This is true regardless of whether the DFA-calculated properties are well correlated with each other or with the parent B3LYP DFA used to obtain the initial ANN model weights and architecture (ESI Fig. S17–S19 and Tables S7–S9†). Despite having similar latent spaces, each of the 23 ANNs predicts distinct properties approximating a unique DFA, enabling us to understand in the context of large-scale chemical discovery how ML models differing in DFA data sources will influence absolute or relative property prediction performance.
Leveraging the 23 FT-ANNs trained on all of the DFAs, we next investigate how ML-approximated knowledge of DFA predictions will influence the design of lead compounds in comparison to the more common approach of using an ML model trained on a single DFA. We first define a target property and then identify consensus lead TMCs as the set of materials in which a majority (i.e., ≥12) of the ML models each trained on a distinct DFA are in agreement about the target property value. Because we have selected a wide-ranging set of DFAs that include semi-local functionals, meta-GGAs, and range-separated hybrids along with varied HF exchange and MP2 correlation fractions, the consensus lead TMCs cannot be chosen simply due to the dominance of a single family of closely related functionals (ESI Tables S1 and S10†). In our procedure, we also perform discovery using latent-space-distance-derived uncertainty quantification,109 restricting our chemical discovery task to regions of chemical space with high ML model confidence (ESI Fig. S20 and Table S11†).
We apply our consensus-based approach in the screening of a large space of 187200 TMCs obtained from ref. 60. This enumerated space consists of HS and LS M(II/III) midrow metals (M = Cr, Mn, Fe, or Co) with 36 unique ligands in heteroleptic and homoleptic mononuclear octahedral TMCs (ESI Tables S12–S14†). The diverse chemistry, metal-coordinating atom types, symmetry of the complexes, and sizes of the ligands produce a large space of smaller TMCs along with those with up to 200 atoms (ESI Fig. S21 and Tables S12–S13†).
Screening complexes with targeted Δ-SCF gaps <3 eV in this design space is motivated by their relative rarity (ca. 0.1%) in the original set of data from the 23 DFAs (ESI Fig. S22†). We find that lead TMCs for the targeted Δ-SCF gap are robust to the choice of DFA (Fig. 7 and ESI Table S15†). Even DFA pairs with weak linear correlations among our 23-DFA set, such as BLYP and M06-2X (Δ-SCF gap Pearson's r = 0.80), still recommend similar (21% overlap for compounds favored by either functional) lead complexes (ESI Fig. S23†). Although the original data set contains few complexes with the targeted (i.e., <3 eV) Δ-SCF gaps, the ML models generalize well on the 187200 complex design space, yielding fruitful candidate lead complexes for this design objective. As would be observed with a single DFA, the consensus targeted Δ-SCF gap lead complexes favor large or bidentate N- or O-coordinating ligands with no significant metal preference, an observation that follows the expected trend of smaller Δ-SCF gap with increasing system size (Fig. 8 and ESI Fig. S24†). Because of the robustness of the Δ-SCF gap, this observation does not change if we only employ a single DFA or single family of DFAs (ESI Fig. S25†). When different DFAs were used to train ML models to predict band gaps in solid state materials, it was observed that the ML models sometimes failed to preserve the rank ordering of band gaps that would have been otherwise preserved by lower-level DFAs.110 In our FT-ANN training approach for predicting Δ-SCF gaps of different DFAs, we do not observe this effect. Over the 187.2k compounds, high rank correlation (Spearman's r = 0.86 to 0.99) is observed between all 23 FT-ANN models (ESI Fig. S26†).
To validate our approach on a more challenging property, we next target discovery of SCOs with |ΔEH–L| < 5 kcal mol−1 because SCO chemistry is known to be strongly sensitive to DFA choice (e.g., HF exchange fraction35,37). We identify the consensus (i.e., majority) leads selected by family of functional. We find that discovered complexes are indeed very sensitive to HF exchange fraction, resulting in a limited number (<1%) of leads identified by consensus of the pure semi-local DFAs also selected by the consensus of the HF-exchange-containing hybrid DFAs or double-hybrid DFAs (Fig. 7 and ESI Fig. S27†). Although the GGA-hybrid and double-hybrid family of functionals are expected to be more similar to each other than to those classified as pure GGAs, their consensus-suggested lead complexes differ significantly, with only 9% overlap between leads favored by the consensus of either family of functionals (Fig. 7). Additionally, lead SCO complexes can differ even when we compare DFAs within the same rung of “Jacob's ladder” that were observed to have strong linear correlation with each other for ΔEH–L. For example, within the meta-GGA hybrid group, SCAN0 and TPSSh (ΔEH–L Pearson's r = 0.97) recommend vastly different lead SCO complexes (i.e., only 3% in common), likely due to a rigid shift of ΔEH–L values between the two DFAs (ESI Fig. S27†). That different design objectives result in divergent behavior with respect to DFA sensitivity of the predicted leads suggests that the conventional workflow of only considering a single DFA for chemical discovery may work for some design targets but not others.
Specific examples illustrate how the large DFA sensitivity of ΔEH–L values results in DFA-dependent chemistry for the SCO candidates. As expected,37,48,49,55 we find that GGAs (e.g., BLYP) have a low-spin bias and favor O-coordinating weak-field ligands, whereas hybrid functionals (e.g. B3LYP) favor N-coordinating intermediate-field ligands in SCO candidates (ESI Fig. S28†). When requiring a majority of 23 DFAs to agree, consensus lead SCO complexes are mostly Fe/Co complexes with weak/moderate-field ligands, matching expectations from experimentally characterized SCOs111 (Fig. 8). Specifically, the consensus lead SCO complexes exclude C-coordinating strong-field ligands or extremely weak field ligands such as small anions (e.g., S2−, F−, and I−). We observe few Cr or Mn SCO complexes, indicating no consensus designs for SCOs containing these metals. Importantly, both sets of discrete lead complexes (i.e., SCO and targeted Δ-SCF gap) identified by the ANNs follow the design rules revealed by RF-RFA/KRR, i.e., that ΔEH–L depends much more on metal-local features than the Δ-SCF gap (Fig. 4 and 8).
To demonstrate the distinct advantages of our consensus-based workflow for chemical discovery, we mined experimentally observed SCO complexes from the Cambridge Structural Database (CSD)112 following slight modifications to the procedure used in prior work95,113 and compared them to our ML lead complexes (ESI Text S1†). We observe significant overlap between the experimentally identified SCO complexes and those discovered by our consensus ML approach by visualizing both sets of compounds in the latent space of the B3LYP ANN (Fig. 9). For example, Co(II)(NCS−)4(NCO−)(C2H3N), a SCO complex predicted by our consensus ML approach, is close to an experimentally observed Co(II), N-coordinating SCO complex (CSD refcode: JUMPEO, Fig. 9). When qualitative differences between the experimentally observed SCO complexes and our consensus ML leads are seen, they likely result from differences between the composition of our design space and the experimentally studied SCO compounds.114 For example, a hexadentate Mn(II) complex with mixed N and O coordinating atoms is an experimentally observed SCO complex (CSD refcode: YANLAC), but we do not have any hexadentate TMCs in our design space and therefore do not predict any similar Mn(II) compounds to be SCOs.
In comparison to the consensus-based approach, when we apply our conventional workflow of using a single DFA (e.g., B3LYP) to discover lead SCO complexes, we find that the candidate SCOs occupy a much larger region of the B3LYP ANN latent space than the experimental SCO complexes (Fig. 9). This suggests that many of the lead compounds obtained from an ANN trained on the single B3LYP DFA are likely false positives (Fig. 9). This comparison demonstrates the power of using DFA consensus to constrain ML-identified lead complexes to reasonable chemical spaces during discovery. While the consensus-based approach naturally increases the risk of false negatives in comparison of the single DFA, false negatives are observed primarily for the experimental SCO cases where average model confidence is low (ESI Fig. S29†). Thus, the good performance of the consensus-based approach could be improved even further by incorporating more known experimental SCOs absent from the original training data. Notably, use of converged wavefunctions and structures from the parent DFA to derive properties with other DFAs as well as ANN model weights both improves consistency in the consensus-based approach and limits the computational overhead in comparison to the best available alternatives (e.g., correlated wavefunction theory or experiments).
Due to their widespread study, Fe metal centers with N-coordinating ligands dominate our set of experimentally studied SCO complexes111 (ESI Table S16†). The robustness of our consensus-based approach motivates us to make recommendations for other regions of chemical space to explore beyond these well-studied Fe/N SCOs. Because Co(II) and Co(III) complexes appear frequently in the consensus leads and lie near experimental Fe SCO complexes in the latent space of the B3LYP ANN model, our studies suggest their potential for experimental or theoretical validation in SCO complex design (Fig. 8, ESI Fig. S30†). The most likely candidate ligand chemistry suggested by the consensus screen is Co in combination with N-coordinating intermediate-field ligands such as isothiocyanate, cyanate, and pyridine or higher-denticity analogues.
Feature selection revealed design rules for each property that were invariant over the 23 DFAs tested, even those DFAs that showed poor linear and rank correlations with other predictions. In addition, the selected features were strikingly similar between data sets that contained different metal-coordinating chemistry and system sizes. The robustness of RF-RFA/KRR selected features suggests that universal design rules for new properties can be uncovered with low-cost DFAs and small basis sets on small, representative complexes. Such design rules can then guide more targeted exploration for refinement of properties at higher levels of theory.
To enable large-scale chemical discovery informed by the 23 DFAs, we developed a fine-tuning procedure to obtain a set of comparably performing ANNs trained each trained on data from a single DFA. Using these models to explore a large space of 187200 TMCs, we obtained design principles for SCO complexes and complexes with a targeted Δ-SCF gap. Lead targeted-gap complexes were robust to the choice of DFA, whereas lead SCO complexes were very sensitive to both the choice of DFA family and HF exchange fraction. This observation suggests that the conventional use of a single DFA in VHTS and ML workflows is appropriate for only specific types of properties.
By requiring consensus among more than half of the DFAs for a chosen property in a discovery workflow, we overcame the limitations of single DFA. While the single-DFA and consensus approaches both recapitulated RF-RFA/KRR design principles and produced consistent leads for DFA-insensitive properties, the consensus-based approach was critical to identifying lead SCO complexes. These consensus leads overlapped significantly in the ANN model latent space with experimentally observed SCO complexes from the CSD. In contrast, lead SCO complexes identified by a single DFA (e.g., B3LYP) occupied a much larger region of chemical space, indicating many single-DFA leads to be false positives. Thus, using DFA consensus with the approach described here to constrain ML-identified leads during chemical discovery is a promising method to improve prediction robustness without resorting to higher computational cost (i.e., correlated wavefunction theory) methods.
For all DFT geometry optimizations carried out in the original work, TeraChem,116 as automated by molSimplify117,118 and molSimplify automatic design (mAD),107 was employed. These calculations used the B3LYP70–72 hybrid functional with the LACVP* basis set, which corresponds to the LANL2DZ119 effective core potential for transition metals (i.e., Cr, Mn, Fe, Co) and heavier elements (i.e., I or Br) and the 6-31G* basis for all remaining elements. All non-singlet states were calculated with an unrestricted formalism and singlet states with a restricted formalism. In all these calculations, level shifting of 1.0 Ha on virtual majority-spin orbitals and 0.1 Ha on virtual minority-spin orbitals was employed.
We developed an approach to maximize correspondence between B3LYP and the 22 additionally studied DFAs that also reduced computational cost (ESI Table S1†). We carried out single-point energy evaluations on B3LYP/LACVP* structures with 23 total DFAs with a LACVP* basis and with a larger, triple-ζ def2-TZVP basis set. To automate these single-point energy calculations, we interfaced molSimplify with a developer version of Psi4 (ref. 15) (1.4a2.dev723). This step was necessary in part because meta-GGAs, double-hybrids, and Minnesota functionals and the larger basis set were unavailable in TeraChem. The 23 functionals included in our study are described in Section 2, and they were used with all default definitions applied in Psi4, as indicated in ESI Table S1.†
In our accelerated workflow, we used the previously converged33–35,107,115 B3LYP/LACVP* TeraChem wavefunction molecular orbital coefficients as an initial guess for a Psi4 B3LYP/LACVP* SCF calculation. The converged Psi4 B3LYP/LACVP* wavefunction was used as an initial guess to obtain self-consistent single-point energies in Psi4 for all other functionals with the LACVP* basis set using the recommended grid size120,121 with 99 radial points and 590 spherical points (ESI Fig. S31 and Table S17†). We also carried out single-point energy calculations with the larger def2-TZVP122 basis set for all combinations of functionals and complexes (ESI Fig. S31†). For these larger-basis calculations, we first carried out basis set projection to obtain the B3LYP/def2-TZVP converged wavefunction from the B3LYP/LACVP* result in Psi4. These calculations were run with a maximum of 50 SCF iterations to reach SCF convergence with both the energy and density convergence thresholds being 3.0 × 10−5 Ha (ESI Fig. S32 and Table S17†). Some pure GGA and meta-GGA calculations did not initially converge. In these cases, we performed calculations with a hybrid form of the pure GGA or meta-GGA, sequentially reducing the percentage of HF exchange and extrapolating to the 0% HF (i.e., pure GGA) total energy (ESI Fig. S33–S34 and Text S2†).
The three primary properties computed (i.e., ΔEH–L, vertical IP, and Δ-SCF gap) were all evaluated on B3LYP/LACVP* geometries obtained with TeraChem. For vertical IP and Δ-SCF gap evaluated on the N-electron reference system, we adopted a consistent spin state convention: we removed a majority-spin electron from the N-electron reference for the N − 1-electron calculation (i.e., for IP) and added a minority-spin electron for the N + 1-electron case (i.e., for EA), in each case starting the Psi4 calculation with an initial guess for the (N − 1/N + 1)-electron calculation from the converged B3LYP/LACVP* TeraChem result (ESI Fig. S31 and S35†).
The filtering procedure applied in previous work60 required that all B3LYP/LACVP* wavefunctions had limited spin contamination (i.e., 〈S2〉 deviations from the expected S(S + 1) value <1.0μB2). The number of calculations excluded by this filtering threshold is sensitive to the choice of functional (ESI Fig. S36†). In this work, we increased the cutoff for inclusion to 1.1μB2 (ESI Fig. S36†). Finally, complexes were retained in the data set only if properties could be converged below this cutoff from all 23 functionals (ESI Tables S18–S19†).
All ANN models were trained using Keras125 with Tensorflow126 as the backend and Hyperopt123 for hyperparameter selection (ESI Table S20†). To avoid randomness in the weight initialization and to increase the consistency between ANN models trained with DFT data derived from different functionals, we fine-tuned the B3LYP ANN model with a reduced (i.e., 1 × 10−5) learning rate for each of the 23 functionals, which produced 23 fine-tuned ANN (FT-ANN) models at each property and basis set combination (ESI Table S6†). All ANN models were trained with the Adam optimizer up to 2000 epochs, and dropout, batch normalization, and early stopping were applied to avoid over-fitting (ESI Table S20†).
Footnote |
† Electronic supplementary information (ESI) available: Summary of 23 DFAs; statistics of properties obtained by 23 DFAs; Pearson's r matrix and parity plots for ΔEH–L, vertical IP, and Δ-SCF gap; distributions of ΔEH–L, vertical IP, and Δ-SCF gap at different DFAs; UMAP 2D visualization of Δ-SCF gap data; percentile ranks at each DFA for example complexes; statistics and parity plots for basis set comparison; pie charts of RF-RFA KRR selected features at different DFAs; RF-RFA KRR selected features for vertical IP for difference datasets; hyperparameters for FT-ANN models; MAEs, R2, and scaled MAEs of all 23 functionals for three properties; uncertainty quantification metric and its cutoff; possible metal, oxidation, spin state, and ligand combinations in the 187200 complex space; size distribution of the 187200 complexes design space; histograms of Δ-SCF gap grouped by system size; Venn diagrams of lead Δ-SCF gap complexes and lead SCO complexes; network graph of lead Δ-SCF gap and lead SCO complexes; procedure of isolating of candidate SCO complexes; statistics of experimental SCO complexes; UMAP visualization of selected lead SCO complexes; computational workflow and DFT parameters used therein; Hartree–Fock linear extrapolation scheme; statistics of SCF iterations for convergence and failed calculations before and after HF extrapolation; electron configuration diagram of electron addition/removal convention; extended description of RAC featurization; range of hyperparameters sampled during Hyperopt for KRR and ANN models. See DOI: 10.1039/d1sc03701c |
This journal is © The Royal Society of Chemistry 2021 |