Insights into the deviation from piecewise linearity in transition metal complexes from supervised machine learning models

Virtual high-throughput screening (VHTS) and machine learning (ML) with density functional theory (DFT) suﬀer from inaccuracies from the underlying density functional approximation (DFA). Many of these inaccuracies can be traced to the lack of derivative discontinuity that leads to a curvature in the energy with electron addition or removal. Over a dataset of nearly one thousand transition metal complexes typical of VHTS applications, we computed and analyzed the average curvature ( i.e. , deviation from piecewise linearity) for 23 density functional approximations spanning multiple rungs of ‘‘Jacob’s ladder’’. While we observe the expected dependence of the curvatures on Hartree-Fock exchange, we note limited correlation of curvature values between diﬀerent rungs of ‘‘Jacob’s ladder’’. We train ML models ( i.e. , artificial neural networks or ANNs) to predict the curvature and the associated frontier orbital energies for each of these 23 functionals and then interpret diﬀerences in curvature among the diﬀerent DFAs through analysis of the ML models. Notably, we observe spin to play a much more important role in determining the curvature of range-separated and double hybrids in comparison to semi-local functionals, explaining why curvature values are weakly correlated between these and other families of functionals. Over a space of 187.2k hypothetical compounds, we use our ANNs to pinpoint DFAs for which representative transition metal complexes have near-zero curvature with low uncertainty, demonstrating an approach to accelerate screening of complexes with targeted optical gaps.


Introduction
Virtual high-throughput screening (VHTS) [1][2][3][4][5][6][7][8] with first-principles density functional theory (DFT) accelerated by machine learning (ML) [9][10][11][12][13][14][15] has advanced the discovery of new molecules and materials.Nevertheless, such workflows inherit the limitations of the density functional approximations (DFAs) used in practice.3][24][25][26] These SIEs result in fundamental errors in the description of dissociation energies, 24,[27][28][29][30] barrier heights, 31 band gaps, 32,33 electron affinities, [34][35][36] and thermodynamic properties. 37Because the exact energy functional should be piecewise linear with respect to fractional removal or addition of charge, 38 the global curvature, that is the second derivative of the energy E with respect to charge q, is considered a good measure for many-electron self-interaction error also known as delocalization error (DE).A number of efforts in tailoring functionals to eliminate this DE, [39][40][41][42][43] for example, by tuning, have been successful for improving DFT predictions of optical properties (e.g., the gap between the highest-occupied molecular orbital, HOMO, and lowest unoccupied molecular orbital, LUMO). 44 potential solution to overcome these limits is to climb up a ''Jacob's ladder'' 45 of DFAs, where functionals on higher rungs include greater complexity such as higher-order derivatives of the density, Hartree-Fock (HF) exchange, and long-range correlation from perturbation theory (i.e., MP2) to recover dispersive interactions from first principles.Doing so has been shown to increase accuracy for organic molecules, but simply climbing to higher rungs does not always guarantee improvements in challenging systems such as transition metal complexes.46,47 While transition metal complexes are key targets for chemical discovery due to their widespread applications in catalysis 4,[48][49][50][51][52][53][54] and energy utilization (e.g., in redox flow batteries, 55 solar cells, 56 and molecular switches 57 ), their electronic structure is uniquely challenging to describe.18 The variable nature of metal, oxidation state, and spin of transition metal complexes introduces combinatorial explosion in design spaces 11,58 that cannot be exhaustively explored by either first-principles methods or experiments, motivating ML acceleration.[59][60][61][62][63][64][65] Open-shell transition metal complexes are particularly difficult to study due to their near-degenerate d orbitals that may introduce significant multireference (MR) character.[66][67][68][69][70] Furthermore, their properties are highly sensitive to the choice of DFA and the resulting biases will be passed down to and encoded in ML models trained on this data. Fo example, ML-accelerated discovery based on semi-local DFT will identify lead TMCs targeted for specific spin state properties (e.g., spin-crossover or SCO) with weaker-field ligands than those found by hybrid-DFTderived ML models.63 In general, choosing the ''right'' rung a priori in ML model training or discovery efforts may be impractical if reference benchmarks are not established, and, instead, a single low-cost, heavily tested DFA (e.g., PBE or B3LYP) is usually employed.Nevertheless, when careful studies of smaller data sets have been carried out, they have revealed DFA dependence (e.g., on the fraction of HF exchange) of property evaluations for both organic molecules [71][72][73] and transition metal complexes.[74][75][76][77][78] An alternative paradigm that has recently emerged is to train ML models that can predict appropriate parameters or methods for first-principles simulation, including the selection of active spaces, 79,80 detection of multi-reference character, [81][82][83] or identification of the best DFT functional to reproduce higherlevel reference calculations.84,85 ML models have been developed as improved xc functionals, [86][87][88][89][90] for example, by training on the correct fractional charge energetics needed to recover piecewise linearity.In a complementary approach, Corminboeuf and coworkers trained ML models to predict the degree of curvature or DE of small organic molecules to suggest parameters for optimal tuning of functional parameters.91 Marom and coworkers recently developed ML models to predict the appropriate Hubbard U for use in DFT+U.92 We recently curated 93 a set of around 1000 transition metal complexes to train artificial neural networks (ANNs) on 23 DFAs to predict properties across multiple rungs of ''Jacob's ladder''.45 We showed that requiring consensus among the DFAs improved predictions when compared to experiment.93 Extending upon this work, we now leverage the same dataset to reveal trends in curvature with DFA.In this work, we compute the curvatures of the complexes in the dataset for each of the 23 DFAs.We identify different trends in curvature values (i.e., as judged through correlations in the data) for functionals from distinct rungs.
We train machine learning models (e.g., ANNs) on this data set and analyze the most important features of each model to better understand what aspects of chemical composition most strongly influence the curvature of a transition metal complex for a given DFA.We then use these ANN models to screen a space of 187k hypothetical complexes to pinpoint DFAs that would yield piecewise linearity in representative complexes.We validate these leads, demonstrating ANNs as an efficient approach to rapidly identify DFAs and compounds for which approximate DFT will recover exact conditions.

2a. Data set and details of DFT calculations
We employed a data set curated in prior work, 93 which consists of the equilibrium geometries of 948 isolated mononuclear transition metal complexes obtained with density functional theory (DFT).As in prior studies, [59][60][61][93][94][95] we studied M(II)/ M(III) complexes (M = Cr, Mn Fe, or Co) with high-spin (HS) and low-spin (LS) multiplicities defined as follows: quintet-singlet for d 4 Mn(III)/Cr(II) and d 6 Co(III)/Fe(II), sextet-doublet for d 5 Fe(III)/Mn(II), and quartet-doublet d 3 Cr(III) and d 7 Co(II) (Table S1, ESI †). Thesemetal centers were studied in octahedral complexes with ligands containing a variety of metal-coordinating atoms and a range of charge states (Table S2, ESI †).
While the structures were obtained from prior work, 93 we briefly describe the protocol to generate them here.All geometry optimizations were carried out using TeraChem 96 v1.9, as automated by molSimplify. 94,97These calculations used the unrestricted B3LYP 98,99 hybrid functional for non-singlet states and restricted B3LYP for singlet states, with the LACVP* basis set, which corresponds to the LANL2DZ 100 effective core potential for the transition metals and heavier elements (I or Br) and the 6-31G* basis for all other elements.In all calculations, level shifting of 1.0 Ha on virtual majority-spin orbitals and 0.1 Ha on virtual minority-spin orbitals was employed.Geometry optimizations used the L-BFGS algorithm in translation rotation internal coordinates (TRIC) 101 to the default tolerances of 4.5 Â 10 À4 Hartree bohr À1 for the maximum gradient and 1 Â 10 À6 Hartree for the change in self-consistent field (SCF) energy between steps.
Average curvature calculations were calculated according to ref. 102: where e N HOMO is the HOMO energy of the N-electron system and e NÀ1 LUMO is the LUMO energy of the NÀ1-electron system.All single-point total energies and orbital energies were obtained using the optimized geometry of the N-electron system.All calculations started from single-point calculations in Tera-Chem using B3LYP/LACVP*, from which we obtained the wavefunction molecular orbital coefficients and used them as an initial guess for the B3LYP single-point calculations in Psi4.Starting from this converged B3LYP result in Psi4, all subsequent calculations for the 22   S3, ESI †).Calculations were carried out with a maximum of 50 SCF iterations and a convergence threshold of 3 Â 10 À5 Ha both in energy and density convergence.This strict limit on SCF cycles was motivated by the fact that all calculations are initiated from a converged electron density from the B3LYP functional, and our goal was to ensure that the same electronic state was converged across all functionals.This could typically be achieved within the 50 step limit, as discussed in prior work. 93he ML model training set for each functional only includes complexes where the HOMO and LUMO errors have the same sign.The HOMO error is defined as DE À e N HOMO , and the LUMO error is defined as e NÀ1 LUMO À DE, where DE = E N À E NÀ1 , and E N is defined as the total energy of a system with N electrons. 104This screening ensures that the average curvature defined in eqn (1) is well defined for all the complexes included in the ML model training process.All structures and data sets used to train machine learning models in this work are available on Zenodo at the DOI: https://doi.org/10.5281/zenodo.7497666.

2b. ML model training
In this work, we followed the same protocol described in previous work 93 for training ML models.We use revised autocorrelations 61 (RACs) as descriptors for all our machine learning models.RACs are sums of products and differences of five atomwise heuristic properties (i.e., topology, identity, electronegativity, covalent radius, and nuclear charge) on the 2D molecular graph.As motivated previously, 61 we applied a maximum bond depth of three and eliminated RACs that were invariant over the mononuclear octahedral transition metal complexes, leaving 151 RACs in total.With the inclusion of oxidation state, spin multiplicity, and total ligand charge, we obtained a feature set of 154 descriptors in total.For both kernel ridge regression (KRR) and artificial neural network (ANN) models, the hyperparameters were selected using Hyperopt 105 with 200 evaluations on a range of hyperparameters, using a random 80%/20% train/test split and 20% of the training data as the validation set.As in prior work, 83,93,94 recursive feature addition (RFA) was carried out on randomforest-ranked features (i.e., RF-RFA) to select the feature set that gives the best-performing KRR model on the basis of mean absolute error.All KRR models were implemented in scikitlearn 106 with a radial basis function kernel.All ANN models were trained with the Adam optimizer 107 up to 2000 epochs, and dropout, 108 batch normalization, 109 and early stopping 110 were applied to avoid over-fitting.
We use the latent space distance as an uncertainty metric to control design space prediction errors, as in prior work 111,112 (Fig. S1, ESI †).The distance is calculated within Keras 113 as the Euclidean distance in latent space between the test point and each training point.The latent distances between each test complex and the 10 nearest training complexes were averaged to determine the uncertainty score for that test complex.For the design space predictions, we chose a conservative latent distance cutoff of 0.2 for all functionals based on the distribution of the mean latent distances of the different ANNs, where each ANN corresponds to a different functional (Fig. S2, ESI †).All machine learning models and data sets used to train machine learning models in this work are available on Zenodo at the DOI: https://doi.org/10.5281/zenodo.7497666.
For feature analysis of the ANN models, we used the SHAP (SHapley Additive exPlanations) 114 Python code.We used 20% of the training set as the background data to initialize the KernelExplainer method.We then obtained SHAP values over the test set.For each model, we averaged over the absolute SHAP values of the test set for the different features to determine a SHAP value for each feature.Each feature was then averaged over the different functionals in a functional group, and we selected the 15 features with the highest mean values as important.

3a. Data set analysis
We first analyze the calculated curvature values in our dataset.While the curvature values are symmetric around a mean for some functionals, the distributions are less apparently normal than in prior work on organic molecules. 91Nevertheless, expected trends hold, with the highest values corresponding to curvatures ranging from 1 to 8 eV for GGA functionals (i.e., PBE, BP86 and BLYP).Increasing the HF exchange fraction results in a decrease of the mean of the distribution (Table S4, ESI †).This can be observed when comparing functionals that differ mostly by the fraction of HF exchange included in the functional (Fig. 1).There is a clear linear decrease in the mean curvature in the PBE family of functionals, that is, PBE, PBE0 and PBE-DH, as the HF fraction increases from 0.0 to 0.5, although the standard deviation and range of the distribution does not shift monotonically with HF fraction (Fig. 1).While the average of our PBE0 data is comparable to what has been reported for a set of small organic molecules, 91 the range and spread of the data is larger here owing to the more diverse behavior of transition metals and their capacity to support multiple spin states.
The same behavior with HF exchange fraction can be seen for the M06-based family, M06-L, M06 and M06-2X, which exhibit similar changes in curvature values to those seen in the PBE family as a function of HF exchange fraction (Fig. 1).The rates of decrease, measured as the curvature as a function of the HF exchange fraction (HFX), are comparable between the two families, with slopes of À6.6 eV per HFX and À5.7 eV per HFX for PBE and M06 families respectively (Fig. 1).Extrapolating these values suggests mean curvatures would be close to zero at around 66% HF exchange, but, as evident from the distributions, selecting this HF exchange fraction would also lead to a long tail of overcorrected (i.e., negative curvature) compounds (Fig. 1).This slope is not universal for all functional families; the mean curvature of dispersion-corrected functionals, DSD-BLYP-D3BJ, DSD-PBEB95-D3BJ and DSD-PBEP86-D3BJ, decreases linearly from À0.5 to À1.5 eV as the HF fraction in the xc functional increases, with a slope five times higher (Fig. S3, ESI †).The reason for this stronger variation with modest changes in HF exchange is likely due to the fact that the double hybrids vary much more significantly in other components of the functional, i.e., the form of the pure DFT exchange and correlation functional components.
For the two double hybrids that both employ PBE GGA exchange (i.e., DSD-PBEB95-D3BJ with 69% exchange and DSD-PBEP86-D3BJ with 71% exchange), we can isolate the effects to both HF exchange variation and the change in the correlation from B95 to P86.While we would expect the average curvature to decrease from DSD-PBEB95-D3BJ to DSD-PBEP86-D3BJ by only around 0.1-0.15eV based on the trends in HF exchange tuning for the PBE and M06 families, the difference in the means is larger (À1.20 eV versus À1.55 eV, Table S4, ESI †).Nevertheless, it is also evident that shifts in the tails of the distribution have a large influence over the computed mean (i.e., with more positive curvatures computed with DSD-PBEB95-D3BJ), likely also exaggerating the shift in the means.As for the difference of DSD-BLYP-D3BJ compared to the other two functionals, the significant differences from the pure DFT exchange correlation functional lead to a mean curvature value of À0.55 eV that is nevertheless around what we would expect from analysis of the PBE and M06 families (i.e., near-zero curvature at around 66% exchange).Thus, the double hybrids are not entirely inconsistent with trends observed for simpler functionals.
We next computed the correlation (i.e., Pearson's r) between the curvature values of the different functionals.The functionals from different rungs of ''Jacob's ladder'' 45 have low correlation, especially in comparison to the correlations of the individual HOMO and LUMO energies among the different functionals (Fig. 2 and Fig. S4, S5, ESI †).The lowest Pearson's r values in the HOMO and LUMO cases are 0.95, whereas the lowest values among the curvature correlations are close to zero.This lower correlation in curvature values is likely because large ranges of HOMO and LUMO values cancel to form smaller overall ranges of curvature values over which correlations will be weaker, while individual HOMO or LUMO values have significant dependence on total charge or size of a transition metal complex, 93,94 leading to higher Pearson's r values.Interestingly, of the double hybrids, PBE0-DH shows the highest correlations of the HOMO and LUMO with GGA and meta-GGA families, while it shows the lowest correlations of the curvature.In contrast, M06-2X, which has the lowest correlations for the HOMO and LUMO with GGA and meta-GGA family members of any meta-GGA hybrid, also has the lowest correlations for the curvature values (Fig. S4 and S5, ESI †).
For the correlation of curvatures between different functionals within the same ''rung'' of Jacob's ladder, the results agree with previous work 93 where it was shown that the GGA and meta-GGA groups were highly correlated among themselves when considering the adiabatic high-spin to low-spin splitting, a property that  was also sensitive to the choice of DFT functional.In the present work, the lowest correlation values observed within the GGA and meta-GGA families for curvature are 0.99 and 0.97, respectively.Despite the strong correlation among pure meta-GGAs, we observe that the meta-GGA hybrids are the least correlated family, likely due to the introduction of a range of HF exchange fractions.The lowest correlation in curvature is observed between M06-2X (54% exchange) and TPSSh (10% exchange).The low correlation of M06-2X with other functionals in the same family is also consistent with our previous observations on frontier orbital energies and spin splittings. 93hile curvature might be expected 91,115 to depend linearly on the degree of HF exchange, our own results indicate that over a wider range of HF exchange fractions, non-linear effects on curvature arise due to a change in the character of the frontier orbitals.In general, the correlation between the highly parameterized M06-2X and the other meta-GGA hybrid functionals is always the lowest, with the highest correlation of 0.74 observed with SCAN0 (25% exchange), signifying a large variability between M06-2X and the rest of the meta-GGA family.Thus, machine learning models trained on each of these individual data sets can be expected to learn distinct curvature-structure mappings.

3b. Training ML models to predict curvature
We trained both kernel ridge regression (KRR) and artificial neural network (ANN) ML models to predict electronic structure properties of transition metal complexes (see Methods).We employ feature selection with RF-RFA when training the KRR models, whereas we apply no feature engineering to the ANN models (see Methods).We trained ML models on the transition metal complex dataset to predict the average curvature as well as the HOMO energy of the N-electron system and the LUMO energy of the NÀ1-electron system (i.e., the two quantities that determine the average curvature).We evaluate accuracy from the value of the scaled mean absolute error (MAE) of these models, which is defined as the absolute difference between the predictions and the exact results divided by the range spanned by the training set values.The HOMO and LUMO energies are predicted equally well by KRR and ANN models, with scaled MAEs of 0.013 and 0.010 for KRR and ANN models respectively for HOMO energy predictions and scaled MAEs of 0.013 and 0.012 for LUMO energy predictions averaged over all functionals (Fig. S6, ESI †).These errors are consistent with results obtained previously on similar datasets of transition metal complexes. 83In contrast to the HOMO or LUMO models, we observe that the average curvature is consistently better predicted by the ANN models than the KRR models for all functionals (Fig. 3).However, the scaled errors are consistently higher than those for the orbital energies, with a mean scaled MAE of 0.044 for ANN predictions and 0.062 for KRR models across all functionals.This increase in scaled MAE is expected because the range of curvature values is much smaller than the range of individual HOMO or LUMO values.Considering differences in data set size and diversity, errors previously observed for ML models trained to predict QM7 curvature values are largely comparable as well. 91he most significant difference of the ANN models relative to the KRR models is seen for double-hybrid functionals, where the ANN performs around 30-45% better (i.e., lower scaled MAE) than the KRR model (44% in B2GP-PLYP).While this discrepancy is large, this observation is consistent with previous observations for the closely related task of HOMO-LUMO gap prediction, where an ANN outperformed the KRR model by 16-24% depending on the dataset. 83In addition, ANN models were shown to perform better when predicting the adiabatic spinsplitting energy, where the KRR MAE was shown to be higher than that of ANN for most functionals. 93Both spin splitting and HOMO-LUMO gap are similar to the curvature in that they are defined as energy differences, leading to smaller ranges of values than the individual frontier HOMO or LUMO energies.
It is worthwhile to note that the training set of each functional is unique and the size of each data set could affect the results.After pruning the original dataset to eliminate ambiguous curvature values (see Methods), the remaining datasets are not the same size for all functionals (Fig. S7, ESI †).The fewest ambiguous curvature values are present for pure (i.e., GGA or meta-GGA) functionals because they tend to have strongly positive HOMO and LUMO errors (i.e., positive curvatures) so these data sets are larger than the sets for functionals that incorporate HF exchange.In practice, this means that hybrid GGA and hybrid meta-GGA functionals each have data sets with approximately 550 items on average, and double-hybrid functionals are, in most cases, trained on the smallest datasets, with approximately 280 points on average in each dataset.While some points in the work from which we curate the present dataset did not converge, 93 our datasets are primarily reduced due to the presence of opposing HOMO and LUMO sign errors that make estimating the average curvature from eqn (1) not suitable.Somewhat surprisingly, we observe that ANNs outperform KRR models most for the double hybrids with the smallest data sets, and we observe relatively comparable performance of KRR and ANN models for the larger GGA dataset sizes where both perform well.Nevertheless, holding the model fixed and comparing functionals, we do observe that models trained on the larger GGA datasets have consistently lower MAEs than those trained on double hybrids.
To attempt to assess the effect of the size of the training set on the results, we trained models for each functional using both KRR and ANN on a dataset that contains compounds with valid curvature values for all functionals.This dataset is necessarily very small, with only 64 complexes, in comparison to the individual DFA datasets because different functionals lead to distinct HOMO and LUMO errors (Fig. S7, ESI †).The 64complex dataset is particularly small because it is the subset of complexes for which a meaningfully valued curvature is obtained for all 23 functionals.The MAEs are much higher than those of the original DFA-specific datasets as a result (Fig. S8, ESI †).For the models trained on substantially less data, the ANN models predicting the curvature for doublehybrid functionals are no longer clearly superior to the KRR models.Additionally, the BP86 ANN model has a scaled MAE more than twice as large as that of KRR when trained on the small dataset whereas it had a 40% lower error over the original dataset (Fig. 3).Taken together with our earlier observations on more moderate dataset size ranges of around 300-600 points, it is worth noting that once the dataset size is not much larger than the number of metals, oxidation, and spin states present in the set, the learning task is much more difficult.However, as long as there are multiple examples of each metal with a range of ligands, ANNs are likely to remain preferable to KRR models, even when the dataset size is modest (ca.300 points).We thus conclude that while the variability (in terms of size and composition) of the different sets may be significant when comparing the different models, working with the largest available datasets is likely the optimal way to achieve good model accuracy So, in the rest of this work we will refer to models trained on the original larger datasets.
To evaluate the internal consistency of the KRR and ANN models, we subtract the individual ML model predictions of the HOMO energy of the N-electron systems and LUMO energy of the NÀ1-electron systems and compare this value to the direct curvature predictions from ML models.Despite the fact that individual frontier orbital energy predictions from the KRR and ANN models were of comparable accuracy, indirect calculations of the curvature from predicted HOMO and LUMO values by the ANN are much more consistent with the direct predictions (i.e., R 2 close to 1) compared to the KRR (Fig. 4).The differences between the direct and indirect approaches are lowest in GGA functionals and are most apparent in hybrid-GGA functionals.
A closer look at the distributions of these predictions shows that a specific complex of Co with two 4,4 0 -dimethyl-2,2 0bipyridine ligands and two benzyl iscocyanide ligands is consistently an outlier in the KRR predictions and is not well predicted by these models (Fig. 4).For multiple ML models trained on different functionals, the predicted HOMO-LUMO difference of this compound is significantly lower than the curvature.Even if we disregard these outliers, our observations still encourage the use of the ANN over the KRR model for performance.So for the rest of this work, we will therefore focus on the ANN models only.

3c. Feature importance analysis
By analyzing the most important features in ML model predictions, we can achieve interpretability of otherwise black-box structure-property mappings.We employ SHAP analysis (see Methods) to interpret our ANN models.SHAP uses game theory to calculate the impact of a feature on the resulting target value. 114The average value of the impact of a feature can be used as a measure of its importance.SHAP is model-agnostic (i.e., and thus can be used with the ANNs trained here) and performs a perturbation around the points of the training data to calculate the impact on the model.We selected SHAP because we can directly interpret the ANN models, which perform better than the KRR models on our learning task, as opposed to recursive feature addition into KRR models that we had carried out in prior work, 93 while still providing insights into which RAC features are the most important for model prediction.From SHAP analysis, we observe that the metal center is most important for predicting curvature, followed by more distant features (i.e., three bond paths away) (Fig. 5 and Fig. S9, ESI †).
It is worth noting that we attribute the oxidation state of the complex to the metal center because our ligands are redox innocent.This oxidation state is a universally large and expected contributor to the prediction of the curvature because it can be expected to alter the global ionization potential and electron affinity as well as the extent to which a particular DFT functional deviates from ideality.If this feature were instead attributed as a global feature, then the contributions threebond paths away would dominate for most functionals.Nevertheless, the nature of the most important metal-centered features in curvature models varies considerably between different families of functionals (Fig. 5 and Fig. S9, ESI †).For functionals with higher HF exchange fractions (e.g., double hybrids), spin and oxidation state become dominant metalcentered features, whereas for semi-local functionals oxidation state has an overriding effect in comparison to spin (Fig. 5).
To understand the origin of this difference, we obtained the difference in curvature between high-and low-spin states for four representative complexes, each with a different metal center: Cr(III)(pyridine) 5 (methylisocyanide), Mn(II)(furan) 4 (H 2 O)-(CO), Fe(III)(furan) 5 (methylisocyanide), and Co(II)(methylisocyanide) 4 (H 2 O)(pyridine).These selected complexes have a mix of strong-field (i.e., CO or methylisocyanide) and weak-field (i.e., H 2 O, furan) ligands.We focus on differences in results among GGA, rangeseparated-hybrid and double-hybrid families because they have the greatest differences in the relative importance of spin state for curvature prediction according to SHAP analysis (Fig. 6 and Fig. S10, ESI †).In these examples, the difference in curvature between spin states with GGA functionals are close to zero and no larger than around 0.3 eV for any of the GGAs considered (i.e., PBE, BLYP, or BP86, Fig. 6 and Fig. S10, ESI †).In comparison, both double hybrids and range-separated hybrids have curvatures that differ in magnitude by around 1 eV or more between spin states (Fig. 6 and Fig. S10, ESI †).
This dependence on spin present in the RS-and doublehybrid ML models could be attributed to differences in how the HF term in the exchange-correlation functional reduces the Coulomb term specifically in the cases of opposite spin projection.However, it is surprising then that global hybrids lack this behavior.After the metal features, atoms three bonds away are the most significant for curvature predictions (Fig. 5).This suggests significant size-dependence in the curvature.This trend holds for all functionals and is strongest for GGAs.This agrees with our expectation that charge addition in larger complexes delocalizes over more of the complex, resulting in lower apparent curvature.Despite some differences in feature importance in models trained on curvatures from differing functionals, the most important features in the individual HOMO and LUMO are more consistent, in agreement with RF-RFA KRR feature importance observations from prior work 93 (Fig. 7 and Fig. S11, S12,  ESI †).Nevertheless, limited differences are apparent.For both HOMO and LUMO SHAP analysis, the GGAs show more overall dependence on the chemical structure, whereas double hybrids tend to more strongly emphasize distant (i.e., three bond paths or more) features.Both HOMO and LUMO models depend This journal is © the Owner Societies 2023 significantly on more distant (i.e., three bonds or more) atoms, indicating the size dependence of the HOMO and LUMO energies.This observation is in agreement with previous results 83 that showed strong global dependence and a lower metal-local dependence of the HOMO energy in RF-RFA KRR feature selection.Both the HOMO and LUMO features, however, show overriding dependence on the oxidation state in terms of metal-centered features.The SHAP analysis does not contradict observations from RF-RFA KRR, but since SHAP analysis not only selects the important features but also assigns them a weight with a SHAP value, it emphasizes that while oxidation state is the only metal-related feature selected, it has high importance.
Unlike the case of curvature, in the HOMO of the N-electron system models, spin is not one of the important features in any of the functional families (Fig. 7).The dependence of curvature on the spin state instead appears to stem from the LUMO of the NÀ1-electron system, which shows spin dependence only in the RS-hybrid and double-hybrid functionals.Both the HOMO and LUMO model predictions also depend on ligand charge, which was not one of the significant features according to SHAP analysis of the curvature models, emphasizing stronger dependence on oxidation state of the individual HOMO and LUMO values than for their difference (i.e., in curvature).Overall, despite curvature being derived from HOMO and LUMO properties inherently, it shows distinct and functional-sensitive property importance, as judged by SHAP analysis.This suggests that errors can be attributed to different sources for different functional families (e.g., metal spin in RS-and doublehybrids).

3d. Large-scale exploration of curvature in transition metal chemical space
To obtain a broader understanding of the relationship between the DFT curvature and chemical properties, we apply the trained HOMO, LUMO and curvature models to a space of 187 200 hypothetical transition metal complexes.As described in previous work 83 the space of theoretical complexes contains HS and LS M(II/III) (M = Cr, Mn, Fe, or Co) centers in complex with 36 unique ligands derived from the original data set. 83hus, this set is interpolative in nature but contains many complexes not used in the training data of the model.These new complexes have properties (e.g., size and charge) distinct from those of the data set we used to train our ANN models, specifically because different ligand combinations in this larger set were absent from the training data.To ensure suitability of ANN model predictions on this data set, we apply the latent distance criteria 111 as an uncertainty quantification (UQ) metric (see Methods) and retain only complexes that are below the latent distance threshold for all functionals.This UQ cutoff leaves us with 155 295 complexes, but we lack ground-truth performance to confirm that this UQ step improves predictions over the large set.Instead, we compared the ANN prediction of the curvature from the indirect method (i.e., the difference between the HOMO of the N-electron system and the LUMO of the NÀ1-electron system) to the directly predicted curvature and assess agreement (i.e., the R 2 ) between the two predictions.Although the R 2 value is already good before filtering, it improves for most functionals after removing high-uncertainty data (Fig. S13, ESI †).This observation provides support for the applicability of our models to the UQ-curated design space.
Next, we analyze the molecules that have the lowest absolute curvature predictions for the ANNs trained on different functionals.We focus on complexes that are in the lowest 20% (31 059 complexes) of the distribution of absolute curvatures.We make observations grouped over the GGA, meta-GGA, RShybrid and double-hybrid families.The lowest 20% of the GGA and meta-GGA curvatures ranges between 1-3 eV, and the most common ligands in this subset are 4-tert-butylphenyl isocyanide, thiopyridine, and cyanopyridine, which are relatively large ligands (Fig. S14, ESI †).For the RS-hybrid and double-hybrid families on the other hand, the lowest absolute curvatures range between À0.3 and 0.3 eV.The most dominant ligands in this low-curvature region in the double-hybrid family are again 4-tert-butylphenyl isocyanide and thiopyridine as in the GGA families but in addition, some smaller ligands (such as NCO À and SH 2 ) also appear frequently (Fig. S14, ESI †).For the RS-hybrids, the most common ligands in low-curvature complexes are smaller weak-field ligands such as SH 2 , OH À and F À (Fig. S14, ESI †).Thus, large ligands do not necessarily lead to zero curvature in RS-hybrids, and weak-field ligands lead to zero curvatures only for some families of functionals.For the RS-hybrids, large ligands generally lead to overcorrection and negative curvatures.From a metal-center perspective, no unifying picture of curvature is present across all families of functionals.While Co is the primary metal observed in lowcurvature complexes for GGAs and meta-GGAs and Mn is the least represented, the reverse is true for RS-hybrids (Fig. S15, ESI †).Hybrid and non-hybrid functionals also differ in the spin state that dominates the low-curvature regime, where hybrid functionals show a distinct preference for low curvatures for complexes in HS states, whereas non-hybrids favor HS and LS states equally (Fig. S16, ESI †).This observation is consistent with our earlier model feature analysis.In terms of oxidation state, only RS-hybrids have equal numbers of +2 and +3 oxidation state compounds with low curvature, whereas other families prefer +3 oxidation states (Fig. S17, ESI †).This observation is somewhat surprising because the ionization potential increases with increasing oxidation state and curvature values could be expected to increase as a result.
One of the primary use cases for ANNs trained to predict curvature is to enable identification of one or more appropriate functionals to use for property prediction while recovering piecewise linearity.We thus verify design space predictions on representative complexes with each of the four metals studied in this work (i.e., HS Fe, Mn, and Co as well as an LS Cr complex).None of the selected complexes were present in the original training data but due to the way the design space was constructed (i.e., from new combinations of the ligands and metals that are present in the training data), they are relatively similar to training data and thus model performance should be comparable to test set performance.For each of these complexes, the ANN models predict a value close to zero curvature for at least four functionals for each of these complexes.Typically, the functionals selected with curvatures close to zero (i.e., within the mean test set error of 0.4 eV) are double hybrids or range-separated hybrids.We then carried out DFT geometry optimizations and single-point energy calculations (see Methods) to verify these ANN predictions.The differences between the calculated and predicted curvature are typically less than 0.5 eV (Fig. 8).For example, the ANN-predicted curvatures for HS Mn(III)(methylamine) 6 and Cr(II)(formaldehyde) 4 (methylamine)(furan) are within 0.2 eV of the calculated values for the LRC-oPBEh range-separated hybrid and the double hybrids (i.e., B2GP-PLYP and DSD-PBEB95-D3BJ for the Mn complex, DSD-BLYP-D3BJ and DSD-PBEP86-D3BJ for the Cr complex).Neither of these complexes were present in the global (i.e., all DFA) training or test data, with the closest examples to HS Mn(III)(methylamine) 6 being HS Mn(III)(porphyrin)(methylamine) 2 and the closest example to LS Cr(II)(formaldehyde) 4 (methylamine) (furan) being HS Cr(II)(formaldehyde) 4 (ammonia) 2 . 116An exception to this good performance is Fe(III)(furan) 4 (ammonia) 2 where the ANN trained on LRC-oPBEh predicted a curvature B1 eV smaller than the calculated value, although the performance of other functionals (e.g., B2GP-PLYP) is closer to the test error (Fig. S18, ESI †).The closest complexes to HS Fe(III)(furan) 4 (ammonia) 2 in the train or test data were other complexes of furan in combination with stronger-field carbonyl or weaker-field water, and the only complexes in the LRC-oPBEh training or test data were a homoleptic furan complex and a heteroleptic furan-containing complex with a single methylisocyanide ligand. 116Even in cases where errors exceeded the test set MAE, we were still able to identify one or more functionals with curvature close to zero, suggesting the possible application of this approach in the chemical discovery of transition metal complexes.

Conclusions
Using a dataset of nearly one thousand transition metal complexes, we computed and analyzed the average curvature (i.e., deviation from piecewise linearity) for 23 density functional approximations spanning multiple rungs of ''Jacob's ladder''.Over this set, we observed a wide range of curvature values that nevertheless followed expected qualitative trends, such as the high positive curvatures in semi-local GGA and meta-GGA functionals in the range of 1-8 eV that was lowered by around 5-6 eV per unit of HF exchange within a family of functionals.While we observed correlations for most functionals within the same ''rung'', we also observed reduced correlation among meta-GGA hybrids, especially for M06-2X and TPSSh, likely as a consequence of both the correlation and exchange terms differing substantially.We then trained artificial neural networks and kernel ridge regression machine learning models to predict both curvature and the associated HOMO and LUMO energies for each of these 23 functionals.We observed superior performance of the ANNs, especially when it came to predicting properties for specific outlier compounds.Using SHAP analysis, we interpreted the ANN model prediction and discovered a specific spin state dependence for range-separated hybrid and double-hybrid functionals that was absent from other functionals, potentially explaining why curvature values are weakly correlated between these and other families of functionals.
Beyond the strong influence of metal oxidation state on curvature, metal-distant features three bonds or more from the metal played the second-most important role.We rationalized the differences in feature importance observed with different functionals here from prior work 93 that used RF-RFA KRR to determine that selected functionals were feature-invariant for the HOMO.We showed that the HOMO level is indeed consistent across functionals, with strong contributions from atoms three bond paths away from the metal or more along with the metal oxidation state.We instead observed that the strong spin-state dependence of curvature for double hybrids arises from differences in the LUMO, where metal spin contributes for double hybrids but not for semi-local functionals.Nevertheless, the majority of features are more consistent across functional families than across properties, still supporting conclusions from prior work 93 that feature analysis is relatively insensitive to functional-dependent errors.
We explored a space of over one hundred thousand hypothetical complexes to uncover chemical trends in the composition of complexes predicted to have zero curvature.We observed that the ligands commonly present in low-curvature complexes depended on the family of the DFA, with large ligands having the lowest curvature values for semi-local GGA functionals but not necessarily for rangeseparated hybrids.Finally, we demonstrated the potential benefit these curvature-predicting ANNs by identifying and validating compounds that were predicted to have near-zero curvature for at least four DFAs.Thus, this approach provides a low-cost means of screening compound spaces and selecting a DFA that will have low curvature for the relevant transition metal complex.We envision this approach being particularly useful in screening compounds for targeted optical gaps where recovering piecewise linearity will be especially beneficial in reducing the computational cost.

Fig. 2
Fig. 2 An upper triangular matrix colored by Pearson's r for pairs of 23 functionals for the average curvature computed over a set of mononuclear octahedral transition metal complexes with Cr, Mn, Fe, or Co centers.The correlations are grouped by functional family from top to bottom or left to right: GGA, meta-GGA, GGA-hybrid, range-separated (RS) hybrid, meta-GGA hybrid, and double hybrid.

Fig. 1
Fig. 1 The curvature distribution (blue violin plots) and its mean value (blue markers) for PBE-(left panel) and M06-based (right panel) functionals as a function of the Hartree-Fock (HF) exchange fraction in their exchange correlation functionals.(left) Markers from left to right (low to higher HF values) correspond to PBE, PBE0 and PBE-DH.(right) Markers from left to right correspond to M06-L, M06 and M06-2X functionals.The dashed lines correspond to a linear fit to the mean values, where the slopes are À6.6 eV per HFX and À5.7 eV per HFX for PBE and M06 based functionals correspondingly.The R 2 of the linear fits is 0.999 in both cases.The vertical bar indicates the full range of the distribution for each functional.

Fig. 4 (
Fig. 4 (left panel) Bar plot of the R 2 value of the direct curvature prediction vs. the indirect prediction obtained by subtracting the prediction of the LUMO of the NÀ1-electron system from the prediction of the HOMO of the N-electron system for 23 different functionals using KRR (red) and ANN (blue) models.(right panel) Examples of the curvature vs. frontier orbital energy difference as predicted using KRR (red markers) and ANN (blue markers) models, for representative GGA-hybrid functionals (B3P86 top figure and PBE0 bottom figure).The structures of the outliers in the KRR distributions are shown explicitly.

Fig. 5
Fig. 5 Stacked bar plot of the fractional weight of 15 features with the highest SHAP values in a curvature prediction model, as a function of the most metal-distal atoms for GGA family (top panel) and the double-hybrid family (bottom panel).Features are grouped by their type: spin (light blue), oxidation state (gray), electronegativity (w, blue), nuclear charge (Z, green), topology or identity (T/I, red), covalent radius (S, orange) and ligand charge (L, purple).Error bars reflect the standard deviation across the set of DFAs within each functional family.

Fig. 6
Fig. 6 The magnitude of the difference between the curvature in the HS state and the curvature in LS state of Cr(III) (pyridine) 5 (methylisocyanide) (top panel) and Co(II)(methylisocyanide) 4 (H 2 O)(pyridine) (bottom panel) for a series of GGA, RS-hybrid and double-hybrid functionals.

Fig. 7
Fig. 7 Stacked bar plot of the fractional weights of 15 features with the highest SHAP values in a prediction model of the HOMO energy of the N-electron system (left) and LUMO energy of the NÀ1-electron system (right), as a function of the most metal-distal atoms for the GGA family (top panel) and the double-hybrid family (bottom panel).Features are grouped by their type: spin (light blue), oxidation state (gray), electronegativity (w, blue), nuclear charge (Z, green), topology or identity (T/I, red), covalent radius (S, orange) and ligand charge (L, purple).Error bars reflect the standard deviation across the set of DFAs within each functional family.
other functionals in this work