DOI: 10.1039/C7SC01247K (Edge Article)
Chem. Sci., 2017, 8, 5137–5152
Predicting electronic structure properties of transition metal complexes with neural networks†
Received 19th March 2017, Accepted 9th May 2017
First published on 17th May 2017
High-throughput computational screening has emerged as a critical component of materials discovery. Direct density functional theory (DFT) simulation of inorganic materials and molecular transition metal complexes is often used to describe subtle trends in inorganic bonding and spin-state ordering, but these calculations are computationally costly and properties are sensitive to the exchange–correlation functional employed. To begin to overcome these challenges, we trained artificial neural networks (ANNs) to predict quantum-mechanically-derived properties, including spin-state ordering, sensitivity to Hartree–Fock exchange, and spin-state-specific bond lengths in transition metal complexes. Our ANN is trained on a small set of inorganic-chemistry-appropriate empirical inputs that are both maximally transferable and do not require precise three-dimensional structural information for prediction. Using these descriptors, our ANN predicts spin-state splittings of single-site transition metal complexes (i.e., Cr–Ni) at arbitrary amounts of Hartree–Fock exchange to within 3 kcal mol^{−1} accuracy of DFT calculations. Our exchange-sensitivity ANN enables improved predictions on a diverse test set of experimentally characterized transition metal complexes by extrapolation from semilocal DFT to hybrid DFT. The ANN also outperforms other machine learning models (i.e., support vector regression and kernel ridge regression), demonstrating particularly improved performance in transferability, as measured by prediction errors on the diverse test set. We establish the value of new uncertainty quantification tools to estimate ANN prediction uncertainty in computational chemistry, and we provide additional heuristics for identification of when a compound of interest is likely to be poorly predicted by the ANN.
The ANNs developed in this work provide a strategy for screening transition metal complexes both with direct ANN prediction and with improved structure generation for validation with first principles simulation.
1. Introduction
High-throughput computational screening has become a leading component of the workflow for identifying new molecules,^{1,2} catalysts,^{3} and materials.^{4} First-principles simulation remains critical to many screening and discovery studies, but the relatively high computational cost of direct simulation limits exploration of chemical space to a small fraction of feasible compounds.^{5,6} In order to accelerate discovery, lower levels of theory, including machine-learning models, have emerged as alternate approaches for efficient evaluation of new candidate materials.^{7} Artificial neural networks (ANNs) have recently found wide application in the computational chemistry community.^{8–10} Machine learning approaches were initially appreciated for their flexibility to fit potential energy surfaces and thus force field models.^{10–17} Broader applications have recently been explored, including in exchange–correlation functional development,^{8,18} general solutions to the Schrödinger equation,^{19} orbital-free density functional theory,^{20,21} many-body expansions,^{22} acceleration of dynamics,^{23–25} band-gap prediction,^{26,27} and molecular^{1,2} or heterogeneous catalyst^{28} and materials^{29–32} discovery, to name a few.
Essential challenges for ANNs to replace direct calculation by first-principles methods include the appropriate determination of broadly applicable descriptors that enable flexible use of the ANN beyond molecules in the training set, e.g. for larger molecules or for those with diverse chemistry. Indeed, the most successful applications of ANNs at this time beyond proof-of-concept demonstrations have been in the development of force fields for well-defined compositions, e.g. of water.^{33,34} Within organic chemistry, structural descriptors such as a Coulomb matrix^{35} or local descriptions of the chemical environment and bonding^{36,37} have been useful to enable predictions of energetics as long as a relatively narrow range of compositions is considered (e.g., C, H, N, O compounds). These observations are consistent with previous successes in cheminformatics for evaluating molecular similarity,^{38} force field development,^{39} quantitative structure–activity relationships,^{40} and group additivity^{41} theories. For transition metal complexes, few force fields have been established that can capture a full range of inorganic chemical bonding,^{42} and the spin-state and coordination-environment dependence of bonding^{43} suggests that more careful development of descriptors is required to broadly predict properties of open-shell transition metal complexes. Similarly, descriptors that worked well for organic molecules have been demonstrated to be unsuitable for inorganic crystalline materials.^{44} It is well-known^{45–47} that there is a strong relationship between the sensitivity of electronic properties (e.g., spin-state splitting) and the direct ligand atom and ligand field strength^{48,49} in transition-metal complexes. Since ligands with the same direct metal-bonding atom can have substantially different ligand-field strengths (e.g., C for both weaker-field CH_{3}CN versus strong-field CO), whereas distant substitutions (e.g., tetraphenylporphyrin vs. base porphine) will have a limited effect, a transition-metal complex descriptor set that carefully balances metal-proximal and metal-distant descriptors is needed.
Within transition metal chemistry and correlated inorganic materials, a second concern arises for the development of ANN predictions of first-principles properties. Although efficient correlated wavefunction theory methods (e.g., MP2) may be straightforwardly applied to small organic molecules, such methods are not appropriate for transition metal complexes, where best practices remain an open question.^{50} Although promising avenues for ANNs include the mapping of lower-level theory results, e.g. from semi-empirical theory,^{51} to a higher-level one, as has been demonstrated for atomization energies^{52} and more recently reaction barriers,^{53} suitable levels of theory for extrapolation are less clear in transition metal chemistry.
Additionally, uncertainty remains about the amount of exact (Hartree–Fock, HF) exchange to include in the study of transition metal complexes, with recommendations ranging from no exchange, despite disproportionate delocalization errors in approximate DFT on transition metal complexes,^{48,54,55} to alternately low^{56–58} or high^{59} amounts of exact exchange in a system-dependent manner. Indeed, there has been much interest recently in quantifying uncertainty with respect to functional choice in energetic predictions,^{60–62} including through evaluation of the sensitivity of predictions with respect to inclusion of exact exchange.^{45,59} Spin-state splitting is particularly sensitive to exchange fraction,^{45–47} making it a representative quantity for which it is useful to obtain both a direct value and its sensitivity to varying the exchange fraction. Thus, a machine-learning model that predicts spin-state ordering across exchange values will be useful for translating literature predictions or providing sensitivity measures on computed data.
Overall, a demonstration of ANNs in inorganic chemistry, e.g. for efficient discovery of new spin-crossover complexes,^{63,64} for dye sensitizers in solar cells,^{65} or for identification of reactivity of open-shell catalysts^{66} via rapid evaluation of spin-state ordering, should satisfy two criteria: (i) contain flexible descriptors that balance metal-proximal and metal-distant features and (ii) be able to predict spin-state ordering across exchange–correlation mixing. In this work, we make progress toward both of these aims, harnessing cheminformatics-inspired transition metal complex structure generation tools^{67} and established structure–functional sensitivity relationships in transition metal complexes^{45,59} to train ANNs for transition metal complex property prediction.
The outline of the rest of this work is as follows. In Section 2 (Methods), we review the computational details of data set generation, discuss our variable selection procedure, and review details of the artificial neural network training. In Section 3, we provide the Results and discussion on the trained neural networks for spin-state ordering, spin-state exchange sensitivity, and bond-length prediction on both training-set-representative complexes and diverse experimental complexes. Finally, in Section 4, we provide our Conclusions.
2. Methods
2.1 Test set construction and simulation details
Data set construction.
Our training set consists of octahedral complexes of first-row transition metals in common oxidation states: Cr^{2+/3+}, Mn^{2+/3+}, Fe^{2+/3+}, Co^{2+/3+}, and Ni^{2+}. High-spin (H) and low-spin (L) multiplicities were selected for each metal from the ground, high-spin state of the isolated atom and the higher-energy, lowest-spin state within 5 eV that had a consistent d-orbital occupation for both states, as obtained from the National Institute of Standards and Technology atomic spectra database.^{68} The selected H–L states were: triplet–singlet for Ni^{2+}, quartet–doublet for Co^{2+} and Cr^{3+}, quintet–singlet for Fe^{2+} and Co^{3+}, quintet–triplet for Cr^{2+} and Mn^{3+} (because no data are available for Mn^{3+} singlets^{68}), and sextet–doublet for Mn^{2+} and Fe^{3+}.
A set of common ligands in inorganic chemistry was chosen for variability in denticity, rigidity, and size (nine monodentate, six bidentate, and one tetradentate in Fig. 1 and ESI Table S1†). These ligands span the spectrochemical series from weak-field chloride (1, Cl^{−}) to strong-field carbonyl (6, CO), along with representative intermediate-field ligands and connecting atoms, including S (2, SCN^{−}), N (e.g., 9, NH_{3}), and O (e.g., 14, acac). All possible homoleptic structures with all metals/oxidation states were generated from ten of these ligands (90 molecules) using the molSimplify toolkit^{67} (ESI Table S2†). Additional heteroleptic complexes (114 molecules) were generated using molSimplify with one mono- or bidentate axial ligand type (L_{ax}) and an equatorial ligand type (L_{eq}) of compatible denticity (ligands shown in Fig. 1, schematic shown in Fig. 2, geometries provided in the ESI†). We also selected 35 molecules from the Cambridge Structural Database^{69} (ESI Table S3†).

 Fig. 1 Set of ligands used to generate the transition metal complex data set. Ligands are numbered 1–16 and colored according to the atom type that coordinates with the metal, with chlorine in green, carbon in gray, sulfur in orange, nitrogen in blue, and oxygen in red. Purple lines indicate the bonds formed to metal-coordinating atoms in the ligand complexes. Abbreviations for each ligand used in the text are also shown. Full chemical names are provided in ESI Table S3.†  

 Fig. 2 Schematic diagram of descriptors (left) as inputs to the ANN (right), along with hidden layers, and output (e.g., spin-state splittings). The additive bias term in each node is omitted.  
First-principles geometry optimizations.
DFT gas-phase geometry optimizations were carried out using TeraChem.^{70,71} DFT calculations employ the B3LYP hybrid functional^{72–74} with 20% Hartree–Fock (HF) exchange (a_{HF} = 0.20) and a variant^{45} (a_{HF} = 0.00 to 0.30 in 0.05 increments) that holds the semilocal DFT portion of exchange in a constant ratio. We calculate and predict spin-state splitting sensitivities to HF exchange, ∂(ΔE_{H–L})/∂a_{HF}, as approximated from linear fits, in units of kcal per mol per HFX, where 1 HFX corresponds to varying a_{HF} from 0% to 100% HF exchange. B3LYP^{72–74} is chosen here due to its widespread use and our prior experience^{45} with tuning it to study HF exchange sensitivity, where we observed^{45} similar behavior with other GGA hybrids, e.g. PBE0, as long as the same HF exchange fraction was compared.
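As a minimal illustration of how such a sensitivity is extracted, the slope of a linear fit of ΔE_{H–L} against a_{HF} can be computed with NumPy. The splitting values below are invented for illustration only, not taken from the data set:

```python
import numpy as np

# The seven HF exchange fractions used in this work (0.00 to 0.30 in 0.05 steps).
a_hf = np.linspace(0.00, 0.30, 7)

# Hypothetical spin-state splittings (kcal/mol) at each a_HF; illustrative only.
de_hl = np.array([25.0, 21.4, 17.9, 14.5, 11.0, 7.4, 3.9])

# Sensitivity = slope of the linear fit, in kcal/mol per HFX
# (one HFX unit = varying a_HF from 0 to 1, i.e., 0% to 100% exchange).
slope, intercept = np.polyfit(a_hf, de_hl, 1)
```

For these illustrative values the fitted slope is strongly negative, reflecting the stabilization of high-spin states with increasing exact exchange discussed in Section 3.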
The composite basis set used consists of the LANL2DZ effective core potential^{75} for transition metals and the 6-31G* basis for the remaining atoms. All calculations are spin-unrestricted with virtual and open-shell orbitals level-shifted^{76} by 1.0 and 0.1 eV, respectively, to aid self-consistent field (SCF) convergence to an unrestricted solution.
For all training and test case geometry optimizations, default tolerances of 10^{−6} hartree for SCF energy changes between steps and a maximum gradient of 4.5 × 10^{−4} hartree per bohr were employed, as implemented in the DL-FIND interface^{77} with TeraChem (ESI Table S4†). Entropic and solvent effects that would enable comparison to experimental spin-state splittings have been omitted; we instead evaluate the DFT adiabatic electronic spin-state splitting, as in previous work,^{45,78} because our goal is to predict DFT properties and sensitivity to functional choice. In high-throughput screening efforts ongoing in our lab, entropic and solvent effects that influence catalytic and redox properties will be considered explicitly.
For each molecular structure (90 homoleptic, 114 heteroleptic), 14 geometry optimizations were carried out: 7 exchange fractions (from 0.00 to 0.30) in both high- and low-spin states, for a theoretical maximum of 2856 geometry optimizations. In practice, 166 structures were excluded due to (i) large spin contamination, as defined by an expectation value of 〈Ŝ^{2}〉 that deviated by more than 1 from the exact value (<1%, 26 of 2856, see ESI Table S5†), (ii) dissociation in one or both spin states, especially of negatively charged ligands, leading to loss of octahedral coordination (4%, 126 of 2856, see ESI Table S6†), or (iii) challenges associated with obtaining a stable minimized geometry (<1%, 14 of 2856, see ESI Table S2†). Eliminating these cases produced a final data set of 2690 geometry optimizations (structures and energetics provided in the ESI, as outlined in ESI Text S1†). Although these excluded cases are a small fraction of our original data set, they highlight considerations for application of the ANN in high-throughput screening: highly negatively charged complexes should be avoided, and single-point DFT calculations should be used to confirm that a high-fitness complex does not suffer from large 〈Ŝ^{2}〉 deviations.
2.2 Descriptor selection
High-throughput screening of transition-metal complex properties with direct prediction from an ANN requires mapping of an empirical feature space that represents the complex, χ, to quantum-mechanical predictions. This feature space should be balanced to avoid (i) too few descriptors with insufficient predictive capability or (ii) too many descriptors that lead to overfitting of the ANN. Molecular descriptors^{79} that have been used for parameterizing chemical space include: atomic composition, electronegativity,^{37} formal charges, and representations of the geometric structure. This last class of descriptors may be divided into those that depend either on 3D structural information^{13,20,80–82} or on graph-theoretic connectivity maps^{83} (e.g., the Randić,^{84} Wiener shape,^{85} or Kier^{86} indices). Graph-theoretic methods are preferable to 3D structural information to avoid sensitivity to translation/rotation or molecule size,^{87} though we note that subsystem descriptors^{13,82,88} and element-specific pairwise potentials^{81,87} have been employed successfully to overcome some challenges. A secondary reason to avoid use of 3D structural information is the implicit requirement of equilibrium geometries obtained from a geometry optimization, which are readily achieved with semi-empirical methods on small organic molecules^{83} but would be prohibitive and error-prone for transition metal complexes.
To evaluate candidate descriptor sets, we use L1-regularized, least absolute shrinkage and selection operator (LASSO) linear least-squares regression,^{89} as implemented in the glmnet^{90} package in R3.2.5.^{91} LASSO is used to reduce overfitting, force the coefficients of the least-powerful indicators to zero, and avoid monotonic decrease of model error as feature space dimensionality increases. Given observed input–output pairs (x_{i}, y_{i}) for i = 1, …, n with x_{i} ∈ χ ⊂ ℝ^{m} and λ ∈ ℝ, the output is modeled as:

ŷ(x) = β_{0} + x^{T}β  (1)

for (β, β_{0}) ∈ ℝ^{m} × ℝ, where:

(β̂, β̂_{0}) = argmin_{β,β_{0}} [(1/2n)Σ_{i=1}^{n}(y_{i} − β_{0} − x_{i}^{T}β)^{2} + λ‖β‖_{1}]  (2)
The parameter λ is selected by ten-fold cross-validation with values typically between 10^{−1} and 10^{−6}. Our descriptors include both continuous variables that are normalized and discrete variables that are described by zero–one binary coding (ESI Table S7†). Metal identity is best described by a set of discrete variables: 4 binary variables are chosen to represent Cr, Mn, Fe, and Ni, and Co corresponds to the case where all 4 variables are zero. This leads to a higher number of overall variables than for continuous descriptors (see Table 1).
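The L1 penalty of eqn (2) drives the coefficients of weak descriptors exactly to zero, which is how descriptor sets are pruned. A self-contained sketch of this behavior, using plain-NumPy coordinate descent on synthetic data rather than the glmnet implementation used in the paper:

```python
import numpy as np

def soft(z, g):
    """Soft-thresholding operator, the closed-form L1 proximal step."""
    return np.sign(z) * max(abs(z) - g, 0.0)

def lasso_cd(X, y, lam, iters=200):
    """Coordinate-descent LASSO for eqn (2):
    minimize (1/2n)||y - b0 - X b||^2 + lam * ||b||_1."""
    n, m = X.shape
    b = np.zeros(m)
    b0 = y.mean()
    for _ in range(iters):
        r = y - b0 - X @ b
        for j in range(m):
            r += X[:, j] * b[j]                 # partial residual excluding j
            rho = X[:, j] @ r / n
            b[j] = soft(rho, lam) / (X[:, j] @ X[:, j] / n)
            r -= X[:, j] * b[j]                 # restore j's contribution
        b0 = (y - X @ b).mean()
    return b0, b
```

With synthetic data in which only the first descriptor matters, the two irrelevant coefficients are thresholded to zero while the informative one survives (shrunken toward zero by the penalty).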
Table 1 Comparison of variable sets by root-mean-squared errors (RMSE) after regularization in ΔE_{H–L} and ∂(ΔE_{H–L})/∂a_{HF} prediction, along with the number of discrete variables (with all binary levels of the discrete variables counted in parentheses) and the number of continuous variables

Set | RMSE (ΔE_{H–L}) (kcal mol^{−1}) | RMSE (∂(ΔE_{H–L})/∂a_{HF}) (kcal per mol per HFX) | Discrete variables | Continuous variables
a | 14.6 | 20.6 | 3 (37) | 6
b | 15.1 | 21.7 | 3 (15) | 8
c | 15.2 | 21.2 | 3 (15) | 11
d | 15.1 | 21.3 | 3 (15) | 10
e | 14.9 | 21.1 | 3 (15) | 12
f | 15.1 | 23.5 | 3 (15) | 10
g | 14.9 | 21.3 | 3 (15) | 12
Based on our previous studies of transition metal complexes,^{45,49} we expect that spin-state ordering is predominantly determined by the immediate chemical environment around the metal center, potentially enabling predictive descriptors that are widely transferable across a range of molecule sizes. We compare 7 descriptor sets on the data and select the subset of descriptors that gives the best simultaneous predictive performance for spin-state splitting, ΔE_{H–L}, and its sensitivity with respect to HF exchange variation, ∂(ΔE_{H–L})/∂a_{HF}, as indicated by the prediction root-mean-squared error (RMSE):

RMSE = [(1/n)Σ_{i=1}^{n}(ŷ_{i} − y_{i})^{2}]^{1/2}  (3)
When two variable sets perform comparably, we select the variable set that will enable broader application of the ANN. All sets include the metal identity as a discrete variable and metal oxidation state, ligand formal charge, and ligand denticity as continuous variables (Fig. 3, some descriptors shown in Fig. 2). Set a represents our most specific model, where we explicitly code the full axial or equatorial ligand identity as a discrete variable, limiting the application of the model but producing one of the lowest RMSEs for ΔE_{H–L} and ∂(ΔE_{H–L})/∂a_{HF} (Table 1). Elimination of ligand identity in favor of the ligand connecting atom's elemental identity and total number of atoms in set b slightly increases both RMSEs (see Table 1).

 Fig. 3 Summary of variables chosen for each set a through g. Employed variables are indicated in shaded gray and grouped by whether they are assessed on the whole complex (complex-based) or on each individual axial or equatorial ligand (ligand-based). Δχ is the difference in Pauling electronegativity between the ligand connecting atom and all atoms bonded to it, and the sum, maximum or minimum values are obtained over all ligands.  
The shift from set a to b increases the model's applicability but at the cost of omitting subtler ligand effects. For instance, ethylenediamine (11, en) and phenanthroline (10, phen) have the same ligand charge/denticity and direct ligand atom (N), making them equivalent in set b except for the larger size of phen. System size alone is not expected to be a good predictor of field strength (e.g., the small CO is one of the strongest-field ligands). In set c, we introduce properties that depend on the empirical pairwise Pauling electronegativity difference (Δχ) between the ligand connecting atom (LC) and any ith atom (CA) connected to it:

Δχ_{LC,i} = χ_{LC} − χ_{i}  (4) 
These whole-complex differences include the maximum, max(Δχ), and minimum, min(Δχ), as well as the sum:

sum(Δχ) = Σ_{lig.}Σ_{i}Δχ_{LC,i}  (5)
which is taken over the direct ligand atom and all atoms bonded to it for all ligands (lig.) in the complex. These additional set c descriptors change the ΔE_{H–L} RMSE only slightly and decrease the ∂(ΔE_{H–L})/∂a_{HF} RMSE (see Table 1). In set d, we eliminate min(Δχ), expecting it to be redundant with the max and sum, at the cost of a small increase in both RMSEs.
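Using tabulated Pauling electronegativities, eqns (4) and (5) reduce to a few lines. The hexacarbonyl example below gives a per-ligand term of χ_{C} − χ_{O} = −0.89, consistent with the lower bound of the max(Δχ) range listed in Table 2:

```python
# Standard Pauling electronegativities for the connecting-atom elements used here.
chi = {"H": 2.20, "C": 2.55, "N": 3.04, "O": 3.44, "S": 2.58, "Cl": 3.16}

def dchi_terms(connecting_atom, bonded_atoms):
    """Eqn (4): dchi_LC,i = chi_LC - chi_i for each atom i bonded to
    the ligand's metal-connecting atom."""
    return [chi[connecting_atom] - chi[a] for a in bonded_atoms]

def dchi_descriptors(ligands):
    """Whole-complex descriptors: max, min, and sum (eqn (5)) of dchi over
    all ligands, each given as (connecting atom, atoms bonded to it)."""
    terms = [d for lc, nbrs in ligands for d in dchi_terms(lc, nbrs)]
    return max(terms), min(terms), sum(terms)

# Homoleptic hexacarbonyl: six CO ligands with C as the connecting atom.
mx, mn, sm = dchi_descriptors([("C", ["O"])] * 6)
```

For this complex all six terms equal −0.89, so max, min, and sum are −0.89, −0.89, and −5.34, respectively.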
Finally, in sets e–g, we replace ligand size (i.e., number of atoms) with general descriptors to enable prediction on molecules larger than those in any training set. For example, tetraphenylporphyrin will have comparable electronic properties to unfunctionalized porphyrin (12), despite a substantial size increase. In set e, we introduce the maximum bond order of the ligand connecting atom to any of its nearest neighbors, a measure of the rigidity of the ligand environment, which is zero if the ligand is atomic (see ESI Table S1†). In set f, we eliminate the number of atoms and the bond order metric, replacing them with a broader measure of the ligand geometry adjacent to the metal. After trial and error, we selected the truncated Kier shape index,^{86} ^{2}κ, which is defined from the number of unique paths of length two (^{2}P) in the molecular graph of A heavy atoms, relative to the theoretical maximum and minimum ^{2}P counts for a molecule with the same number of atoms (the minimum corresponding to a linear alkane):

^{2}κ = (A − 1)(A − 2)^{2}/(^{2}P)^{2}  (6)
and set to zero for any molecules that do not have paths of length two. The truncation means that only the ligand atoms within three bonds of the connecting atom are included in the graph. The set f RMSEs are comparable to or slightly higher than those of sets that include molecule size, but they beneficially eliminate system-size dependence. In set g, we reintroduce the bond order metric as well, providing the lowest RMSEs except for sets a and c, both of which are much less transferable than set g. Thus, the comparable performance of set g to a full ligand descriptor (set a) supports our hypothesis that a combination of metal-centric and ligand-centric descriptors in a heuristic set can be predictive and transferable. This final feature space is 15-dimensional, with five per-complex descriptors and five per-ligand descriptors for each equatorial or axial ligand type (see Table 2 for ranges of values and descriptions). A comparison of all errors and weights of variables across the seven descriptor sets is provided in ESI Tables S7–S21 and Fig. S1.†
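A sketch of the Kier index computation of eqn (6), assuming the standard definition ^{2}κ = (A − 1)(A − 2)^{2}/(^{2}P)^{2} for a graph of A heavy atoms; the truncation to atoms within three bonds of the connecting atom is left to the caller, which would pass in the pruned subgraph:

```python
def kier_2kappa(adj):
    """Second Kier shape index for a heavy-atom graph given as
    {atom: set(neighbors)}. The number of unique length-two paths, 2P,
    equals the sum over atoms of C(degree, 2); returns 0.0 for graphs
    without such paths, as specified in the text."""
    A = len(adj)
    p2 = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())
    if p2 == 0 or A < 3:
        return 0.0
    return (A - 1) * (A - 2) ** 2 / p2 ** 2
```

For a linear chain of A atoms, ^{2}P = A − 2, so ^{2}κ = A − 1 (e.g., 3.0 for a four-atom chain), while branched, compact ligand environments give smaller values.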
Table 2 Optimal (set g) input space descriptors and their range in the training set. Δχ is the difference in Pauling electronegativity between the ligand connecting atom and all atoms bonded to it. When training the ANNs, a continuous descriptor corresponds to a single input node, whereas discrete descriptors correspond to one node per level
Whole-complex descriptors

Symbol | Type | Descriptor | Values or range
M | Discrete | Metal identity | Cr, Mn, Fe, Co, Ni
O | Continuous | Oxidation state | 2 to 3
me | Continuous | Max. Δχ over all ligands | −0.89 to 1.20
se | Continuous | Sum of Δχ over all ligands | −5.30 to 7.20
a_{HF} | Continuous | HF exchange fraction | 0.00 to 0.30

Ligand-specific descriptors

Symbol | Type | Descriptor | Values or range
L | Discrete | Ligand connection atom | Cl, S, C, N, or O
C | Continuous | Ligand charge | 0 to −2
k | Continuous | Truncated Kier index | 0.00 to 6.95
b | Continuous | Ligand bond order | 0 to 3
D | Continuous | Ligand denticity | 1 to 4
2.3 Training and uncertainty quantification of ML models
ANNs enable complex mapping of inputs to outputs^{92} beyond multiple linear regression and support the use of both discrete (i.e., binary choices such as metal identity) and continuous (e.g., the % of HF exchange) variables. Here, we apply an ANN with an input layer, two intermediate hidden layers, and an output layer (Fig. 2). The network topology was determined by trial and error, with additional hidden layers yielding no improved performance. All analysis is conducted in R3.2.5,^{91} using the h2o^{93} package with tanh nonlinearity and linear output. Network weights and full training and test data are provided in the ESI.†
As with many ML models, ANNs are sensitive to overfitting due to the number of weights to be trained.^{94} We address overfitting using dropout,^{95,96} wherein robustness of the fit is improved by zeroing out nodes in the network with an equal probability, p_{drop}, at each stage of training (5% for spin-state splitting, 15% for HF exchange sensitivity, and 30% for bond lengths, selected by trial and error). Dropout has been shown to address overfitting when training feedforward ANNs on small datasets,^{96} with larger values of p_{drop} giving more aggressive regularization that worsens training errors but improves test errors. We use L2 weight regularization with a fixed penalty weight λ, as is applied in standard ridge regression, with an effective loss function for training:

L = (1/N)Σ_{n=1}^{N}(ỹ(x_{n}) − y_{n})^{2} + λΣ_{l=1}^{L}(‖W_{l}‖^{2} + ‖b_{l}‖^{2})  (7)

Here, W_{l} refers to the weights from layer l to l + 1, b_{l} are the biases at layer l, ỹ(x_{n}) is the ANN prediction for the input–output pair (x_{n}, y_{n}), and the sums run over N training pairs and L layers.
During network training, we randomize the order of data points and partition the first 60% as training data and the last 40% for testing. Dropout networks, consisting of two hidden layers of 50 nodes each, are trained on the data set for values of λ ranging from 10^{−1} to 10^{−6} using 10-fold cross-validation. For each λ, the training data is partitioned into ten groups, and a network is trained on each combination of nine of the groups and scored based on eqn (7) on the left-out group. The λ with the lowest average prediction error is selected as the regularization parameter: 5 × 10^{−4} for spin-state splitting, 10^{−2} for HF exchange sensitivity, and 10^{−3} for bond lengths. We varied and optimized^{97} the learning rate between 0.05 and 1.5, and optimal rates were selected as 1.0 (bond lengths) and 1.5 (spin-state splitting or HF exchange sensitivity). Training minimizes eqn (7) over the training data using mini-batch stochastic gradient descent^{97–100} (batch size = 20) for 2000 epochs.
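As an illustration of the architecture and loss, a minimal NumPy sketch of a tanh network with dropout and the regularized loss of eqn (7); this is an illustration only, not the h2o implementation, and layer shapes are placeholders:

```python
import numpy as np

def forward(x, Ws, bs, p_drop=0.0, rng=None):
    """Forward pass through a tanh ANN with a linear output layer.
    Hidden nodes are zeroed with probability p_drop when an rng is
    supplied (dropout, as during training or uncertainty sampling)."""
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = np.tanh(h @ W + b)
        if rng is not None and p_drop > 0.0:
            h = h * (rng.random(h.shape) >= p_drop)
    return h @ Ws[-1] + bs[-1]

def loss(X, y, Ws, bs, lam):
    """Effective training loss of eqn (7): mean squared prediction error
    plus an L2 penalty on all weights and biases."""
    pred = forward(X, Ws, bs).ravel()
    mse = np.mean((pred - y) ** 2)
    l2 = sum((W ** 2).sum() for W in Ws) + sum((b ** 2).sum() for b in bs)
    return mse + lam * l2
```

The gradient of this loss with respect to the weights is what the mini-batch stochastic gradient descent steps follow.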
Historically, it has not been possible to estimate ANN model uncertainty,^{95,101} with the possible exception of bootstrapping^{102} by training the ANN on numerous subsamples of the available training data. Model uncertainty arises either from high sensitivity to descriptor changes or from the test molecule's distance in chemical space from the training data (see also Section 3). Recent work^{94} showed that minimization of the loss function in eqn (7) is equivalent to approximate variational optimization of a Gaussian process (GP), making previously suggested ANN sampling over different dropout realizations^{95} a rigorously justified^{94} model uncertainty estimate.
We sample J distinct networks (in this work, J = 100) with different nodes dropped at the optimized weights and average over the predictions:

ỹ*(x) = (1/J)Σ_{j=1}^{J}ỹ_{j}(x)  (8)
The ANN predictive variance is:^{94}

Var[ỹ*(x)] ≈ τ^{−1} + (1/J)Σ_{j=1}^{J}ỹ_{j}(x)^{2} − ỹ*(x)^{2}  (9)

Here, τ is:

τ = (1 − p_{drop})l^{2}/(2Nλ)  (10)

where N is the number of training data points, and l is a model hyperparameter for the GP that affects the estimation of predictive variance but does not enter into the ANN training. The contribution of τ in eqn (9) is a baseline variance inherent in the data, whereas the second term represents the variability of the GP itself. We obtain τ values of 0.6 for spin-state splitting, 0.07 for HF exchange sensitivity, and 10 000 for bond lengths (see Section 3). We choose l by maximizing the log predictive likelihood of the corresponding GP based on the training data (details are provided in ESI Text S2†).
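The Monte Carlo dropout estimate of eqns (8) and (9) can be sketched in NumPy; the network below is a toy stand-in for the trained model, and the default τ = 0.6 is the spin-state splitting value quoted above:

```python
import numpy as np

def forward(x, Ws, bs, p_drop, rng):
    """Tanh ANN forward pass with hidden nodes dropped at probability p_drop."""
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = np.tanh(h @ W + b)
        h = h * (rng.random(h.shape) >= p_drop)
    return h @ Ws[-1] + bs[-1]

def mc_dropout_predict(x, Ws, bs, p_drop, J=100, tau=0.6, seed=1):
    """Eqns (8)-(9): average J dropout realizations of the trained network
    for the mean prediction; the predictive variance is the sample variance
    across realizations plus the baseline variance 1/tau."""
    rng = np.random.default_rng(seed)
    samples = np.stack([forward(x, Ws, bs, p_drop, rng) for _ in range(J)])
    mean = samples.mean(axis=0)              # eqn (8)
    var = 1.0 / tau + samples.var(axis=0)    # eqn (9)
    return mean, var
```

With p_{drop} = 0 every realization is identical, so the predictive variance collapses to the baseline 1/τ; increasing p_{drop} widens the spread of the sampled networks and hence the reported uncertainty.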
We selected an ANN based on the successful demonstrations^{11,14,103} of ANN-based models for predicting quantum chemical properties but also provide a comparison to two other common machine learning models:^{89} kernel ridge regression (KRR) and support vector regression (SVR), both using a squared-exponential kernel. We used the R package kernlab^{104} and selected hyperparameters (the kernel width and the magnitude of the regularization parameters, given in ESI Table S22†) using a grid search and ten-fold cross-validation with the R package CVST.^{105} We also compared training on our descriptor set to a KRR model with a kernel based on the L1 distance between sorted Coulomb matrix representations,^{87} as demonstrated previously.^{52,103}
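For reference, a squared-exponential-kernel KRR baseline can be written in a few lines of NumPy; this is a generic sketch, not the kernlab implementation, and the hyperparameters below are arbitrary:

```python
import numpy as np

def krr_fit_predict(X, y, Xq, gamma=1.0, lam=1e-6):
    """Kernel ridge regression with a squared-exponential (RBF) kernel:
    solve (K + lam*I) alpha = y on the training set, then predict
    k(Xq, X) @ alpha at the query points."""
    def kern(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    alpha = np.linalg.solve(kern(X, X) + lam * np.eye(len(X)), y)
    return kern(Xq, X) @ alpha
```

Unlike the ANN, every prediction requires the full training set, which is one reason kernel methods can struggle with transferability as the training distribution is left behind.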
3. Results and discussion
3.1 Overview of data set spinstate energetics
Analysis of the qualitative and quantitative features of the spin-state splitting data set motivates the training of an ANN to move beyond ligand field arguments. We visualize qualitative ground states (i.e., high-spin or low-spin) for the homoleptic subset of the data using a recursive binary tree (Fig. 4, descriptor definitions provided in Table 2), as previously outlined^{106} and implemented in the open-source rpart package^{107} for R3.2.5.^{91} A recursive binary tree is a list of “branches” of the data ordered by statistical significance that gives the most homogeneous final “leaves” (here, with at least 10 data points) after a given number of permitted divisions (here, 6). Using descriptor set g, the data are partitioned into branches by testing which descriptors provide the “best” division to produce majority high- or low-spin leaves, based on the concept of information impurity,^{107} with pruning to remove statistically insignificant branches. The resulting electronic structure spectrochemical “tree” simultaneously addresses metal-specific strengths of ligands and exchange–correlation sensitivity. As expected, strong-field direct carbon ligands (no Cl, N, O or S in Fig. 4) provide the root division of the tree, producing low-spin ground states for 92% of all Mn, Fe, and Co complexes (far right box on the third tier in Fig. 4). Next-level divisions include the M(II) complexes at a_{HF} > 0.05, which are predominantly (96%) high-spin. Spin-state ordering is well-known^{45,59} to be sensitive to HF exchange, and the tree reveals Mn^{3+} with nitrogen ligands to have the strongest a_{HF} dependence, since these complexes are 69% high-spin for a_{HF} > 0.1 but 90% low-spin for a_{HF} ≤ 0.1. Extension of the recursive binary tree to heteroleptic compounds produces a second-level division based on sum(Δχ), validating the relevance of the identified electronegativity descriptors for predicting heteroleptic spin-state ordering (ESI Fig. S2†).

 Fig. 4 Binary ground state classification tree for homoleptic compounds. M indicates metal identity, L ligand connection atom, O oxidation state, a the fraction of HF exchange, C the charge, and D the ligand denticity. Each leaf node indicates the percent of elements in that leaf (light blue boxes for highspin and dark gray boxes for lowspin) in bold font and percentage of total homoleptic population in the node (italic font, in parentheses).  
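One level of the recursive partitioning used for Fig. 4 amounts to scanning descriptor/threshold pairs for the split that minimizes leaf impurity. A sketch using Gini impurity (rpart's information-impurity criterion differs in detail), with an illustrative 0/1-labeled data set rather than the real descriptors:

```python
def gini(labels):
    """Gini impurity of a set of 0/1 class labels (low-/high-spin)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n            # fraction of high-spin members
    return 2.0 * p * (1.0 - p)

def best_split(X, y):
    """One level of recursive binary partitioning: return the
    (descriptor index, threshold, weighted impurity) giving the most
    homogeneous pair of leaves."""
    best = (None, None, float("inf"))
    n = len(y)
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best
```

Recursing on each resulting leaf, then pruning statistically insignificant branches, yields a tree of the kind shown in Fig. 4.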
Quantitatively, the maximum ΔE_{H–L} in the data set is 90.7 kcal mol^{−1} for the strong-field Co(III)(misc)_{6} complex at a_{HF} = 0.00, and the minimum value is −54.2 kcal mol^{−1} for the weak-field Mn(II)(NCS^{−})_{6} at a_{HF} = 0.30. These extrema are consistent with (i) the ordering of metals in the spectrochemical series^{43} and (ii) the uniform effect of stabilizing high-spin states with increasing HF exchange. By comparing compound trends in the data set, we are able to identify whether additivity in ligand field effects, which has been leveraged previously in heuristic DFT correction models,^{108–110} is a universally good assumption. For the Fe(III)(Cl^{−})_{6−n}(pisc)_{n} complexes (denoted 11 through 33 in Fig. 5), increasing n from 0 to 2 through the addition of two axial pisc ligands increases the spin-state splitting by 15.1 kcal mol^{−1} per replaced chloride. Transitioning to a complex with all equatorial pisc ligands (n = 4) increases the spin-state splitting by only 10.4 kcal mol^{−1} per additional ligand, and the homoleptic pisc structure (n = 6) adds only 7.5 kcal mol^{−1} per additional ligand beyond the n = 4 case. An additive model cannot precisely reproduce such diminishing ligand effects. As a stronger example of the need for nonlinear models such as an ANN, replacing two axial ligands of the strong-field Mn(II)(CO)_{6} complex with the weaker-field NCS^{−} (66 and 67 in ESI Fig. S3†) alters ΔE_{H–L} by <1 kcal mol^{−1}, as strong-field ligands (e.g., CO, CN^{−}) have an overriding effect on spin-state splitting.

 Fig. 5 ANN model predicted (ANN, blue bars) and computed (data, gray bars) spin-state splittings, ΔE_{H–L}, for the B3LYP functional (a_{HF} = 0.20) in kcal mol^{−1}. Complexes are labeled by equatorial and then axial ligands according to the numbering indicated in Fig. 1 and color-coded by direct ligand atom (green for chlorine, gray for carbon, blue for nitrogen, red for oxygen, and orange for sulfur). The error bars represent an estimated ±1 standard deviation credible interval from the mean prediction, and error bars that do not encompass the computed value are highlighted in red. Brown dashed lines indicate a ±5 kcal mol^{−1} range around zero ΔE_{H–L}, corresponding to near-degenerate spin states.
3.2 Spin-state splittings from an ANN
Motivated by these nonlinear effects in ligand additivity, we trained an ANN using a heuristic descriptor set (see Section 2.2) to predict qualitative spin-state ordering and quantitative spin-state splitting. The ANN predicts the correct ground state in 98% of the test cases (528 of 538) and 96% of the training cases (777 of 807). All of the misclassifications occur for cases in which the DFT |ΔE_{H–L}| is <5 kcal mol^{−1} (ESI Table S23†). The ANN spin-state prediction errors are not sensitive to HF exchange mixing, and thus our trained ANN can predict ground states of transition metal complexes from the pure GGA limit to hybrids with moderate exchange.
We assess quantitative performance with root mean squared errors (RMSE) of the ANN (eqn (3)), overall and by metal (Fig. 6, ESI Table S24 and Fig. S3–S6†). The comparable RMSEs of 3.0 and 3.1 kcal mol^{−1} for the training and test data, respectively, indicate an appropriate degree of regularization. The ANN predicts DFT spin-state splittings within 1 kcal mol^{−1} (i.e., “chemical accuracy”) for 31% (168 of 538) of the test data and within 3 kcal mol^{−1} (i.e., “transition metal chemical accuracy”^{111}) for 72% (389 of 538) of the test data. Only a small subset of 49 (4) test molecules has errors above 5 (10) kcal mol^{−1}, corresponding to strong-field Co and Cr complexes, e.g., Cr(II)(NCS^{−})_{2}(pisc)_{4} (ESI Fig. S5†). The model is equally predictive for homoleptic and heteroleptic compounds, with average unsigned errors of 2.2 and 2.3 kcal mol^{−1}, respectively.
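The RMSE and ground-state classification statistics above can be recomputed from paired arrays of ANN-predicted and DFT-computed ΔE_{H–L} values, since the sign of ΔE_{H–L} encodes the ground state. The following is a minimal sketch (the helper names `rmse` and `ground_state_accuracy` are ours, not from the authors' code):

```python
import numpy as np

def rmse(predicted, computed):
    """Root mean squared error over a data split (cf. eqn (3))."""
    predicted = np.asarray(predicted, dtype=float)
    computed = np.asarray(computed, dtype=float)
    return float(np.sqrt(np.mean((predicted - computed) ** 2)))

def ground_state_accuracy(predicted, computed):
    """Fraction of cases where the sign of ΔE_H-L, and hence the
    HS/LS ground state, is predicted correctly."""
    predicted = np.asarray(predicted, dtype=float)
    computed = np.asarray(computed, dtype=float)
    return float(np.mean(np.sign(predicted) == np.sign(computed)))
```

Applied to the 538 test-set pairs, these two aggregates reproduce the 3.1 kcal mol^{−1} RMSE and 98% classification figures quoted above.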

 Fig. 6 Error boxplots for ΔE_{H–L} in kcal mol^{−1} using the ANN for test (top) and training (bottom) data partitioned by metal identity. The top number inside each box indicates the number of cases in that set, and the bottom number indicates the RMSE in kcal mol^{−1}. The range for both graphs is from −15 kcal mol^{−1} to 15 kcal mol^{−1}.
The training and test RMSEs broken down by metal reveal comparable performance across the periodic table (Fig. 6). Slightly higher test RMSEs (maximum unsigned errors) for Co and Fe complexes, at 3.8 (15.7) and 3.3 (13.0) kcal mol^{−1}, respectively, are due to the train/test partition and the more variable ligand dependence of spin-state ordering in these complexes (Fig. 6 and ESI Table S24†). When the ANN performs poorly, the errors arise from both under- and overestimation of ΔE_{H–L} for both strong- and weak-field ligands, regardless of HF exchange fraction: e.g., ΔE_{H–L} for Co(III)(CN^{−})_{6} at a_{HF} = 0.00 and Co(III)(en)_{3} at a_{HF} = 0.20 are overestimated by 14 and 9 kcal mol^{−1}, respectively, whereas ΔE_{H–L} for Fe(III)(Cl^{−})_{6} at a_{HF} = 0.10 and Co(II)(H_{2}O)_{2}(CN^{−})_{4} at a_{HF} = 0.30 are underestimated by 9 and 7 kcal mol^{−1}, respectively.
Quantified uncertainty estimates correspond to a baseline standard deviation in the model of approximately 1.5 kcal mol^{−1} and a mean total estimated standard deviation across the training and test cases of 3.8 and 3.9 kcal mol^{−1}, respectively (see Section 2.3 and error bars in Fig. 5). These credible intervals are not rigorously confidence intervals but can highlight when prediction uncertainty is high: a ±1 (±2) standard deviation (std. dev.) interval on ANN predictions captures 83% (98%) of computed values for the test set (see ESI Fig. S7†). The highest std. dev. values, around 5 kcal mol^{−1}, are observed for Fe(II) and Mn(II) complexes, and the lowest, around 3 kcal mol^{−1}, for Cr and Co complexes (see ESI†). A single std. dev. around the ANN prediction contains the calculated ΔE_{H–L} for 26 of 29 Fe(III) complexes at a_{HF} = 0.20 but misses the heteroleptic oxygen-coordinating complexes 13–13 and 14–1 and underestimates the effect of C/N ligands in 3–7 (Fig. 5). The model performs consistently across ligand sizes, from porphyrin Fe(III) complexes (12–13, 12–5) to Fe(II)(NH_{3})_{6} and Fe(II)(CO)_{6} (6–6 and 8–8). For ligand-specific effects, the ANN performs well, reproducing the reversal in splitting magnitude as equatorial and axial ligands are swapped (e.g., 1–3 versus 3–1).
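The empirical coverage figures quoted above (e.g., 83% of computed values inside a ±1 std. dev. band) follow from a simple counting procedure. A sketch, with a hypothetical helper name:

```python
import numpy as np

def interval_coverage(mean_pred, std_pred, computed, n_std=1.0):
    """Fraction of computed values falling inside mean ± n_std * std,
    i.e., the empirical coverage of the credible intervals."""
    mean_pred = np.asarray(mean_pred, dtype=float)
    std_pred = np.asarray(std_pred, dtype=float)
    computed = np.asarray(computed, dtype=float)
    inside = np.abs(computed - mean_pred) <= n_std * std_pred
    return float(np.mean(inside))
```

Comparing the coverage at n_std = 1 and 2 against the nominal Gaussian values (∼68% and ∼95%) is one quick check of whether the estimated uncertainties are calibrated.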
Review of other metals/oxidation states reveals comparable performance for cases where the high-spin state is always favored (e.g., Mn(II), Cr(III), or Ni(II)), where the low-spin state is always favored (e.g., Co(III)), and where ligands have strong influence over the favored spin state (e.g., Fe(II) and Cr(II)) (see ESI Fig. S3–S6†). For instance, metal-specific effects examined through comparison of M(II)(CO)_{6} complexes (Fig. 7) reveal good ANN performance both where the strong-field ligand strongly favors the low-spin state (i.e., Fe and Ni) and where the spin states are nearly degenerate (i.e., Cr, Mn, Co). The trends outlined here for 20% HF exchange hold at other exchange mixing values (ESI Table S23†). Thus, our ANN trained on a modest data set with heuristic descriptors predicts spin-state splitting within a few kcal mol^{−1} of the DFT result.

 Fig. 7 ANN model predicted (ANN, blue bars) and computed (data, gray bars) spin-state splittings, ΔE_{H–L}, with the B3LYP functional (a_{HF} = 0.20) in kcal mol^{−1} for M(II)(CO)_{6} complexes, where M = Cr, Mn, Fe, Co, or Ni. The error bars represent an estimated ±1 standard deviation credible interval from the mean prediction, and brown dashed lines indicate a ±5 kcal mol^{−1} range around zero ΔE_{H–L}, corresponding to near-degenerate spin states.
Comparing our results to KRR, SVR, and LASSO regression reinforces the choice of an ANN (Table 3 and ESI Fig. S8†). The ANN outperforms KRR with either our descriptor set or the sorted Coulomb matrix descriptor, both on the full data set and at fixed HF exchange (ESI Text S3†). The ANN also performs slightly better than SVR on test data with our descriptors. Linear LASSO regression was employed for feature selection (Section 2.2) but is outperformed by all other methods (Table 3). We will revisit the performance of these models on a more diverse molecule test set in Section 3.5 to assess the question of transferability.
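KRR with a square-exponential kernel, the strongest non-ANN baseline in Table 3, has a simple closed form that makes the comparison easy to reproduce. The sketch below uses our own helper names, and the hyperparameter values are illustrative, not those tuned in this work:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Square-exponential (RBF) kernel matrix between descriptor rows."""
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * d2)

def krr_fit(X, y, gamma=1.0, lam=1e-3):
    """Closed-form kernel ridge regression: alpha = (K + lam*I)^-1 y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), np.asarray(y, dtype=float))

def krr_predict(X_train, alpha, X_new, gamma=1.0):
    """Predict new targets from the fitted dual coefficients."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

With a small regularizer the model nearly interpolates its training data, which is consistent with the low training RMSE but weaker test and CSD transferability reported for KRR in Table 3.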
Table 3 Train/test data and CSD test set RMSEs and maximum UEs for ΔE_{H–L} in kcal mol^{−1} for the machine learning methods and descriptor sets compared: KRR, kernel ridge regression, using a square-exponential kernel for descriptor set g and the L1 matrix distance^{52} for the sorted Coulomb matrix descriptor; SVR, support vector regression using a square-exponential kernel; ANN, artificial neural network. Results are also given for the KRR/Coulomb case restricted to B3LYP only, since the Coulomb matrix does not naturally account for varying HF exchange
Model            Descriptor       Training          Test              CSD
                                  RMSE    Max UE    RMSE    Max UE    RMSE    Max UE
LASSO            Set g            16.1    89.7      15.7    93.5      19.2    72.5
KRR              Set g             1.6     8.5       3.9    17.0      38.3    88.4
SVR              Set g             2.1    20.9       3.6    20.4      20.3    64.8
ANN              Set g             3.0    12.3       3.1    15.6      13.1    30.4
KRR              Sorted Coulomb    4.3    41.5      30.8   103.7      54.5   123.9
KRR, B3LYP only  Sorted Coulomb   17.2    58.0      28.1    69.5      46.7   118.7
3.3 Predicting exchange sensitivity with an ANN
Spin-state splittings exhibit high sensitivity to exchange^{45,59} with linear behavior that we previously identified^{45} to be strongly dependent on direct ligand identity and field strength in a comparison of Fe complexes. Over the current data set, computed exchange sensitivities are indeed linear, ranging from −174 kcal per mol per HFX for strong-field Fe(II)(CO)_{6} to −13 kcal per mol per HFX for weak-field Cr(III)(en)_{2}(NH_{3})_{2}. Cr(III) is the least exchange-sensitive metal in our test set, whereas Fe(II) and Mn(II) are the most sensitive (ESI Table S25 and Fig. S9†).
We therefore generalize previous observations^{45} in an ANN that predicts the HF exchange sensitivity of spin-state ordering, ∂ΔE_{H–L}/∂a_{HF}, using the same descriptors as for direct spin-state splitting, excluding only a_{HF}. The smaller size of this data set (1/7 the size of the ΔE_{H–L} data set) leads to overfitting, with an RMSE of 13 kcal per mol per HFX for the training data versus 22 kcal per mol per HFX for the test set (Table 4, ESI Fig. S10 and Table S26†). Although results are reported in units of HFX (from 0 to 100% exchange), for a typical 20% variation in exchange, a 20 kcal per mol per HFX sensitivity error corresponds to only a 4 kcal mol^{−1} energy difference. Both maximum unsigned errors (UE) and RMSEs are largest for Mn(II/III) and Cr(II) complexes, with the largest case producing a 92 kcal per mol per HFX underprediction for Mn(III)(H_{2}O)_{4}(pisc)_{2}. Overall, the ANN prediction errors are less than 20 (40) kcal per mol per HFX for 65% (95%) of the test data. The ANN provides a valuable strategy for predicting exchange sensitivity, reproducing non-monotonic and non-convex ligand sensitivity in heteroleptic compounds: a Fe(III) complex with ox, 16, and NCS^{−}, 7, ligands is more sensitive to HFX than the respective homoleptic complexes (Fig. 8, other metals in ESI Fig. S11–S14†).
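Because the splittings are linear in a_{HF}, the target quantity ∂ΔE_{H–L}/∂a_{HF} reduces to a least-squares slope over computed splittings at several exchange fractions, and a known slope lets one shift a splitting from one exchange fraction to another. A sketch with hypothetical helper names (the slope is per unit a_{HF}, i.e., per 100% HFX, matching the units in the text):

```python
import numpy as np

def exchange_sensitivity(a_hf, delta_e):
    """Least-squares slope and intercept of ΔE_H-L vs. a_HF
    (slope in kcal/mol per unit HFX)."""
    slope, intercept = np.polyfit(a_hf, delta_e, 1)
    return float(slope), float(intercept)

def extrapolate_splitting(delta_e_ref, a_ref, a_target, slope):
    """Shift a splitting computed at a_ref to a_target using the slope."""
    return delta_e_ref + slope * (a_target - a_ref)
```

For example, a complex with slope −100 kcal per mol per HFX and ΔE_{H–L} = 10 kcal mol^{−1} at a_{HF} = 0 would be predicted to invert its ground state by a_{HF} = 0.20.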
Table 4 Test set RMSEs for ∂ΔE_{H–L}/∂a_{HF} in kcal per mol per HFX separated by metal and oxidation state, along with minimum and maximum unsigned test errors (UE). The number of test cases is indicated in parentheses
Species   RMSE (n)   Min. UE   Max. UE
Cr(II)    21 (14)    4         45
Cr(III)   17 (8)     2         37
Mn(II)    24 (6)     3         40
Mn(III)   38 (8)     4         92
Fe(II)    18 (9)     2         41
Fe(III)   15 (12)    <1        32
Co(II)    17 (8)     <1        26
Co(III)   20 (8)     <1        46
Ni(II)     9 (4)     1         15

 Fig. 8 ANN model predicted (ANN, blue bars) and computed (data, gray bars) spin-state splitting sensitivities to HF exchange, ∂ΔE_{H–L}/∂a_{HF}, in kcal per mol per HFX, for Fe^{3+} complexes. Complexes are labeled by equatorial and then axial ligands according to the numbering indicated in Fig. 1 and color-coded by direct ligand atom (green for chlorine, gray for carbon, blue for nitrogen, red for oxygen, and orange for sulfur). The error bars represent an estimated ±1 standard deviation credible interval from the mean prediction, and error bars that do not encompass the computed value are highlighted in red.
Uncertainty intervals of ANN predictions for HFX sensitivity span a narrow range, from 14 to 17 kcal per mol per HFX. For the 29 Fe(III) complexes studied, 23 (80%) of the ANN credible intervals span the computed exchange sensitivity (Fig. 8). Across the full metal and oxidation state data set, 70% (83%) of the computed data is contained within ±1 (±2) std. dev. intervals (Fig. 8 and ESI Fig. S15†). This performance can be further improved by extending the training data. Exchange-sensitivity prediction provides value both for extrapolation of computed (see Section 3.6) or literature values obtained at an arbitrary exchange mixing and for identification of cases of high sensitivity to DFT functional choice.
3.4 Predicting equilibrium geometries with an ANN
Using our descriptor set, we trained an ANN on the minimum metal–ligand bond distances for both low-spin and high-spin geometries (min(R_{LS/HS})), which differ from the exact metal–ligand bond length only for distorted or heteroleptic compounds. This ANN for bond length prediction extends capabilities we have recently introduced for generating high-quality transition metal complex geometries^{67} in order to enable spin-state dependent predictions without requiring extended geometry optimization. Furthermore, comparison of adiabatic and vertical spin-state splittings computed at either the low- or high-spin optimized geometries reveals that the vertical splitting at the HS geometry is indistinguishable from the adiabatic splitting, whereas the LS-geometry vertical splitting favors the LS state by 10–30 kcal mol^{−1}, increasing with a_{HF} (Fig. 9). Thus, if the ANN bond length predictions are accurate, adiabatic spin-state splittings can be obtained from DFT single points at the ANN-predicted HS-only or both LS/HS geometries.

 Fig. 9 The vertical or adiabatic spin-state splittings, ΔE_{H–L}, in kcal mol^{−1} as a function of HF exchange, a_{HF}, for Fe(II)(CO)_{6}. Spin-state splittings evaluated at the HS or LS geometries are indicated by open blue squares and open red circles, respectively. The adiabatic spin-state splitting is shown as filled gray triangles. The HS vertical and adiabatic splittings overlap, whereas the LS vertical splitting overestimates ΔE_{H–L}, as indicated by the green arrow and annotated δ in kcal mol^{−1} for a_{HF} = 0.00 and a_{HF} = 0.30.
Metal–ligand bond distances in the a_{HF} = 0.20 data set vary from min(R_{LS}) = 1.81 Å (in Fe(II)(pisc)_{2}(Cl^{−})_{4}) to min(R_{HS}) = 2.55 Å (in Fe(III)(Cl^{−})_{6}). The metal–ligand bond length ANN produces comparable RMSEs across training (0.02 Å for LS and HS) and test (0.02 Å for LS and 0.03 Å for HS) data, with comparable errors regardless of metal identity, oxidation state, or spin state (ESI Tables S27–S29 and Fig. S16–S27†). ANN bond length std. devs. range from 0.026 to 0.045 Å with a ∼0.01 Å baseline contribution. For low-spin (high-spin) complexes, 79% (81%) and 96% (96%) of the calculated values fall within one and two std. dev. of ANN-predicted bond lengths, respectively (ESI Fig. S17 and S23†).
The ANN bond lengths fall outside of computed values by more than a full standard deviation for low-spin Fe(III) complexes in seven cases, e.g., underestimating Fe–C distances in CN^{−} (7–5, 13–5) and pisc (3–7, 3–13) complexes (Fig. 10). However, the ANN also reproduces subtle trends: replacing axial ligands in homoleptic LS Fe(III)(pisc)_{6} (3–3 in Fig. 10, min(R_{LS}) = 1.92 Å) with Cl^{−} increases the minimum bond distance to 1.94 Å (3–1 in Fig. 10), but replacing equatorial pisc ligands instead with Cl^{−} (1–3 in Fig. 10) decreases the minimum bond distance to 1.90 Å. Non-additive bond length effects motivate the use of the ANN in initial geometry construction.^{67} Indeed, when we use ANN-predicted metal–ligand bond lengths in structure generation instead of our previous strategy based on a discrete database of DFT bond lengths,^{67} we reduce the metal–ligand component of the gradient by 54–90% (ESI Text S4, Fig. S28 and Table S30†). The ANN-predicted bond lengths and spin states are now available in molSimplify^{67} as an improved tool for structure generation.

 Fig. 10 ANN model predictions (ANN, blue bars) and computed (data, gray bars) minimum LS Fe^{3+} bond lengths, min(R_{LS}), in Å. Complexes are labeled by equatorial and then axial ligands according to the numbering indicated in Fig. 1 and color-coded by direct ligand atom (green for chlorine, gray for carbon, blue for nitrogen, and red for oxygen). The error bars represent an estimated ±1 standard deviation credible interval around the mean prediction, and error bars that do not encompass the computed value are highlighted in red. Fe(III)(Cl)_{6} (1–1) is excluded because it is off scale: it has a predicted/calculated bond length of 2.44/2.45 Å and an error standard deviation of ±0.02 Å.
3.5 Expanding the test set with experimental transition metal complexes
In order to test the broad applicability of the trained ANNs, we selected 35 homoleptic and heteroleptic octahedral complexes from the Cambridge Structural Database^{69} (CSD) with a range of metals (Cr to Ni) and direct ligand atom types (N, C, O, S, Cl) (ESI Table S30†). The CSD test cases span a broader range of compounds than the training set, containing (i) larger macrocycles, e.g., substituted porphyrins (tests 9, 25), clathrochelates (test 16), phthalocyanines (tests 4, 7), and cyclams (tests 5, 12, 14, 17, 24, 29, and 33; 12 and 33 are shown in Fig. 11) and (ii) coordination combinations or functional groups absent from the training set, e.g., OCN in test 30. Indeed, the large CSD test molecule sizes, e.g., up to 103 atoms in a single equatorial ligand, further motivate our relatively size-independent descriptor set over forms that do not scale well with molecule size.

 Fig. 11 Representative CSD test set molecules shown in ball-and-stick representation with carbon atoms in gray, nitrogen atoms in blue, oxygen in red, hydrogen in white, chlorine in green, chromium in orange, and iron in brown. Test molecules 12 (CSD ID: SUMLET) and 33 (CSD ID: YUJCIQ) are Cr(III) cyclams for which the ANN performs least well, and test molecules 8 (CSD ID: TPYFEC04) and 31 (CSD ID: BIPGEN) are cases for which the ANN predicts ΔE_{H–L} within 3 kcal mol^{−1}.
The ANN predicts CSD test case spin-state splittings within 5 kcal mol^{−1} for 15 of the 35 complexes, with an overall mean unsigned error of 10 kcal mol^{−1} and RMSE of 13 kcal mol^{−1} (see ESI Table S31†). The large RMSE is due in part to poor performance on early-transition-metal cyclams (red symbols in the left panel of Fig. 12), for which the ANN overestimates spin-state splitting by about 30 kcal mol^{−1} (Cr cyclams, tests 12 and 33 in Fig. 11). The ANN predicts spin-state splittings within around 3 kcal mol^{−1} for several non-macrocyclic complexes that are better represented in the training data (e.g., test cases 8 and 31 in Fig. 11). The correct ground state is assigned in 90% of CSD test cases (96% after excluding cyclams); the only incorrect non-cyclam spin state assignment is a spin-crossover complex, test 25 (calculated ΔE_{H–L} = −0.2 kcal mol^{−1}). Compared to other machine learning models (KRR and SVR), the ANN is more transferable to dissimilar CSD structures (Table 3), outperforming the next-best model, SVR, by 30%. The relative success of the ANN on the CSD data is partially attributable to the use of dropout regularization, which has been shown^{96} to improve robustness.

 Fig. 12 ANN spin-state splitting energy, ΔE_{H–L}, predicted values on CSD test structures vs. DFT-calculated values, both at a_{HF} = 0.20 and in kcal mol^{−1}. Direct prediction (left) is compared to GGA calculations plus extrapolation using the predicted slope from the ANN (right). Error bars represent a credible interval of one standard deviation from the model uncertainty analysis (either the direct ANN at left or the slope ANN at right), and a parity line (black, dashed) is indicated. Cyclams are indicated by red triangles, as described in the main text, and the remaining test cases are indicated by blue squares.
The observation of good performance when a CSD structure reasonably resembles the training data but poor performance when it is not well represented motivates a quantitative estimate of test compound similarity to training data. We first computed overall molecular similarity metrics (e.g., FP2 fingerprint via Tanimoto,^{38,112} as implemented in OpenBabel^{113,114}) but found limited correlation (R^{2} = 0.1) to prediction error (see ESI Fig. S29 and Text S5†). Comparing the Euclidean and uncentered Pearson distances in descriptor space between the CSD test cases and the closest training data descriptors provides improved correlation to prediction error of R^{2} = 0.3 and R^{2} = 0.2, respectively (ESI Fig. S30†). Large errors (i.e., >15 kcal mol^{−1}) are only observed at a Euclidean norm difference exceeding 1.0 (half of the CSD data), providing an indicator of unreliable ANN predictions. This high distance to training data does not guarantee inaccurate prediction: e.g., CSD test case 8, a Fe(II) tetrapyridine complex, is predicted with fortuitously good ∼2 kcal mol^{−1} error despite a Euclidean norm difference >1.4. We have implemented the Euclidean norm metric alongside the ANN in our automated screening code^{67} to detect complexes that are poorly represented in training data and advise retraining or direct calculation.
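The distance heuristic amounts to a nearest-neighbor search in descriptor space followed by a threshold test. A minimal sketch (helper names are ours; descriptors are assumed to be normalized as in training, and the 1.0 cutoff is the heuristic value from the text):

```python
import numpy as np

def min_train_distance(x_new, X_train):
    """Euclidean distance from a normalized descriptor vector to its
    nearest neighbor in the training descriptor matrix."""
    diffs = np.asarray(X_train, dtype=float) - np.asarray(x_new, dtype=float)
    return float(np.min(np.linalg.norm(diffs, axis=1)))

def flag_unreliable(x_new, X_train, threshold=1.0):
    """Flag a prediction as potentially unreliable when the nearest
    training-point distance exceeds the heuristic threshold."""
    return min_train_distance(x_new, X_train) > threshold
```

As the text notes, the flag is one-sided: a large distance warns of possible failure, but a flagged compound can still be predicted accurately by chance.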
ANN-predicted equilibrium metal–ligand bond lengths for the HS and LS CSD geometries produce RMSEs of 0.10 and 0.07 Å, respectively (ESI Tables S32 and S33†). Trends in bond length prediction error differ from those obtained for spin-state splitting. For instance, bond length errors for the cyclams are close to the set average even though spin-state splitting predictions were poor. The large Euclidean distance to training data heuristic (>1.0) holds for five of the seven large (i.e., >0.1 Å) HS bond distance errors (see ESI Texts S4 and S5, Fig. S31 and S32†). The highest HS prediction errors (>0.2 Å) occur for tests 8 and 35, underestimating the Fe–N bond length by 0.2 Å (2.1 Å ANN vs. 2.3 Å calculated) in the former case. Despite this poor geometric prediction, the ANN predicts the test 8 ΔE_{H–L} to within 3 kcal mol^{−1}; the differing performance arises because the two outputs are predicted independently. Interligand effects that are ignored by our descriptor set can restrict bond length extension, e.g., in test 16, where an O–H⋯O^{−} interligand hydrogen bond produces an unusually short 1.9 Å high-spin Fe–N bond distance (vs. the ANN prediction of 2.1 Å). Future work will focus on incorporating extended metrics of rigidity to account for these effects.
We also investigated the relationship between the experimental CSD bond distances and the ANN-predicted bond distances. If the experimentally measured bond distance lies close to one spin state's predicted bond length, then the complex may be expected to be in that spin state, assuming (i) the ANN provides a good prediction of the spin-state specific bond lengths and (ii) the gas-phase optimized DFT and CSD bond distances are comparable. The majority of experimental bond lengths lie near the extrema of the ANN predictions (the subset with a calculated DFT LS–HS bond distance difference of at least 0.05 Å is shown in Fig. 13 to improve readability; the full set is in ESI Fig. S33†). Nine of the twelve (9 of 9 in Fig. 13) experimental bond lengths that lie on or above the predicted HS bond distance boundary have an HS ground state, eleven of the fifteen (6 of 6 in Fig. 13) that lie on or below the predicted LS bond distance have an LS ground state, and the remaining structures (3 in Fig. 13) reside at intermediate distances. Some discrepancies are due to differences between the gas phase geometries and those in the crystal environment (e.g., test 27 in Fig. 13; see ESI Tables S31–S33†). This bond-length-based spin assignment thus provides a strategy for corroborating direct spin-state prediction.

 Fig. 13 Comparison of measured CSD bond distances in the crystal phase, represented by symbols (red squares for high-spin or blue triangles for low-spin based on DFT assignment at a_{HF} = 0.20), with the ANN predicted HS (red line) and LS (blue line) bond distances. Only the CSD test cases where the difference between DFT LS and HS computed bond distances is ≥0.05 Å are shown for clarity. For all of these cases, the ANN correctly predicts the DFT spin state.
3.6 Extrapolating pure exchange–correlation functionals to hybrids with an ANN
Linear spin-state HF exchange sensitivity may be exploited to predict properties at one a_{HF} value from computed properties obtained at another, e.g., to translate literature values or to accelerate periodic, plane-wave calculations where incorporation of HF exchange increases computational cost. We assess the utility of this ΔML-inspired^{52} strategy on the 35-molecule CSD test set to determine whether prediction errors improve, especially for molecules poorly represented in the training set.
On the CSD molecules, extrapolating the a_{HF} = 0.00 spin-state ordering to a_{HF} = 0.20 with the exchange-sensitivity ANN reduces the maximum error to 23 kcal mol^{−1} and decreases the mean unsigned error and RMSE to 5 kcal mol^{−1} and 7 kcal mol^{−1}, respectively (right panel of Fig. 12 and ESI Table S34†). For this GGA + slope ANN approach, excluding the nine cyclams does not change the RMSE/MUE values, confirming good ANN exchange-sensitivity prediction even when spin-state splitting prediction is poor.
These reduced average errors are close to the uncertainty introduced by the slope prediction performance, around 4 kcal mol^{−1} over a 20% exchange interval. Although this approach eliminates the largest outliers and improves prediction across the CSD test set, it necessitates semilocal DFT geometry optimizations or a judicious bond length choice for vertically-approximated spin-state ordering. The approach also has limited benefit for cases well represented in the training data, owing to the sparser data set underlying the exchange-sensitivity ANN. Indeed, over the original test set molecules, extrapolating ANN exchange sensitivities on top of calculated a_{HF} = 0.00 splittings produces an RMSE of around 4 kcal mol^{−1}, comparable to or slightly worse than direct prediction (ESI Fig. S34†).
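The GGA + slope approach combines one computed semilocal splitting with an ANN-predicted slope, and the slope's std. dev. propagates linearly into an energy uncertainty. A minimal sketch with a hypothetical helper name:

```python
def gga_to_hybrid(delta_e_gga, slope_pred, slope_std, a_target=0.20):
    """Extrapolate a computed a_HF = 0 (GGA) splitting to a hybrid
    exchange fraction using an ANN-predicted slope (kcal/mol per unit
    HFX), propagating the slope std. dev. into an energy uncertainty."""
    delta_e = delta_e_gga + slope_pred * a_target
    delta_e_std = abs(slope_std * a_target)
    return delta_e, delta_e_std
```

Consistent with the text, a 20 kcal per mol per HFX slope uncertainty maps to a 4 kcal mol^{−1} energy uncertainty over a 20% exchange interval.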
4. Conclusions
We have presented a series of ANN models trained using 2690 DFT geometry optimizations of octahedral transition metal complexes generated from a set of 16 candidate axial and equatorial ligands and transition metals (Cr–Ni) at varying fractions of HF exchange. On the unseen test cases of a 60–40% train–test partition, we demonstrated good accuracy, with spin-state splitting predictions within around 3 kcal mol^{−1} and metal–ligand bond distances within around 0.02–0.03 Å. Our simple descriptor set, comprising (i) the ligand connection atom, (ii) electronegativity and bonding of the coordinating ligand atom environment, (iii) ligand formal charge, (iv) ligand denticity, and (v) metal identity and oxidation state, ensures transferability of the ANN. Importantly, the employed connectivity models are not 3D-structure-based, instead relying on a truncated graph-theoretic representation of the ligand, making the approach suitable for screening large numbers of complexes without precise structural information. Although we have trained ANNs to predict bond lengths and spin-state splitting, the data set and descriptors could be used to predict other quantities such as ionization potential, redox potential, or molecular orbital energies. Such efforts are currently underway in our lab.
A test of our ANN on diverse molecules obtained from an experimental database indicated good performance, with spin-state MUEs of 5 kcal mol^{−1} for compounds within our proposed Euclidean distance reliability criterion and 10 kcal mol^{−1} for the full set. In both diverse and representative cases, the ANN outperforms other machine learning models. Our ANN predictions of HF exchange sensitivity provide a tool for interpolating between exchange–correlation functionals or extrapolating from semilocal GGAs to a hybrid result, which we demonstrated on CSD cases, improving the MUE to 5 kcal mol^{−1} across the full 35-molecule set.
Natural extensions to this work include the development of the current ANN for extrapolation of GGA to hybrid functional properties in condensed matter systems and generalization of the coordination definition to enable prediction of properties of unsaturated metals in catalytic cycles. Overall, we have demonstrated that a relatively sparse feature space is capable of predicting electronic structure properties of transition metal complexes, and we anticipate that this strategy may be used both for high-throughput screening with knowledge of functional choice sensitivity and in guiding assessment of sources of errors in approximate DFT.
Acknowledgements
The authors acknowledge partial support by the National Science Foundation under grant number ECCS-1449291. H. J. K. holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund. This work was carried out in part using computational resources from the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1053575. The authors thank Adam H. Steeves for providing a critical reading of the manuscript and Prof. Youssef Marzouk for helpful conversations.
References
1. R. Gomez-Bombarelli, J. Aguilera-Iparraguirre, T. D. Hirzel, D. Duvenaud, D. Maclaurin, M. A. Blood-Forsythe, H. S. Chae, M. Einzinger, D. G. Ha, T. Wu, G. Markopoulos, S. Jeon, H. Kang, H. Miyazaki, M. Numata, S. Kim, W. Huang, S. I. Hong, M. Baldo, R. P. Adams and A. Aspuru-Guzik, Design of Efficient Molecular Organic Light-Emitting Diodes by a High-Throughput Virtual Screening and Experimental Approach, Nat. Mater., 2016, 15, 1120–1127.
2. E. O. Pyzer-Knapp, K. Li and A. Aspuru-Guzik, Learning from the Harvard Clean Energy Project: The Use of Neural Networks to Accelerate Materials Discovery, Adv. Funct. Mater., 2015, 25, 6495–6502.
3. J. K. Norskov and T. Bligaard, The Catalyst Genome, Angew. Chem., Int. Ed. Engl., 2013, 52, 776–777.
4. A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner and G. Ceder, Commentary: The Materials Project: A Materials Genome Approach to Accelerating Materials Innovation, APL Mater., 2013, 1, 011002.
5. A. M. Virshup, J. Contreras-García, P. Wipf, W. Yang and D. N. Beratan, Stochastic Voyages into Uncharted Chemical Space Produce a Representative Library of All Possible Drug-Like Compounds, J. Am. Chem. Soc., 2013, 135, 7296–7303.
6. P. Kirkpatrick and C. Ellis, Chemical Space, Nature, 2004, 432, 823.
7. B. Meredig, A. Agrawal, S. Kirklin, J. E. Saal, J. W. Doak, A. Thompson, K. Zhang, A. Choudhary and C. Wolverton, Combinatorial Screening for New Materials in Unconstrained Composition Space with Machine Learning, Phys. Rev. B: Condens. Matter Mater. Phys., 2014, 89, 094104.
8. L. Li, J. C. Snyder, I. M. Pelaschier, J. Huang, U.-N. Niranjan, P. Duncan, M. Rupp, K.-R. Müller and K. Burke, Understanding Machine-Learned Density Functionals, Int. J. Quantum Chem., 2016, 116, 819–833.
9. M. Rupp, Machine Learning for Quantum Mechanics in a Nutshell, Int. J. Quantum Chem., 2015, 115, 1058–1073.
10. J. Behler, Perspective: Machine Learning Potentials for Atomistic Simulations, J. Chem. Phys., 2016, 145, 170901.
11. J. Behler, Representing Potential Energy Surfaces by High-Dimensional Neural Network Potentials, J. Phys.: Condens. Matter, 2014, 26, 183001.
12. S. Lorenz, A. Groß and M. Scheffler, Representing High-Dimensional Potential-Energy Surfaces for Reactions at Surfaces by Neural Networks, Chem. Phys. Lett., 2004, 395, 210–215.
13. N. Artrith, T. Morawietz and J. Behler, High-Dimensional Neural-Network Potentials for Multicomponent Systems: Applications to Zinc Oxide, Phys. Rev. B: Condens. Matter Mater. Phys., 2011, 83, 153101.
14. J. Behler and M. Parrinello, Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces, Phys. Rev. Lett., 2007, 98, 146401.
15. F. V. Prudente and J. J. S. Neto, The Fitting of Potential Energy Surfaces Using Neural Networks. Application to the Study of the Photodissociation Processes, Chem. Phys. Lett., 1998, 287, 585–589.
16. L. Mones, N. Bernstein and G. Csanyi, Exploration, Sampling, and Reconstruction of Free Energy Surfaces with Gaussian Process Regression, J. Chem. Theory Comput., 2016, 12, 5100–5110.
17. J. S. Smith, O. Isayev and A. E. Roitberg, ANI-1: An Extensible Neural Network Potential with DFT Accuracy at Force Field Computational Cost, Chem. Sci., 2017, 8, 3192–3203.
18. J. C. Snyder, M. Rupp, K. Hansen, K.-R. Müller and K. Burke, Finding Density Functionals with Machine Learning, Phys. Rev. Lett., 2012, 108, 253002.
19. K. Mills, M. Spanner and I. Tamblyn, Deep Learning and the Schrödinger Equation, arXiv preprint arXiv:1702.01361, 2017.
20. K. Yao and J. Parkhill, Kinetic Energy of Hydrocarbons as a Function of Electron Density and Convolutional Neural Networks, J. Chem. Theory Comput., 2016, 12, 1139–1147.
21. J. C. Snyder, M. Rupp, K. Hansen, L. Blooston, K.-R. Müller and K. Burke, Orbital-Free Bond Breaking via Machine Learning, J. Chem. Phys., 2013, 139, 224104.
22. K. Yao, J. E. Herr and J. Parkhill, The Many-Body Expansion Combined with Neural Networks, J. Chem. Phys., 2017, 146, 014106.
23. F. Hase, S. Valleau, E. Pyzer-Knapp and A. Aspuru-Guzik, Machine Learning Exciton Dynamics, Chem. Sci., 2016, 7, 5139–5147.
24. Z. Li, J. R. Kermode and A. De Vita, Molecular Dynamics with on-the-Fly Machine Learning of Quantum-Mechanical Forces, Phys. Rev. Lett., 2015, 114, 096405.
25. V. Botu and R. Ramprasad, Adaptive Machine Learning Framework to Accelerate Ab Initio Molecular Dynamics, Int. J. Quantum Chem., 2015, 115, 1074–1083.
26. G. Pilania, J. Gubernatis and T. Lookman, Multi-Fidelity Machine Learning Models for Accurate Bandgap Predictions of Solids, Comput. Mater. Sci., 2017, 129, 156–163.
27. G. Pilania, A. Mannodi-Kanakkithodi, B. P. Uberuaga, R. Ramprasad, J. E. Gubernatis and T. Lookman, Machine Learning Bandgaps of Double Perovskites, Sci. Rep., 2016, 6, 19375.
28. X. Ma, Z. Li, L. E. K. Achenie and H. Xin, Machine-Learning-Augmented Chemisorption Model for CO_{2} Electroreduction Catalyst Screening, J. Phys. Chem. Lett., 2015, 6, 3528–3533.
29. A. Mannodi-Kanakkithodi, G. Pilania, T. D. Huan, T. Lookman and R. Ramprasad, Machine Learning Strategy for Accelerated Design of Polymer Dielectrics, Sci. Rep., 2016, 6, 20952.
30. T. D. Huan, A. Mannodi-Kanakkithodi and R. Ramprasad, Accelerated Materials Property Predictions and Design Using Motif-Based Fingerprints, Phys. Rev. B: Condens. Matter Mater. Phys., 2015, 92, 014106.
31. G. Pilania, C. Wang, X. Jiang, S. Rajasekaran and R. Ramprasad, Accelerating Materials Property Predictions Using Machine Learning, Sci. Rep., 2013, 3, 2810.
 J. Lee, A. Seko, K. Shitara and I. Tanaka, Prediction Model of BandGap for AX Binary Compounds by Combination of Density Functional Theory Calculations and Machine Learning Techniques, Phys. Rev. B, 2016, 93, 115104 CrossRef.
 T. Morawietz and J. Behler, A DensityFunctional TheoryBased Neural Network Potential for Water Clusters Including van der Waals Corrections, J. Phys. Chem. A, 2013, 117, 7356–7366 CrossRef CAS PubMed.
 T. Morawietz, A. Singraber, C. Dellago and J. Behler, How van der Waals Interactions Determine the Unique Properties of Water, Proc. Natl. Acad. Sci. U. S. A., 2016, 201602375 Search PubMed.
 M. Rupp, A. Tkatchenko, K.R. Müller and O. A. von Lilienfeld, Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning, Phys. Rev. Lett., 2012, 108, 058301 CrossRef PubMed.
 B. Huang and O. A. von Lilienfeld, Communication: Understanding Molecular Representations in Machine Learning: The Role of Uniqueness and Target Similarity, J. Chem. Phys., 2016, 145, 161102 CrossRef PubMed.
 S. De, A. P. Bartk, G. Csányi and M. Ceriotti, Comparing Molecules and Solids across Structural and Alchemical Space, Phys. Chem. Chem. Phys., 2016, 18, 1–18 RSC.
 G. Maggiora, M. Vogt, D. Stumpfe and J. Bajorath, Molecular Similarity in Medicinal Chemistry: Miniperspective, J. Med. Chem., 2013, 57, 3186–3204 CrossRef PubMed.
 J. Wang, R. M. Wolf, J. W. Caldwell, P. A. Kollman and D. A. Case, Development and Testing of a General Amber Force Field, J. Comput. Chem., 2004, 25, 1157–1174 CrossRef CAS PubMed.
 H. Kubinyi, QSAR and 3D QSAR in Drug Design. Part 1: Methodology, Drug Discovery Today, 1997, 2, 457–467 CrossRef CAS.
 S. W. Benson, F. Cruickshank, D. Golden, G. R. Haugen, H. O'neal, A. Rodgers, R. Shaw and R. Walsh, Additivity Rules for the Estimation of Thermochemical Properties, Chem. Rev., 1969, 69, 279–324 CrossRef CAS.
 R. J. Deeth, The Ligand Field Molecular Mechanics Model and the Stereoelectronic Effects of d and s Electrons, Coord. Chem. Rev., 2001, 212, 11–34 CrossRef CAS.
D. F. Shriver and P. W. Atkins, Inorganic Chemistry, W. H. Freeman and Co., 3rd edn, 1999.
K. T. Schütt, H. Glawe, F. Brockherde, A. Sanna, K.-R. Müller and E. K. U. Gross, How to Represent Crystal Structures for Machine Learning: Towards Fast Prediction of Electronic Properties, Phys. Rev. B: Condens. Matter Mater. Phys., 2014, 89, 205118.
E. I. Ioannidis and H. J. Kulik, Towards Quantifying the Role of Exact Exchange in Predictions of Transition Metal Complex Properties, J. Chem. Phys., 2015, 143, 034104.
D. C. Ashley and E. Jakubikova, Ironing out the Photochemical and Spin-Crossover Behavior of Fe(II) Coordination Compounds with Computational Chemistry, Coord. Chem. Rev., 2017, 97–111.
D. N. Bowman and E. Jakubikova, Low-Spin versus High-Spin Ground State in Pseudo-Octahedral Iron Complexes, Inorg. Chem., 2012, 51, 6011–6019.
T. Z. H. Gani and H. J. Kulik, Where Does the Density Localize? Convergent Behavior for Global Hybrids, Range Separation, and DFT+U, J. Chem. Theory Comput., 2016, 12, 5931–5945, DOI: 10.1021/acs.jctc.6b00937.
E. I. Ioannidis and H. J. Kulik, Ligand-Field-Dependent Behavior of Meta-GGA Exchange in Transition-Metal Complex Spin-State Ordering, J. Phys. Chem. A, 2017, 121, 874–884, DOI: 10.1021/acs.jpca.6b11930.
W. Huang, D.-H. Xing, J.-B. Lu, B. Long, W. E. Schwarz and J. Li, How Much Can Density Functional Approximations (DFA) Fail? The Extreme Case of the FeO_{4} Species, J. Chem. Theory Comput., 2016, 12, 1525–1533.
J. J. P. Stewart, Optimization of Parameters for Semiempirical Methods VI: More Modifications to the NDDO Approximations and Re-Optimization of Parameters, J. Mol. Model., 2013, 19, 1–32.
R. Ramakrishnan, P. O. Dral, M. Rupp and O. A. von Lilienfeld, Big Data Meets Quantum Chemistry Approximations: The Delta-Machine Learning Approach, J. Chem. Theory Comput., 2015, 11, 2087–2096.
L. Shen, J. Wu and W. Yang, Multiscale Quantum Mechanics/Molecular Mechanics Simulations with Neural Networks, J. Chem. Theory Comput., 2016, 12, 4934–4946.
H. J. Kulik, Perspective: Treating Electron over-Delocalization with the DFT+U Method, J. Chem. Phys., 2015, 142, 240901.
A. J. Cohen, P. Mori-Sanchez and W. Yang, Insights into Current Limitations of Density Functional Theory, Science, 2008, 321, 792–794.
O. Salomon, M. Reiher and B. A. Hess, Assertion and Validation of the Performance of the B3LYP* Functional for the First Transition Metal Row and the G2 Test Set, J. Chem. Phys., 2002, 117, 4729–4737.
M. Reiher, Theoretical Study of the Fe(Phen)_{2}(NCS)_{2} Spin-Crossover Complex with Reparametrized Density Functionals, Inorg. Chem., 2002, 41, 6928–6935.
M. Reiher, O. Salomon and B. A. Hess, Reparameterization of Hybrid Functionals Based on Energy Differences of States of Different Multiplicity, Theor. Chem. Acc., 2001, 107, 48–55.
A. Droghetti, D. Alfè and S. Sanvito, Assessment of Density Functional Theory for Iron(II) Molecules across the Spin-Crossover Transition, J. Chem. Phys., 2012, 137, 124303.
J. E. Sutton, W. Guo, M. A. Katsoulakis and D. G. Vlachos, Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling, Nat. Chem., 2016, 8, 331–337.
G. N. Simm and M. Reiher, Systematic Error Estimation for Chemical Reaction Energies, J. Chem. Theory Comput., 2016, 2762–2773.
E. Walker, S. C. Ammal, G. A. Terejanu and A. Heyden, Uncertainty Quantification Framework Applied to the Water–Gas Shift Reaction over Pt-Based Catalysts, J. Phys. Chem. C, 2016, 120, 10328–10339.
M. A. Halcrow, Structure: Function Relationships in Molecular Spin-Crossover Complexes, Chem. Soc. Rev., 2011, 40, 4119–4142.
J.-F. Létard, P. Guionneau and L. Goux-Capes, Towards Spin Crossover Applications, in Spin Crossover in Transition Metal Compounds III, Springer, 2004, pp. 221–249.
C. A. Bignozzi, R. Argazzi, R. Boaretto, E. Busatto, S. Carli, F. Ronconi and S. Caramori, The Role of Transition Metal Complexes in Dye-Sensitized Solar Devices, Coord. Chem. Rev., 2013, 257, 1472–1492.
J. N. Harvey, R. Poli and K. M. Smith, Understanding the Reactivity of Transition Metal Complexes Involving Multiple Spin States, Coord. Chem. Rev., 2003, 238, 347–361.
E. I. Ioannidis, T. Z. H. Gani and H. J. Kulik, molSimplify: A Toolkit for Automating Discovery in Inorganic Chemistry, J. Comput. Chem., 2016, 37, 2106–2117.
A. Kramida, Y. Ralchenko, J. Reader and NIST ASD Team, NIST Atomic Spectra Database (Version 5.3), http://physics.nist.gov/asd, accessed March 14, 2017.
C. R. Groom, I. J. Bruno, M. P. Lightfoot and S. C. Ward, The Cambridge Structural Database, Acta Crystallogr., Sect. B: Struct. Sci., Cryst. Eng. Mater., 2016, 72, 171–179.
I. S. Ufimtsev and T. J. Martinez, Quantum Chemistry on Graphical Processing Units. 3. Analytical Energy Gradients, Geometry Optimization, and First Principles Molecular Dynamics, J. Chem. Theory Comput., 2009, 5, 2619–2628.
PetaChem, http://www.petachem.com, accessed March 14, 2017.
P. J. Stephens, F. J. Devlin, C. F. Chabalowski and M. J. Frisch, Ab Initio Calculation of Vibrational Absorption and Circular Dichroism Spectra Using Density Functional Force Fields, J. Phys. Chem., 1994, 98, 11623–11627.
A. D. Becke, Density-Functional Thermochemistry. III. The Role of Exact Exchange, J. Chem. Phys., 1993, 98, 5648–5652.
C. Lee, W. Yang and R. G. Parr, Development of the Colle-Salvetti Correlation-Energy Formula into a Functional of the Electron Density, Phys. Rev. B: Condens. Matter Mater. Phys., 1988, 37, 785–789.
P. J. Hay and W. R. Wadt, Ab Initio Effective Core Potentials for Molecular Calculations. Potentials for the Transition Metal Atoms Sc to Hg, J. Chem. Phys., 1985, 82, 270–283.
V. R. Saunders and I. H. Hillier, A "Level-Shifting" Method for Converging Closed Shell Hartree–Fock Wave Functions, Int. J. Quantum Chem., 1973, 7, 699–705.
J. Kästner, J. M. Carr, T. W. Keal, W. Thiel, A. Wander and P. Sherwood, DL-FIND: An Open-Source Geometry Optimizer for Atomistic Simulations, J. Phys. Chem. A, 2009, 113, 11856–11865.
G. Ganzenmüller, N. Berkaine, A. Fouqueau, M. E. Casida and M. Reiher, Comparison of Density Functionals for Differences between the High (^{5}T_{2g}) and Low (^{1}A_{1g}) Spin States of Iron(II) Compounds. IV. Results for the Ferrous Complexes [Fe(L)('NHS_{4}')], J. Chem. Phys., 2005, 122, 234321.
A. Cereto-Massagué, M. J. Ojeda, C. Valls, M. Mulero, S. Garcia-Vallvé and G. Pujadas, Molecular Fingerprint Similarity Search in Virtual Screening, Methods, 2015, 71, 58–63.
R. P. Sheridan, M. D. Miller, D. J. Underwood and S. K. Kearsley, Chemical Similarity Using Geometric Atom Pair Descriptors, J. Chem. Inf. Model., 1996, 36, 128–136.
K. Hansen, F. Biegler, R. Ramakrishnan and W. Pronobis, Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space, J. Phys. Chem. Lett., 2015, 6, 2326–2331.
M. Gastegger and P. Marquetand, High-Dimensional Neural Network Potentials for Organic Reactions and an Improved Training Algorithm, J. Chem. Theory Comput., 2015, 11, 2187–2198.
J. A. Hageman, J. A. Westerhuis, H.-W. Frühauf and G. Rothenberg, Design and Assembly of Virtual Homogeneous Catalyst Libraries – Towards in silico Catalyst Optimisation, Adv. Synth. Catal., 2006, 348, 361–369.
M. Randić, On Molecular Branching, J. Am. Chem. Soc., 1975, 97, 57–77.
H. Wiener, Correlation of Heats of Isomerization, and Differences in Heats of Vaporization of Isomers, among the Paraffin Hydrocarbons, J. Am. Chem. Soc., 1947, 69, 2636–2638.
L. B. Kier, A Shape Index from Molecular Graphs, Quant. Struct.-Act. Relat., 1985, 4, 109–116.
G. Montavon, K. Hansen, S. Fazli and M. Rupp, Learning Invariant Representations of Molecules for Atomization Energy Prediction, in Advances in Neural Information Processing Systems, ed. F. Pereira, C. J. C. Burges, L. Bottou and K. Q. Weinberger, Curran Associates, Inc., 2012, pp. 440–448.
M. Gastegger, C. Kauffmann, J. Behler and P. Marquetand, Comparing the Accuracy of High-Dimensional Neural Network Potentials and the Systematic Molecular Fragmentation Method: A Benchmark Study for All-Trans Alkanes, J. Chem. Phys., 2016, 144, 194110.
T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, Springer, New York, 2009, vol. 18, p. 764.
J. Friedman, T. Hastie and R. Tibshirani, Regularization Paths for Generalized Linear Models via Coordinate Descent, J. Stat. Software, 2010, 33, 1–22.
R Core Development Team, R: A Language and Environment for Statistical Computing, 2016.
H. Larochelle, Y. Bengio, J. Louradour and P. Lamblin, Exploring Strategies for Training Deep Neural Networks, J. Mach. Learn. Res., 2009, 10, 1–40.
S. Aiello, T. Kraljevic and P. Maj, H2O: R Interface for H2O, 2015.
Y. Gal and Z. Ghahramani, Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, arXiv preprint arXiv:1506.02142, 2015.
N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, J. Mach. Learn. Res., 2014, 15, 1929–1958.
G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever and R. R. Salakhutdinov, Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors, arXiv preprint arXiv:1207.0580, 2012, pp. 1–18.
Y. Bengio, Practical Recommendations for Gradient-Based Training of Deep Architectures, in Neural Networks: Tricks of the Trade, ed. G. B. Orr, K.-R. Müller and M. Gregoire, Springer, 2012, pp. 437–478.
Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Gradient-Based Learning Applied to Document Recognition, Proc. IEEE, 1998, 86, 2278–2323.
A. Candel, V. Parmar, E. LeDell and A. Arora, Deep Learning with H2O, H2O, 2015.
F. Niu, B. Recht, C. Ré and S. J. Wright, Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, in Advances in Neural Information Processing Systems, 2011, p. 21.
G. B. Kingston, M. F. Lambert and H. R. Maier, Bayesian Training of Artificial Neural Networks Used for Water Resources Modeling, Water Resour. Res., 2005, 41, 1–11.
P. Secchi and E. Zio, Quantifying Uncertainties in the Estimation of Safety Parameters by Using Bootstrapped Artificial Neural Networks, Ann. Nucl. Energy, 2008, 35, 2338–2350.
K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O. A. von Lilienfeld, A. Tkatchenko and K.-R. Müller, Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies, J. Chem. Theory Comput., 2013, 9, 3404–3419.
A. Zeileis, K. Hornik, A. Smola and A. Karatzoglou, kernlab – an S4 Package for Kernel Methods in R, J. Stat. Software, 2004, 11, 1–20.
T. Krueger, D. Panknin and M. Braun, Fast Cross-Validation via Sequential Testing, J. Mach. Learn. Res., 2015, 16, 1103–1155.
L. Breiman, J. Friedman, R. A. Olshen and C. Stone, Classification and Regression Trees, Chapman and Hall/CRC, 1984, vol. 5, pp. 95–96.
T. Therneau, B. Atkinson and B. Ripley, rpart: Recursive Partitioning and Regression Trees, https://cran.r-project.org/package=rpart, accessed March 14, 2017.
D. Coskun, S. V. Jerome and R. A. Friesner, Evaluation of the Performance of the B3LYP, PBE0, and M06 DFT Functionals, and DBLOC-Corrected Versions, in the Calculation of Redox Potentials and Spin Splittings for Transition Metal Containing Systems, J. Chem. Theory Comput., 2016, 12, 1121–1128.
T. F. Hughes, J. N. Harvey and R. A. Friesner, A B3LYP-DBLOC Empirical Correction Scheme for Ligand Removal Enthalpies of Transition Metal Complexes: Parameterization against Experimental and CCSD(T)-F12 Heats of Formation, Phys. Chem. Chem. Phys., 2012, 14, 7724–7738.
T. F. Hughes and R. A. Friesner, Correcting Systematic Errors in DFT Spin-Splitting Energetics for Transition Metal Complexes, J. Chem. Theory Comput., 2011, 7, 19–32.
W. Jiang, N. J. DeYonker, J. J. Determan and A. K. Wilson, Toward Accurate Theoretical Thermochemistry of First Row Transition Metal Complexes, J. Phys. Chem. A, 2012, 116, 870–885.
D. Bajusz, A. Rácz and K. Héberger, Why Is Tanimoto Index an Appropriate Choice for Fingerprint-Based Similarity Calculations?, J. Cheminf., 2015, 7, 20.
N. M. O'Boyle, M. Banck, C. A. James, C. Morley, T. Vandermeersch and G. R. Hutchison, Open Babel: An Open Chemical Toolbox, J. Cheminf., 2011, 3, 33.
The Open Babel Package, Version 2.3.1, http://openbabel.org, accessed March 14, 2017.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c7sc01247k

This journal is © The Royal Society of Chemistry 2017 