Predicting electronic structure properties of transition metal complexes with neural networks

Our neural network predicts spin-state ordering of transition metal complexes to near-chemical accuracy with respect to DFT reference.

Text S1: supporting files index In addition to results reported here, additional files are provided as follows:

Text S2: Estimation of τ We determine a representative value of τ by maximizing the log predictive likelihood of the corresponding GP on the training data, which is a measure of how likely the observed data are under the GP and is approximated [6] by equation S1. In the application here, we have only scalar output, and we use the training data to optimize equation S1 numerically with respect to τ. We use J = 100 repeats, as in the network itself.
The determined values of τ, based on the respective training data, are 0.4 for predicting the splitting energy, 0.07 for predicting the HF exchange sensitivity, and 10000 for the metal-ligand distances. The magnitudes implied by these numbers are close to the training errors observed, with 0.4⁻¹ = 2.5 and √(0.4⁻¹) ≈ 1.6, √(0.007⁻¹) ≈ 12, and √(10000⁻¹) = 0.01. These numbers represent the estimated inherent variance in the training data, which limits the accuracy that could be expected from the trained networks.
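The numerical optimization of τ described in Text S2 can be sketched as follows. Equation S1 is not reproduced in this section, so the sketch assumes the commonly used Monte Carlo dropout form of the log predictive likelihood for scalar outputs (a log-mean-exp over the J stochastic forward passes plus a ½ log τ normalization term). The function names, the search grid, and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def log_predictive_likelihood(y_true, y_samples, tau):
    """Summed log predictive likelihood for scalar outputs.

    ASSUMPTION: standard MC-dropout form; equation S1 itself is not shown here.
    y_true has shape (N,), y_samples has shape (J, N) (J stochastic passes).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_samples = np.asarray(y_samples, dtype=float)
    n_repeats = y_samples.shape[0]
    exponent = -0.5 * tau * (y_true[None, :] - y_samples) ** 2  # shape (J, N)
    # Numerically stable log-sum-exp over the J repeats.
    peak = exponent.max(axis=0)
    log_sum_exp = peak + np.log(np.exp(exponent - peak).sum(axis=0))
    per_point = (log_sum_exp - np.log(n_repeats)
                 - 0.5 * np.log(2.0 * np.pi) + 0.5 * np.log(tau))
    return float(per_point.sum())


def estimate_tau(y_true, y_samples, grid=None):
    """Return the grid value of tau that maximizes the likelihood."""
    if grid is None:
        grid = np.logspace(-4, 5, 901)  # hypothetical search range
    scores = [log_predictive_likelihood(y_true, y_samples, t) for t in grid]
    return float(grid[int(np.argmax(scores))])


# Synthetic demonstration (not data from this work).
rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
mean_pred = y_true + rng.normal(scale=1.6, size=200)
y_samples = np.tile(mean_pred, (100, 1))  # J = 100 identical passes for simplicity
tau_hat = estimate_tau(y_true, y_samples)
print(tau_hat, 1.0 / np.sqrt(tau_hat))
```

When all J passes coincide, the maximizer reduces to the inverse mean squared residual, which is why √(τ⁻¹) can be compared directly with a training error.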
Text S3: Use of Coulomb matrix descriptor We compare the descriptors proposed in this work to the Coulomb matrix descriptor, which has previously [22,9] been correlated with various molecular properties for a number of organic molecule data sets. To allow comparison of complexes with differing numbers of atoms, we pad all matrices with zeros to a size of 151 × 151, necessitating O(10⁴) elements per compound. We sort the rows and columns of the matrices to obtain indexing-invariant representations and use KRR with an exponential kernel and the matrix L₁ norm as a distance metric, as in Ref [32]. The complexes in our training data range in size from 7 to 151 atoms, but have a mean and median size of 38 and 29 atoms, respectively. This large skew toward smaller complexes means that most of the descriptor elements are zero, which may make learning good model parameters difficult. For example, the L₁ distances between the sorted matrix representations of the small Fe(III)(CN)₆ complex and two large complexes, Fe(III)(tbuc)₆ and Fe(III)(pisc)₆, are very similar (36.85 to 36.88, where the range of distances spans ∼20-60), despite pisc being a similarly strong C-connecting ligand and tbuc being a much weaker O-connecting ligand. We train and test on the same data as used in the other methods, but because the Coulomb matrix representation does not encode any functional-dependent information, we also provide a comparison against only B3LYP data (as opposed to varying HFX fractions).
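The descriptor pipeline in Text S3 (zero padding, sorting, L₁ distance, exponential kernel, KRR) can be sketched as below. The sketch assumes the usual Coulomb matrix definition (diagonal 0.5 Z^2.4, off-diagonal Z_i Z_j / |R_i − R_j|), sorting by descending row norm, and an entrywise L₁ distance; the function names, the kernel width, the ridge parameter, and the toy geometry are hypothetical and are not taken from this work.

```python
import numpy as np


def coulomb_matrix(charges, coords, size):
    """Sorted, zero-padded Coulomb matrix (ASSUMED standard definition)."""
    z = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float)
    n = len(z)
    if n > size:
        raise ValueError("molecule has more atoms than the padded size")
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)            # placeholder, diagonal is overwritten below
    cm = np.outer(z, z) / dist
    np.fill_diagonal(cm, 0.5 * z ** 2.4)
    # Sort rows and columns by descending row norm for index invariance.
    order = np.argsort(-np.linalg.norm(cm, axis=1), kind="stable")
    cm = cm[order][:, order]
    padded = np.zeros((size, size))
    padded[:n, :n] = cm
    return padded


def l1_distance(a, b):
    """Entrywise L1 distance between two padded matrices."""
    return float(np.abs(a - b).sum())


def exponential_kernel(mats_a, mats_b, sigma):
    """k(a, b) = exp(-d_L1(a, b) / sigma)."""
    return np.array([[np.exp(-l1_distance(a, b) / sigma) for b in mats_b]
                     for a in mats_a])


def krr_fit(kernel, targets, ridge):
    """Solve (K + ridge * I) alpha = y for the KRR weights."""
    return np.linalg.solve(kernel + ridge * np.eye(len(kernel)), targets)


# Toy linear triatomic in two atom orderings (arbitrary units, illustrative only).
charges_a = [1, 6, 7]
coords_a = [[0.0, 0.0, 0.0], [1.06, 0.0, 0.0], [2.22, 0.0, 0.0]]
charges_b = [7, 1, 6]
coords_b = [[2.22, 0.0, 0.0], [0.0, 0.0, 0.0], [1.06, 0.0, 0.0]]
cm_a = coulomb_matrix(charges_a, coords_a, size=5)
cm_b = coulomb_matrix(charges_b, coords_b, size=5)
print(l1_distance(cm_a, cm_b))  # near zero: sorting removes the ordering dependence
```

Predictions for new complexes would then be formed as the kernel between test and training matrices multiplied by the fitted weights.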

Text S4: Testing ANN performance in molSimplify In order to assess whether the ANN can assist automated structure design, we used it to predict bond lengths instead of using the metal-ligand bond length database integrated into our structure generation toolbox, molSimplify [13]. We selected four of the original benchmark structures where molSimplify was found not to reduce RMS gradient error relative to simple force fields. Further details about the test cases are in the original paper. We project the negative of the energy gradient on the metal and connection atoms at the initial geometry onto the vector joining them, as explained in Figure S32, and use this as a measure of how close the initial geometry is to an equilibrium bond length. Note that a negative value for g means the bond would shrink in a steepest descent step, while a positive value means that it would lengthen. Large magnitudes indicate that the bonds are far from equilibrium. We achieve reductions in the absolute magnitude of g of 54-90% for the bidentate cases and 7% for the monodentate case (Table S26). We note that reductions in the metal-ligand projected gradient do not necessarily correspond to reductions in the RMS gradient, which considers contributions from all atoms. In the Cr(bipy)₃ case, the RMS gradient is reduced by 30%, but it is unchanged or marginally higher in the other cases. This may be explained by considering the signs of the projected gradient, which show that the ANN universally reduces the metal-ligand bond length relative to the original structure. This brings the bidentate ligands closer to the metal center and hence closer to each other, and we observe that the dominant contribution to the RMS gradient is from other atoms in the ligand structure. This could possibly be improved by training a similar ANN on the bite angles.
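A minimal sketch of the bond-projected gradient g described in Text S4 is given below. It assumes the unit vector points from the metal to the connection atom and that g is the difference between the projected forces (negative gradients) on the connection atom and on the metal, which reproduces the stated sign convention (negative g: the bond shrinks; positive g: it lengthens). The function names and the harmonic test bond are hypothetical illustrations.

```python
import numpy as np


def bond_projected_gradient(metal_pos, ligand_pos, metal_grad, ligand_grad):
    """Scalar g: relative force of ligand vs. metal along the metal->ligand axis."""
    metal_pos = np.asarray(metal_pos, dtype=float)
    ligand_pos = np.asarray(ligand_pos, dtype=float)
    bond = ligand_pos - metal_pos
    unit = bond / np.linalg.norm(bond)
    metal_force = -np.asarray(metal_grad, dtype=float)    # negative energy gradient
    ligand_force = -np.asarray(ligand_grad, dtype=float)
    return float(np.dot(ligand_force - metal_force, unit))


def percent_reduction(g_old, g_new):
    """Reduction in |g| between two initial geometries, in percent."""
    return 100.0 * (1.0 - abs(g_new) / abs(g_old))


def harmonic_bond_gradients(metal_pos, ligand_pos, r_eq, k):
    """Gradients of E = 0.5 * k * (d - r_eq)**2 (toy model for the demo)."""
    bond = np.asarray(ligand_pos, dtype=float) - np.asarray(metal_pos, dtype=float)
    d = np.linalg.norm(bond)
    ligand_grad = k * (d - r_eq) * bond / d
    return -ligand_grad, ligand_grad  # (metal gradient, ligand gradient)


metal = np.zeros(3)
ligand = np.array([2.2, 0.0, 0.0])          # bond stretched beyond r_eq = 2.0
g_metal, g_ligand = harmonic_bond_gradients(metal, ligand, r_eq=2.0, k=1.0)
g = bond_projected_gradient(metal, ligand, g_metal, g_ligand)
print(g)  # negative: a steepest descent step would shorten the bond
```

Because g only involves the metal and connection atoms, it can fall sharply while the RMS gradient over all atoms stays unchanged, as noted in the text.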
Text S5: Molecular descriptors for CSD compounds The poor correlation (R² = 0.1) between the Tanimoto dissimilarity (for CSD and training ligands) and the prediction error can be understood by considering that the molecular fingerprint is insensitive to the arrangement of groups in the ligand. Two ligands might therefore appear similar in the Tanimoto metric because they both contain certain groups, but this does not ensure that the same groups are coordinating to the metal center. The descriptors used in this work strongly suggest that the immediate metal environment determines the behavior of the complex, which highlights a specific difficulty in translating established ideas from organic molecular similarity analysis to transition metal systems.
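For reference, the Tanimoto dissimilarity between two binary fingerprints can be sketched as below. Fingerprints are represented as sets of "on" bit indices; the bit values are hypothetical and are not real FP2 output, and the function name is an illustrative choice.

```python
def tanimoto_dissimilarity(bits_a, bits_b):
    """1 - |A & B| / |A | B| for two sets of 'on' fingerprint bits."""
    bits_a, bits_b = set(bits_a), set(bits_b)
    union = bits_a | bits_b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(bits_a & bits_b) / len(union)


# Hypothetical bit sets: two ligands sharing most substructure bits.
ligand_a = {3, 17, 42, 108}
ligand_b = {3, 17, 42, 256}
print(tanimoto_dissimilarity(ligand_a, ligand_b))
```

The metric only counts shared bits, so it carries no information about which of the shared groups sits next to the metal, consistent with the weak correlation discussed above.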
Text S6: Dissimilarity metrics for LS/HS bond length prediction Using the same dissimilarity metrics that were employed to evaluate the reliability of spin-state splitting predictions, the correlation between bond distance error and proximity to test data is smaller for both the HS bond distances (R² = 0.0, 0.1 and 0.2 for the Tanimoto, Pearson, and Euclidean metrics, respectively) and the LS bond distances (R² = 0 for all metrics). However, we do observe that four of the five large (i.e., > 0.1 Å) HS bond distance errors have a minimum Euclidean distance greater than 1.0, supporting the use of this heuristic for evaluating prediction reliability. Bond length errors are generally smaller for LS states than for HS states, with only two cases (tests 26 and 30) greater than 0.1 Å. We observe an overall correlation between inaccurate low-spin bond distance prediction and poor splitting energy prediction, but bond lengths may still be well predicted when spin-state splittings are not (e.g., 0.006 to 0.03 Å errors in LS bond distances for the cyclams).

Table S1: Ligand properties

| Number | ID | Name | Denticity | Charge | Connection | max δχ | Bond Order | Truncated Kier |
|---|---|---|---|---|---|---|---|---|
| 1 | cl | chloride | 1 | -1 | Cl | 0 | 0 | 0 |
| 2 | scn | thiocyanate | 1 | -1 | S | 0.03 | 2 | 2 |
| 3 | pisc | t-butylphenyl isocyanide | 1 | 0 | C | -0.49 | 3 | 2.25 |
| 4 | misc | methyl isocyanide | 1 | 0 | C | -0.49 | 3 | 2 |
| 5 | cn | cyanate | 1 | -1 | C | -0.49 | 3 | 0 |
| 6 | co | carbonyl | 1 | 0 | C | -0.89 | 3 | 0 |
| 7 | ncs | isothiocyanate | 1 | | | | | |

… in kcal/mol·HFX (right). Set a includes the metal properties, the full ligand identity, and the number of atoms. Set b replaces ligand identity with the identity of the connection atom only, while set c adds information from the sum, maximum and minimum ligand δχ to set b. Set d is the same as set c but excludes the minimum δχ. Set e adds in bond order information with an MSE, while set f replaces the ligand size metric with our truncated index. Set g represents our final set, and includes the same descriptors as f and adds bond order information.

Figure S6: Model predictions of ∆E_H-L and data for Fe(II) (top) and Ni(II) (bottom) using an ANN. The ligands are described by two numbers indicating the equatorial first and then the axial, color coded by ligand identity (green for halogen, gray for carbon, blue for nitrogen, and red for oxygen). The error bars represent an estimated 1 standard deviation from the mean prediction.

Figure S7: Parity plot of 2 standard deviations from the mean prediction against the absolute prediction error for test case ∆E_H-L prediction using the ANN. All units are kcal/mol. The black line is y = x.

Figure S8: Normalized error histogram for HF = 0.2 (B3LYP) test data (top) and CSD structures (bottom), comparing ANN, KRR and SVR models using descriptor set g, as well as a KRR model using the Coulomb matrix descriptor (trained on B3LYP data only).

… and data for Co using an ANN. The ligands are described by two numbers indicating the equatorial first and then the axial, color coded by ligand identity (green for halogen, gray for carbon, blue for nitrogen, and red for oxygen). The error bars represent an estimated 1 standard deviation from the mean prediction.

… and data for Cr using an ANN. The ligands are described by two numbers indicating the equatorial first and then the axial, color coded by ligand identity (green for halogen, gray for carbon, blue for nitrogen, and red for oxygen). The error bars represent an estimated 1 standard deviation from the mean prediction.

… and data for low-spin Fe(II) (top) and Ni(II) (bottom) using an ANN. The ligands are described by two numbers indicating the equatorial first and then the axial, color coded by ligand identity (green for halogen, gray for carbon, blue for nitrogen, and red for oxygen). The error bars represent an estimated 1 standard deviation from the mean prediction.

… and data for Mn using an ANN. The ligands are described by two numbers indicating the equatorial first and then the axial, color coded by ligand identity (green for halogen, gray for carbon, blue for nitrogen, and red for oxygen). The error bars represent an estimated 1 standard deviation from the mean prediction.

[Figure S18 panel label: Co³⁺; legend: ANN, data]
Figure S18: Model predictions of R min LS and data for low-spin Co using an ANN. The ligands are described by two numbers indicating the equatorial first and then the axial, color coded by ligand identity (green for halogen, gray for carbon, blue for nitrogen, and red for oxygen). The error bars represent an estimated 1 standard deviation from the mean prediction.

[Figure S19 panel label: Cr³⁺; legend: ANN, data]
Figure S19: Model predictions of R min LS and data for low-spin Cr using an ANN. The ligands are described by two numbers indicating the equatorial first and then the axial, color coded by ligand identity (green for halogen, gray for carbon, blue for nitrogen, and red for oxygen). The error bars represent an estimated 1 standard deviation from the mean prediction.

[Figure S20 panel label: Ni²⁺; legend: ANN, data]
Figure S20: Model predictions of R min LS and data for low-spin Fe(II) (top) and Ni(II) (bottom) using an ANN. The ligands are described by two numbers indicating the equatorial first and then the axial, color coded by ligand identity (green for halogen, gray for carbon, blue for nitrogen, and red for oxygen). The error bars represent an estimated 1 standard deviation from the mean prediction.

[Figure S21 panel label: Mn³⁺; legend: ANN, data]
Figure S21: Model predictions of R min LS and data for low-spin Mn using an ANN. The ligands are described by two numbers indicating the equatorial first and then the axial, color coded by ligand identity (green for halogen, gray for carbon, blue for nitrogen, and red for oxygen). The error bars represent an estimated 1 standard deviation from the mean prediction.

[Figure S24 panel label: Co³⁺; legend: ANN, data]
Figure S24: Model predictions of R min HS and data for high-spin Co using an ANN. The ligands are described by two numbers indicating the equatorial first and then the axial, color coded by ligand identity (green for halogen, gray for carbon, blue for nitrogen, and red for oxygen). The error bars represent an estimated 1 standard deviation from the mean prediction.

[Figure S25 panel label: Cr³⁺; legend: ANN, data]
Figure S25: Model predictions of R min HS and data for high-spin Cr using an ANN. The ligands are described by two numbers indicating the equatorial first and then the axial, color coded by ligand identity (green for halogen, gray for carbon, blue for nitrogen, and red for oxygen). The error bars represent an estimated 1 standard deviation from the mean prediction.

[Figure S27 panel label: Mn³⁺; legend: ANN, data]
Figure S27: Model predictions of R min HS and data for high-spin Mn using an ANN. The ligands are described by two numbers indicating the equatorial first and then the axial, color coded by ligand identity (green for halogen, gray for carbon, blue for nitrogen, and red for oxygen). The error bars represent an estimated 1 standard deviation from the mean prediction.

Figure S28: Illustration showing the definition of the bond projected gradient, g, used to estimate the closeness of an initial geometry to equilibrium. The projected gradient is the scalar difference between the components of the negative energy gradient projected onto the vector joining the nuclear positions of the metal (larger orange circle) and the ligand (smaller grey circle).

Figure S30: Comparison of dissimilarity metrics for CSD data: errors in spin energy predictions (∆E_H-L) for CSD structures are shown on the y-axis in kcal/mol, and the Euclidean (left, red) and uncentered Pearson (right, gray) distances between the CSD structure and its nearest representation in dimensionless descriptor space are shown on the x-axis.

Figure S31: Comparison of dissimilarity metrics for CSD data: errors in LS bond length prediction (R min LS) for CSD structures are shown on the y-axis in Å, and three normalized dissimilarity metrics are compared on the x-axis: the Tanimoto/FP2 dissimilarity metric between the CSD ligands and the training ligands (blue circles), and the Euclidean (red diamonds) and uncentered Pearson distances (gray crosses) between the CSD structure and its nearest representation in dimensionless descriptor space.

Figure S32: Comparison of dissimilarity metrics for CSD data: errors in HS bond length prediction (R min HS) for CSD structures are shown on the y-axis in Å, and three normalized dissimilarity metrics are compared on the x-axis: the Tanimoto/FP2 dissimilarity metric between the CSD ligands and the training ligands (blue circles), and the Euclidean (red diamonds) and uncentered Pearson distances (gray crosses) between the CSD structure and its nearest representation in dimensionless descriptor space.
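The descriptor-space distances compared in Figures S30 to S32, and the minimum Euclidean distance heuristic from Text S6, can be sketched as follows. The sketch assumes the uncentered Pearson distance is one minus the uncentered correlation (cosine) of two descriptor vectors and that descriptors have already been made dimensionless. The function names and the toy vectors are hypothetical; the cutoff of 1.0 is the value quoted in Text S6.

```python
import numpy as np


def min_euclidean_distance(x, train):
    """Distance from descriptor vector x to its nearest training point."""
    train = np.asarray(train, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.min(np.linalg.norm(train - x, axis=1)))


def min_uncentered_pearson_distance(x, train):
    """Smallest value of 1 - (x . t) / (|x| |t|) over training points t.

    ASSUMPTION: 'uncentered Pearson distance' is taken as the cosine distance.
    Vectors must be nonzero.
    """
    train = np.asarray(train, dtype=float)
    x = np.asarray(x, dtype=float)
    cosine = train @ x / (np.linalg.norm(train, axis=1) * np.linalg.norm(x))
    return float(np.min(1.0 - cosine))


def flag_unreliable(x, train, cutoff=1.0):
    """Heuristic from Text S6: flag points farther than the cutoff from training data."""
    return min_euclidean_distance(x, train) > cutoff


# Toy dimensionless descriptors (illustrative only).
training = np.array([[1.0, 0.0], [0.0, 1.0]])
candidate = np.array([3.0, 0.0])
print(min_euclidean_distance(candidate, training))
print(min_uncentered_pearson_distance(candidate, training))
print(flag_unreliable(candidate, training))
```

The toy candidate is parallel to a training vector, so its uncentered Pearson distance vanishes while its Euclidean distance does not, which illustrates why the two metrics can rank the same structure differently.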