Yuri
Cho
ab,
Ruben
Laplaza
ac,
Sergi
Vela
de and
Clémence
Corminboeuf
*abc
aLaboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland. E-mail: clemence.corminboeuf@epfl.ch
bNational Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
cNational Centre for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
dDepartament de Ciència de Materials i Química Física and IQTCUB, Universitat de Barcelona, Barcelona, Spain
eInstitut de Química Avançada de Catalunya (IQAC-CSIC), Barcelona, Spain
First published on 12th July 2024
Exploiting crystallographic data repositories for large-scale quantum chemical computations requires the rapid and accurate extraction of the molecular structure, charge and spin from the crystallographic information file. Here, we develop a general approach to assign the ground state spin of transition metal complexes, in complement to our previous efforts on determining metal oxidation states and bond order within the cell2mol software. Starting from a database of 31k transition metal complexes extracted from the Cambridge Structural Database with cell2mol, we construct the TM-GSspin dataset, which contains 2063 mononuclear first row transition metal complexes and their computed ground state spins. TM-GSspin is highly diverse in terms of metals, metal oxidation states, coordination geometries, and coordination sphere compositions. Based on TM-GSspin, we identify correlations between structural and electronic features of the complexes and their ground state spins to develop a rule-based spin state assignment model. Leveraging this knowledge, we construct interpretable descriptors and build a statistical model achieving 98% cross-validated accuracy in predicting the ground state spin across the board. Our approach provides a practical way to determine the ground state spin of transition metal complexes directly from crystal structures without additional computations, thus enabling the automated use of crystallographic data for large-scale computations involving transition metal complexes.
Yet, the proper exploitation of crystallographic information for the computational chemistry is not straightforward. For datasets to be adapted for both the training of ML models and high-throughput QC searches, they must include the essential information needed to run an electronic structure computation, such as the structure (R), the molecular charge (Q) as well as the spin multiplicity. Owing to the lack of information about metal oxidation (OS)29 and spin states, reliably retrieving the molecular charge and spin multiplicity is especially difficult when transition metals (TM) are involved. To overcome this limitation, we recently developed cell2mol,22 a software that specifically interprets crystallographic data to retrieve the Cartesian coordinates, total charges, and connectivity of all individual molecules in the unit cell, including the OS of metal ions. While cell2mol provides a thorough unit cell interpretation, the original version was not coded to characterize ground state spins.
Given that the ground state spin of TM complexes depends on multiple factors, such as metal identity, OS, coordination geometry and ligand field strength, deducing this information only from their structure is challenging.30–32 Significant efforts have been made to train ML models that predict spin-state-dependent properties such as spin-splitting energies, spin-state orderings, sensitivity to Hartree–Fock exchange, and metal–ligand bond lengths25,33–35 but these efforts have been essentially placed on mononuclear octahedral complexes and on a restricted range of exemplary ligands with varying field strengths along the spectrochemical series. So far, the prediction of ground state spin of TM complexes has not been investigated across diverse chemical spaces.
Herein, we develop a pragmatic and general workflow (Fig. 1) to predict the ground state spin of TM complexes by leveraging previously curated data obtained with cell2mol. Starting from the original database that was extracted from the CSD, we construct a smaller, representative, albeit diverse set through stratified sampling. We then determine the ground state spin of each individual complex through density functional theory (DFT) computation using the B3LYP* functional36,37 and filter out ambiguous cases. The resulting TM-GSspin dataset is systematically analyzed to identify correlations between the structural and electronic features of the complexes and their ground state spins. Based on the extracted patterns and relationships, we construct rule-based empirical and interpretative random forest models for ground state spin assignment in first row TM complexes. These models are integrated into cell2mol, enabling the assignment of total charge, OS, and ground state spin of TM complexes directly from crystallographic information files.
![]() | ||
Fig. 1 Proposed general workflow. The numbers in bold indicate the number of complexes curated at each step. |
This results in 17214 mononuclear first row complexes with five metal centers: Cr, Mn, Fe, Co, and Ni. Among these complexes, we excluded those with haptic ligands (e.g., cyclopentadienyl) because their coordination numbers and geometries are ambiguous, presenting subtly different η coordination modes.40 We also eliminated complexes with nitrosyl ligands to avoid potential spin from typical non-innocent ligands.41 The coordination geometry of the TM complex was then determined using the CoSymLib python library42 and complexes exhibiting significant deviation from the ideal shape of a reference polyhedron were removed to unequivocally identify the correlation between the ground state spin and the coordination geometry (see Section S1 and Fig. S1 in the ESI†).
The remaining 15837 complexes were classified based on metal identity, OS, coordination number (the number of atoms bound to the metal center), coordination geometry, and composition of the metal-coordinating atoms, resulting in 1633 distinct groups of complexes that share the aforementioned characteristics. We then performed stratified sampling among those groups to construct a dataset of 2261 complexes where each group is represented (see Section S1 in the ESI† for further details regarding dataset construction and curation, and Section S2† for a discussion on the excluded complexes).
An iterative and automated correction was applied in case of convergence failures. Complexes for which computations failed to converge in all accessible spin states were removed. Those with energy gaps between possible spin states smaller than 5 kcal mol−1 were excluded, as they fall below the chemical accuracy of DFT for spin-state energetics of TM complexes.48–50 This filtering step also eliminates spin-crossover complexes, for which ground spin state changes with an external stimulus like temperature or pressure as a result of vibrational and electronic entropy contributions.33,46,51,52 Finally, complexes exhibiting an expectation value of 〈Ŝ2〉 that deviates from the exact value of S(S + 1) by more than 0.1 for the singlet and doublet ground states, and more than 0.2 for the other ground state spins49 were also excluded. Overall, a total of 198 complexes were removed by the three consecutive filters. For further details on the in-depth analysis of the DFT results, excluded complexes, impact of geometry optimization on spin-splitting energies, and complexes with hydride ligands, see Section S2 in the ESI.†
TM centers exhibiting various ground state spins can be understood based on their coordination environments. As an illustrative example, the analysis of the ground state spins of Fe(II) complexes across fourteen different coordination geometries is provided in Fig. 3b d6 Fe(II) complexes adopt singlet, triplet, or quintet ground state, which corresponds to low-spin (LS), intermediate-spin (IS), or high-spin (HS) state, respectively. Most coordination geometries of these complexes exhibit the HS ground state. Specifically, all Fe(II) complexes with coordination numbers smaller than 4 or greater than 6 (i.e., linear, trigonal planar, T-shaped, pentagonal bipyramidal, capped trigonal prismatic, dodecahedral and cube geometries) are consistently HS. Tetrahedral, seesaw, and trigonal prismatic Fe(II) complexes are all HS except for two tetrahedral imido complexes containing tertiary phosphine ligands, which adopt LS states. Fe(II) complexes with other coordination geometries exhibit a greater ground state spin variability. Square planar and trigonal bipyramidal Fe(II) complexes exhibit IS or HS, depending on whether the highest d orbital is empty or half-filled. Square pyramidal Fe(II) complexes cover three different ground state spins, while most common octahedral Fe(II) complexes adopt either the LS or HS ground states.
Similar trends are observed for other TM complexes with d4 to d8 electron configurations (see in Fig. S9–S13 of the ESI†). Complexes with low- or high-coordination geometries tend to favor the HS state within a given d electron configuration. Certain coordination geometries, such as tetrahedral and trigonal prismatic, exclusively adopt the HS states due to the small d-orbital splitting in these crystal fields. However, a few tetrahedral complexes favor the LS ground state owing to the presence of strong-field ligands such as substituted phosphines. Complexes with other coordination geometries exhibit different ground state spins depending on the arrangement of d electrons within a given crystal field, which indicates that considering additional factors is crucial for determining the ground state spin. For further analysis, the relationship between ground state spins and coordination sphere compositions of 144 Fe(II) octahedral complexes are shown in Fig. S14 in the ESI,† with a brief discussion.
We further examine the distribution of distances between the metal center and its coordinating atoms as the metal–ligand bond lengths in HS states are generally longer compared to those in the LS states due to the population of anti-bonding orbitals. Within this context, Taylor et al.,25 assigned ground state spins of mononuclear octahedral Fe(II)/Fe(III) complexes based on heuristic cut-off values for metal–ligand bond lengths. We here introduce a more general indicator applicable to various metal centers and coordination geometries. We define the relative metal radius, denoted as rrel, as
![]() | (1) |
Fig. 3c shows the distribution of relative metal radii and their corresponding ground state spins for Fe(II) complexes depending on the coordination geometry (see Fig. S15 in the ESI† for Fe(III) complexes). Across different coordination geometries, Fe(II) complexes in the HS state typically possess longer relative metal radii compared to those in the LS or IS states. In general, we observe a biased distribution toward HS states with longer relative metal radii. In particular, the relative metal radii of octahedral Fe(II) complexes show a binomial distribution, separating LS and HS. Interestingly, two tetrahedral Fe(II) complexes with LS state exhibit very short relative metal radii, which deviate from the overall distribution of relative metal radii in tetrahedral complexes. This suggests that relative metal radius serves to identify outliers exhibiting uncommon ground state spins. For further analysis, we investigate one octahedral singlet complex (CSD refcode: DOQRAC) with longer relative metal radii in Fig. 3c, which was identified as a spin-crossover complex in the literature.56 Moreover, there is a systematic increase in relative metal radius as the coordination number increases. This pattern is especially evident in HS complexes, which display a linear increase, as shown in Fig. S16 in the ESI.†
![]() | ||
Fig. 4 Ground state spin assignment of first row TM complexes based on empirical rules. Nd: the number of d electrons, with odd numbers of d electrons shown in parentheses, OS: metal oxidation state, CN: coordination number, rrel: relative metal radius, cut-off: predefined cut-off value used to distinguish the lowest spin state. Decision nodes in orange use cut-off values that are specific for a given metal and coordination geometry (see Table S10 in the ESI†). |
For complexes with a coordination number of 2 or greater than 6, the ground state spin is assigned as HS based on observed trends. We hypothesize that in such low- or high-coordination cases, the ligand field is weak due to the limited interaction between the metal and ligands. This stems from either sterically hindered bulky ligands in the former case (Table S9 in the ESI†) or overly crowded coordination spheres in the latter. Accordingly, both extremes favor the HS ground state.
For the remaining complexes, the assignment depends on their coordination geometry and relative metal radius. For cases with multiple ground state spins within a given coordination geometry, the model uses the relative metal radius as a distinguishing criterion (indicated by the orange rhombus in Fig. 4). For this step, we define a specific cut-off for each combination of metal and coordination geometry (Table S10 in the ESI†). If the relative metal radius falls below the designated cut-off value, the ground state spin is assigned as the lowest possible spin state. Note that for d4 octahedral as well as d6 square planar or trigonal bipyramidal complexes, the lowest spin state is assigned as triplet based on the pattern discerned in the dataset.
Despite its relative simplicity, the empirical model achieves a high 97% accuracy within the dataset. Out of the 2063 complexes considered, only 55 exhibit discrepancies in their ground state spin assignments with respect to the computations. Most of the disagreements occur for square pyramidal complexes featuring Fe(III), Co(II), and Ni(II) with relative metal radii close to the cut-off values. Furthermore, our empirical model is indeed unable to assign the IS state to square pyramidal Fe complexes, which would require an additional cut-off value.
![]() | ||
Fig. 5 Ground state spin prediction for the TM-GSspin dataset and for each metal subset. (a) 10-Fold cross-validated accuracy of random forest models using different features. FTM is a vector containing the metal atomic number (Z), metal OS, and the number of d electrons (Nd). FCE contains coordination number (CN), coordination geometry (CG), and relative metal radius (rrel). FTM+CE combines both FTM and FCE. aSLATM is the atomic SLATM representation57 of the metal atom. FTM+CE+aSLATM incorporates FTM+CE and aSLATM. (b) Feature importances in prediction models trained using FTM+CE. |
Fig. 5a shows the 10-fold cross-validated accuracy for each random forest model, with error bars representing the standard deviation across folds. Amongst the models trained on the entire TM-GSspin dataset (Fig. 5a, leftmost), the model using FTM+CE achieves the highest cross-validated accuracy, reaching 98%. In contrast, the model with the FCE features exhibits the poorest performance due to the lack of metal center information. FCE fails to capture the intricate relationship between ground state spin and coordination geometry, which varies depending on the TM and its OS. For comparison, the model employing aSLATM outperforms the one based on FCE, benefiting from the inclusion of nuclear charge information for the metal atom. Interestingly, the FTM+CE features lead to a more accurate model than aSLATM despite its simplicity. This superiority is attributed to FTM+CE explicitly containing electronic information, such as the number of d electrons and metal OS—critical factors closely linked to the ground state spin of the complex, which are not explicitly captured by many-body potential terms. When considering standard deviations, the model using FTM+CE displays an accuracy similar to the model using the much larger FTM+CE+aSLATM (vector size 6 vs. 80591) while bypassing the computational cost of generating the aSLATM representation. Overall, these results underscore the effectiveness of FTM+CE in capturing both the electronic and structural information of TM complexes, crucial for determining their ground state spin. Separately, we performed dimensionality reduction on the aSLATM by using principal component analysis to reduce the number of features to 100. The reduced aSLATM resulted in slightly worse performance compared to the original aSLATM, as shown in Table S12 in the ESI.†
The performance of the models trained on the individual TM subset are also shown in Fig. 5a. The models using the FTM features exhibit high accuracy for Cr and Mn complexes, owing to the strong relationship between the d electron configuration and the ground state spin for these elements. Conversely, for Fe, Co, and Ni complexes, using only FCE outperforms the FTM models, which is especially evident for the Ni complexes. The reason for the latter is that the majority of Ni complexes in the dataset are either Ni(II) square planar or octahedral complexes, consistently displaying a singlet or triplet ground state, respectively. For these individual metal subsets, models using FCE are comparable or even more accurate than those using aSLATM, contrasting with the trends obtained for the corresponding models trained on the entire TM-GSspin dataset. This distinction arises because the relationship between the ground state spin and the coordination environment is well-defined within each metal subset but not on the overall dataset, in which each metal exhibits different preferences. Ultimately, the best performance obtained on the individual subsets is consistently achieved for the models employing FTM+CE.
To shed light on the relevance of the various features, we finally examine the feature importance derived from the FTM+CE random forest models, as shown in Fig. 5b. Overall, the relative metal radius (rrel) and the number of d electrons (Nd) emerge as the most influential factors. In agreement with our previous observations (vide supra), predictions for Cr and Mn complexes primarily rely on features associated with the metal center, while predictions for Ni complexes are driven by geometric information. Nevertheless, even in those cases, the incorporation of both electronic and structural descriptors remains crucial to predict the ground state spins of TM complexes in both individual metal subsets or full dataset. For more comprehensive information regarding the performance of random forest models, a detailed list of complexes with incorrect predictions, and the analysis of feature importances in models trained using FTM+CE+aSLATM, see Tables S13, S14, and Fig. S18 in the ESI.†
The analysis of TM-GSspin uncovered correlations between ground state spins and the various features of TM complexes (e.g., metal OS, the number of d electrons, coordination number, geometry, and the relative metal radius measure introduced herein). While most of the relationships were already established, we quantified their validity across the board and exploited them to build rule-based decision trees to assign the ground state spin for first row TM complexes. The most relevant features were also used as inputs of random forest models, achieving an impressive 98% cross-validated accuracy within the dataset.
These models are fully integrated into the latest version of cell2mol which is now capable of determining the total charge, the OS, and the ground state spin of TM complexes directly from crystallographic data. This work streamlines automation of electronic structure workflows of molecules extracted from crystal structure repositories.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4dd00093e |
This journal is © The Royal Society of Chemistry 2024 |