Adarsh V. Kalikadiena,
Niels J. van der Lema,
Cecile Valsecchib,
Laurent Lefort
c and
Evgeny A. Pidko
*a
aInorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, Delft, 2629 HZ, The Netherlands. E-mail: e.a.pidko@tudelft.nl
bDiscovery, Product Development and Supply, Janssen Cilag S.p.A., Viale Fulvio Testi, 280/6, 20126 Milano, Italy
cDiscovery, Product Development and Supply, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340 Beerse, Belgium. E-mail: llefort@its.jnj.com
First published on 24th June 2025
Computational exploration of chemical space is a powerful tool for designing organometallic homogeneous catalysts. While catalytic properties depend on ligand properties and spatial arrangement, the role of stereoisomerism in defining catalyst selectivity and reactivity has only been elucidated sporadically, leaving gaps in virtual screening workflows. This study investigates the necessity of exploring ligand configurations for virtual high-throughput (HT) screening of octahedral transition metal complexes. Using automated workflows, ligand configuration ensembles were generated for bisphosphine ligands with Ir(III), Ru(II), and Mn(I) metal centers. DFT calculations revealed distinct preferences for Ir(III) configurations, whereas Mn(I)- and Ru(II)-complexes displayed significant fluxionality, with multiple configurations within a 10 kJ mol−1 energy range. Linear regression analyses showed that global descriptors, such as bite angle and HOMO–LUMO gap, are transferable across configurations and metal centers, while local steric descriptors lacked such transferability. Machine learning (ML) models successfully classified ligand configurations (balanced accuracy >0.8) but struggled to predict stability across metal centers, especially for Mn(I) and Ru(II). Thus, improved descriptors of the first coordination sphere to capture fluxionality and stability more effectively can improve ML models. Overall, this study underscores the limitations of ignoring stereoisomerism in virtual HT screening, which may lead to incomplete exploration of chemical space and underrepresentation of key catalyst features. Until dynamic digital representations are developed, exhaustive stereoisomerism exploration should be implemented for screening workflows.
Generally, these workflows start with assumptions based on a specific reaction mechanism.13 Considering the flexibility of these ligands, one thus assumes that a preferred ligand arrangement is retained for all members of a given ligand family and/or metal centers. Conformational search aiming at identifying low-energy rotamers and isomeric structures is commonly carried out for this selected coordination polyhedron with the pre-defined ligand configuration, which is preserved at this stage.28,29 Although the kinetic trans-/cis-effect is well known,30 the role of stereoisomerism of the catalyst in defining selectivity and reactivity has only been elucidated sporadically.31–33
The relationship between the stability and observed catalytic properties of a complex is challenging to comprehend and is usually not known a priori. Consider a scenario where a meta-stable configuration of a TM complex, existing at a low concentration in the reactive system, establishes a favorable reaction channel. This minor catalytic component would provide a major impact on the reaction rate and would therefore determine the nature and characteristics of the primary reaction product (Fig. 1).34–40 As an example, two possible ligand arrangements are depicted in Fig. 1a and b. Although complexes with a different ligand configuration are in equilibrium, one meta-stable configuration may provide a reaction path with an energy barrier that is significantly higher than that of the other configuration. The overall ensemble of ligand configurations ultimately contributes to the observed catalytic properties. In the context of virtual HT screening, the conformational isomerism of the organic ligand backbone has been recognized by the community27,41–43 and various structure generation tools such as AARON, Architector, Molsimplify, Molassembler and AQME, contain on-the-fly conformer generation solutions.44–48 However, due to the initial selection of a configuration, the influence and contribution of the metastable configurations featuring varied coordination environments and ligand arrangements might be overlooked. Furthermore, this choice assumes that the preferred configuration does not change with relatively minor variations in the ligand structure and, often, even the nature of the metal center. Consequently, the question arises: can catalytic systems be fully accounted for when part of the chemical space is neglected due to initial human choices and intuition in structure generation?
To investigate this, we focused on TM-complexes that are relevant to homogeneous catalysis for hydrogenation reactions, where bidentate ligands are commonly employed to achieve high reactivity and enantioselectivity.49–52 To ensure that the generated data is as bias-free and comprehensive as possible, we employed an automated workflow for construction, sorting and descriptor calculation of ensembles of ligand configurations for TM-complexes. We constructed ensembles containing different ligand configurations for 87 bidentate ligands connected to 3 different metal centers, namely Ir(III), Ru(II) and Mn(I), yielding a total of 908 octahedral TM-complexes. With these data, we set out to model the relations between stereoisomerism, stability, and descriptors.
The primary research question of this work is whether an exhaustive exploration of stereoisomerism is necessitated for virtual HT screening of octahedral TM-based catalyst complexes, given that the degree of configurational fluxionality of a complex is not fully known a priori. To answer this, we investigated whether specific ligand configurations of the TM-complexes proved to be more energetically favorable and whether this could be modeled using physical–chemical descriptors and machine learning (ML). The paper is organized as follows. Initially, stability trends of different ligand configurations were analyzed on the basis of results from Density Functional Theory (DFT) calculations. The results were analyzed by means of linear regressions to identify relevant descriptors across different ligand configurations and metal centers. The descriptors were then utilized to construct ML models capable of distinguishing different types of ligand configurations and predicting energetic preferences for specific metal–ligand combinations. Our results highlight the challenge of configurational fluxionality for the virtual screening of TM complexes and provide practical directions to address them.
After generating the structures for the TM-complexes via MACE, the complexes were named according to the first donor atoms axially bonded to the metal center as illustrated in Fig. 2. These axial ligand configurations were identified using bite angles. The coordinate system of TM-complexes are defined with respect to the bidentate ligand, and hence the bidentate ligand is always present in the equatorial position. Therefore, the ligands in the axial position are the only non-bidentate ligand containing pair forming a bite angle of 180°. For ligands lacking C2 symmetry, variations arising from asymmetric functionalization on the phosphine donor atoms were treated as distinct conformers within the same axial ligand configuration. After generating and sorting the TM-complexes, geometries were optimized at the DFT level of theory.
Thermodynamic stabilities of ligand configurations were calculated by the difference in the DFT-based electronic energies with respect to a reference configuration. The H–N axial ligand pair structure is used as the reference, being the only common configuration present among studied metal centers. The difference in stability between the reference and alternative configurations is denoted as ΔEref.
In addition to screening the stability of the TM-complexes, a screening based on substrate binding energy was also conducted. Binding energies of the model substrate, acetonitrile, were computed as follows:
Ebind = EDFT,complex − (EDFT,complex-nosub + EDFT,sub) | (1) |
The random forest (RF) and logistic regression (LR) algorithms were used. RF is an ensemble learning algorithm harnessing multiple decision trees and randomness to construct a predictive model, while logistic regression is a statistical method that models the probability of a binary outcome using a logistic function. In our study, all modeling tasks were attempted with both RF and logistic regression, with logistic regression serving as a simpler alternative to RF. The modeling tasks were evaluated with a balanced accuracy (BA) score which is a metric for evaluating classification models on imbalanced datasets. The score was calculated as follows:
![]() | (2) |
Details about cross-validation, hyperparameter optimization and model initialization can be found in the ESI Section S3.†
The results for Ir(III) complexes (Fig. 4a) reveal that for most ligands the reference N–H axial ligand pair is more stable than the alternative H–H arrangement. In the case of Ru(II) (Fig. 4b), the presence of additional auxiliary ligands expands the configurational space, which now includes C–N, C–H and H–H as alternative axial ligand pairs next to the reference H–N. Upon assumption that hydrides are indistinguishable, Ir(III) complexes can only form two distinctive configurations, whereas four different configurations of the Ru complex can be formed. A significant variety in the data set is observed, as the complex configuration with the lowest stability often varies among the different bidentate ligands.
Similar to Ir(III), the H–H axial ligand configuration exhibits a positive ΔEref for the majority of bidentate ligands, signifying a lower stability compared to the reference H–N configuration. However, a notable difference from Ir(III) complexes is that alternative configurations exhibit higher stability for many systems. For instance, the C–N axial configuration commonly exhibits a negative ΔEref, indicating their higher stability than the H–N reference. Furthermore, our workflow identifies multiple Ru complexes with alternative configurations varying more than 50 kJ mol−1 compared to the reference case. These outliers are the result of unfavorable conformations imposed by the specific ligand arrangement on the metal center. In particular, 6 ligands were identified as outlier for multiple metal centers (L86, L87, L119, L134 and L171), but no noteworthy trends were observed. Data on these outliers are contained in the data_analysis and descriptor_analysis directory in the ESI.†
A similar analysis for Mn(I) complexes (Fig. 4) reveals that the most stable preferred configuration varies between different bidentate ligands. Distinctive to Mn(I) complexes is the C–C configuration which shows a lower overall stability for most ligands. Nevertheless, as opposed to Ir(III) complexes, the reference H–N axial ligand configuration is not shown to be the most stable configuration in all cases. Instead, the C–H and C–N configurations are energetically more favorable.
Fig. 5 summarizes the axial ligand configurations along with the percentage of bidentate ligands for which those specific configurations are found to be the global minimum on the PES. For the Ir(III) complexes 92% of bidentate ligands show a clear global minimum in energy for the H–N configuration. The remaining 8% favor the single alternative H–H configuration. For Ru(II) complexes, the H–N axial configuration is also frequently identified as the global minimum, but this now only accounts for 50% of the bidentate ligands. Both the C–N and C–H axial ligand configurations emerge as the global minimum for a notable number of bidentate ligands, 31% and 18% respectively. The H–H axial ligand configuration is the global minimum for a single bidentate ligand. Unlike Ru(II) and Ir(III), related Mn(I) complexes do not display a pronounced majority of minima containing the H–N configuration. This geometry is preferred for only 26% of Mn(I) complexes, while the alternative C–H configuration is the global minimum for 45% of the bidentate ligands in this case. The C–N and C–C axial arrangement are preferred by 24% and 5% of the Mn(I) complexes respectively. These findings underscore that even though bidentate bisphosphine ligands are studied exclusively, no clear trend in the stability of a ligand configuration can be observed across the studied metal centers.
![]() | ||
Fig. 5 Distribution of most stable ligand configuration over all possible ligand configurations for Mn(I), Ru(II) and Ir(III) complexes, alongside 87 bisphosphine (PP) bidentate ligands. |
Linear regression models were constructed to predict specific descriptors of the complexes across different combinations of metals and ligand configurations. The models are scored using a coefficient of determination (R2) ranging from 0 to 1. Since there are 10 possible metal and ligand configuration combinations, the performance of (10 × 10) − 10 = 90 distinct linear models per descriptor is reported. Fig. 6 shows a heatmap for four selected descriptors and the resulting R2 for the 100 models, the steric descriptors are depicted on the left and the electronic descriptors on the right. The top two descriptors in the figure represent local properties of the bidentate ligand, while the bottom two are more global. Each heatmap shows all possible ‘metal, configuration’ combinations on the x- and y-axes. Similar heatmaps for all calculated descriptors are provided in the ESI (Section S4†).
![]() | ||
Fig. 6 Matrices for R2 scores of linear models between specific descriptor from one set of bidentate ligands with a specific metal and ligand configuration to another set with a different combination of metal and ligand configuration. An example is shown for four selected descriptors that range in locality. An image for matrices of all calculated descriptors can be found in the ESI Section S4.† |
Fig. 6 reveals a clear distinction in the transferability of the steric descriptors. The calculated physical–chemical descriptors differ in the level of locality that is captured, where a buried volume can be separated into quadrant- and octant-based contributions which offer a local view on the steric occupancy of a ligand, the bite angle remains a more global geometric measure. Although a local octant of the buried volume shows a low R2 (R2 < 0.5) for models across all metal and ligand configuration combinations, the bite angle shows a relatively high R2 (R2 > 0.7) across all metal and ligand configuration combinations. This is in line with the inherent low variance (33.5°) of the bite angle across the whole dataset. Similar observations are made and reported in the ESI Section S4† for the percentage buried volume with a radius of 3.5 at the metal center or ligand donor atoms.
For the electronic descriptors shown on the right in Fig. 6, the distinction in transferability is less pronounced. This is evidenced by the presence of red regions in the heatmap of the NBO charge, which deviates from the uniform blue observed in the local steric descriptors. The NBO charge at an atom describes the local electronic environment of the specified atom, while the HOMO–LUMO gap remains a global descriptor depicting the difference in energy of the frontier orbitals of the whole complex. The NBO charge at the ligand donor atom labeled ‘max’ shows varying modeling performance. Starting at the top left of the heatmap, the ‘Ir, H–H’ metal and configuration combination shows no transferability across any other combination. The ‘Ir H–N’ metal and configuration combination shows moderate R2 (R2 ≈ 0.6) for Ru(II)-based C–N and H–N configurations while a low R2 is observed for all other combinations. On the bottom right, the Ru(II)-based configurations show a relatively moderate to high R2 across other Ru(II)-based configurations. For Mn(I)-based configurations, the trend differs, as moderate to high R2 values are observed only between the C–C, C–H, and C–N configurations. This highlights the sensitivity of certain descriptors to stereoisomerism and the nature of the metal center in the TM complex. In contrast, the HOMO–LUMO gap consistently exhibits high R2 values (R2 > 0.7) across all metal and ligand configuration combinations. Although the HOMO–LUMO gap itself seems transferable across metal and ligand configuration combinations, visualization of the frontier orbitals showed that the nature of the respective frontier orbitals may substantially differ with varied configurations. This analysis can be found in the ESI Section S5.†
The performance evaluation is shown in Fig. 7, where the x-axis depicts whether modeling was performed on the dataset comprising all metals or by metal-specific division. The BA on the test set for RF and LR are shown by a red and blue bar respectively. The modeling on the dataset containing all ligands, metals and ligand configurations reveals a gap in performance between the non-linear RF and the linear LR models. Where the RF models yielded a remarkable BA of 0.87–0.89, the LR models yielded a good BA of 0.73–0.79. Inspecting the performance of metal-specific models going towards the right in the figure, it can be observed that although all models perform good to excellent, a drop in performance is observed for Mn(I)-specific modeling. Nevertheless, these results suggest that the descriptors employed allow ML to effectively distinguish between different axial configurations. This holds true even for out-of-domain modeling cases where 16 ligands were kept out of the training set, simulating a case of applying the trained models to fully new ligands. The out-of-domain modeling results are contained in ESI Section S7.†
An examination of the feature importances (see ESI Section S6†) revealed that for modeling over all metal centers, descriptors such as the dipole moment, NBO charges on the metal or bidentate ligand donor atoms, and distances between the donor atoms and the metal center are of the highest importance. Thus, these importances reveal that the polarity of the complex (dipole moment) and the local electronic environment surrounding the metal and ligand donor atoms (all other mentioned descriptors) are informative enough to distinguish different ligand configurations for ML. This is in line with our findings on the transferability of descriptors, where those same descriptors are observed to exhibit a high sensitivity to changes in ligand configuration and metal center. However, it should be noted that an important difference is observed in the feature importances of Mn(I)-specific models. Where high importance is given in Ru(II)- and Ir(III)-specific models to the dipole moment and afterwards mainly descriptors of the local electronic environment surrounding the metal center, these seem of relatively lower importance in Mn(I)-specific models. Here, a higher importance is observed for more global descriptors such as the bite angle, cone angle and HOMO–LUMO gap. This observation points at a difference in which ML is able to distinguish ligand configurations of 3d TM-complexes compared to their 4d counterparts and is indicative of the gap in performance of Mn(I)-specific models compared to Ru(II)- and Ir(III)-specific models.
Fig. 8a reports the number of ligand configurations within the 10 kJ mol−1 energy range from the most stable configuration for the Ir(III), Ru(II) and Mn(I) complexes, while the fraction of the respective complexes featuring multiple ligand configurations within this energy range is given in Fig. 8b. For the majority of Ir(III) complexes, only a single configuration is observed within the specified energy range. This finding is in line with the significant stability differences and small number of available ligand configurations. However, even in this case, 24% of Ir(III) complexes are expected to exhibit substantial structural isomerism, i.e. present multiple ligand configurations with stability difference <10 kJ mol−1, under the reaction conditions. The fraction of such systems is much higher for Ru(II) and Mn(I) complexes, where multiple ligand configurations within 10 kJ mol−1 stability range were found for 72% and 68% of the cases, respectively.
To enable machine learning models to classify ligand configurations based on their relative stability, we treated all configurations within 10 kJ mol−1 of the most stable structure as a single class. This threshold reflects a design choice based on the assumption that such configurations are thermally accessible and thus potentially relevant under catalytic conditions. Similar to the previous modeling approach, a binary classifier was trained either on the whole dataset comprising all ligands and metals or metal-specific by dividing the dataset metal-wise. This leads to a binary classification where the model has to predict whether a ligand configuration is within the stability range of 10 kJ mol−1. Again, both the random forest and logistic regression algorithms were utilized.
Performance evaluation is shown in Fig. 9, where the x-axis depicts whether the modeling was performed on the dataset that includes all metals or by metal-specific division. The BA on the test set for RF and LR are again shown by a red and blue bar respectively. All results, except for Ir(III)-specific models, reveal a gap in performance between RF and LR models. Where the RF models on the dataset of all metal centers and ligand configurations yield a moderate BA of 0.69–0.74, the LR models yielded a worse BA of 0.60–0.68. Inspecting the performance of metal-specific models going towards the right in the figure, it can be observed that although all models perform moderately, again a drop in performance is observed for Mn(I)-specific modeling. Additionally, the performance of Ir(III)-specific models has a large range in BA of 0.21 for both RF and LR. Since only 24% of Ir(III) complexes are expected to exhibit substantial structural isomerism, the variation in performance depends on whether and how many, of these exceptions are present in the test set. These results suggest that utilizing these descriptors for modeling the stability of ligand configurations is only moderately possible with RF models. The modeling performance is exacerbated in the out-of-domain modeling approach, where a performance drop in the BA is observed for all RF models (see ESI Section S9†). The performance or LR models remained similar in the out-of-domain modeling. Nevertheless, in both RF and LR modelling approaches for all cases except Ir(III)-specific models, this performance points at a modeling ability that is only marginally better than random selection for predicting the stability of a fully unseen ligand.
Given that modeling of the stability was only performing sufficiently for Ir(III)-specific models, the feature importances (see ESI Section S8†) of these models give an insight into which descriptors are strongly linked to stability. The high standard deviations in the feature importance of LR models for Ir(III) make interpretations non-trivial. Nevertheless, the feature importance of RF models reveal that the same descriptors that enabled the modeling of different ligand configurations, which capture the polarity of the complex and the local electronic environment surrounding the metal and ligand donor atoms, are now also of high importance. However, since these descriptors only moderately allow for in-domain modeling and do not allow the reliable out-of-domain modeling of the stability of configurations for Ru(II) and Mn(I), the universality of these descriptors can be questioned.
Using our automated workflows, ensembles of possible ligand configurations were generated for a library of bisphosphine bidentate ligands with Ir(III), Ru(II) and Mn(I) metal centers. For the study of stability-based preferences in ligand configuration, our findings based on DFT calculations revealed that Ir-complexes displayed a clear preference in ligand configuration, whereas Mn(I)- and Ru(II)-complexes lacked this preference. Thus, it can be concluded that it is incorrect to assume a particularly fixed ligand configuration as the most stable one across these metal centers.
Investigating the transferability of physical–chemical descriptors across ligand configurations and metal centers revealed that local steric descriptors such as the octant contribution of the buried volume are hardly transferable across metal centers or even ligand configurations with the same metal center. However, local electronic descriptors such as the NBO charge on donor atoms of the ligand exhibited transferability between varying ligand configurations and the same metal center. More global steric, electronic and geometric descriptors, such as the bite angle, HOMO–LUMO gap, indicated a high degree of transferability between all metal centers and ligand configurations. These findings emphasized that the exploration of stereoisomerism in virtual HT screening is of importance if local descriptors are of interest to the screening task at hand.
Since the descriptor set was sensitive to variations in ligand configurations, they could prove useful in modeling the energetic preference of ligand configurations. Hence, it was first established whether the descriptor set allowed ML to distinguish between ligand configurations. Based on our results, where a BA of >0.8 for RF models on the dataset containing all metal centers was achieved, it can be concluded that the employed descriptors and non-linear models allow for effective out-of-domain modeling where 16 ligands were kept out of the training set. In a case where descriptors of completely new bidentate ligands are given to the trained ML model, it is thus able to effectively predict its axial ligand configuration pair. However, metal-specific models underline challenges in the potential applications to 3d TM-complexes since a performance drop was observed for Mn(I)-specific models.
For a majority of Mn(I)- and Ru(II)-complexes, multiple ligand configurations were found within a 10 kJ mol−1 energy range from the most favorable one, indicating that multiple ligand configurations may coexist under reaction conditions typically employed in homogeneous catalysis, all influencing catalyst properties. Although a single configuration is predominantly observed within this energy range for most Ir(III)-complexes, a significant portion (24%) is shown to still exhibit substantial structural isomerism. To model the stability of ligand configurations, all ligand configurations with a stability difference of lower than 10 kJ mol−1 within an ensemble were thus treated as equal and indistinguishable. The modeling attempts proved to be only marginally better than random selection for predicting whether the configuration of a fully unseen ligand would fall within the 10 kJ mol−1 stability range. Since these descriptors only moderately allow for in-domain modeling and do not allow the out-of-domain modeling of the stability of configurations for specific models of configurations with a Ru(II) and Mn(I) metal center, it is concluded that these descriptors are not universally applicable across metal centers to model the stability of ligand configurations. Since the feature importances of Ir(III)-specific models, where modeling was successful, indicate that the local environment of the metal center and ligand donor atoms hold the highest importance, there is a large potential for representations containing improved descriptors of the first coordination sphere surrounding the metal center.
Overall, our findings are significant for the virtual high-throughput screening of homogeneous catalysts, which remains heavily reliant on human decision making. Our results demonstrate that focusing on a single ligand configuration during this process may lead to insufficient coverage of the chemical space and an inadequate representation of key catalyst features, thereby limiting the predictive power of in silico catalyst screening campaigns. Furthermore, understanding the flexibility and fluxionality of novel metal–ligand combinations a priori is important for accurate statistical modeling, yet this information is often unavailable beforehand. The modeling approaches described in this study rely on descriptors of individual ligand configurations, creating a ‘chicken-and-egg’ problem: the flexibility and fluxionality are unknown a priori, yet without accounting for them, it remains unclear how comprehensively they should be explored in the digital representation of catalysts. This underscores the current absence of dynamic digital representations in screening workflows. Hence, screening campaigns should prioritize an exhaustive exploration of stereoisomerism when assessing properties sensitive to structural flexibility and fluxionality.
All ESI† and datasets used in this study are provided along with an extensive README via 4TU.ResearchData at https://doi.org/10.4121/216555e8-5f8b-48a0-b92d-9c08505ceacd.
• A list and visualization of ligands (‘ligand_list.pdf’).
• An Excel file categorizing and describing all descriptors (‘descriptors_overview.xlsx’).
• A directory containing the version of OBeLiX used, alongside Python scripts for structure generation and manipulation (‘code.zip’).
• A directory with DFT data, including xyz, log, and where applicable Gaussian .chk files (‘dft_data.zip’).
• A directory with Excel files of DFT results for each ligand configuration and a Jupyter notebook for stability analysis (‘data_analysis.zip’).
• A directory with Excel files containing all descriptors for all generated complexes (‘descriptor_data.zip’).
• A directory with descriptor, energy, and angle data for all studied complexes, alongside scripts and data for ML analysis (‘descriptor_analysis.zip’).
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5dd00093a |
This journal is © The Royal Society of Chemistry 2025 |