Accurate and rapid prediction of p K a of transition metal complexes: semiempirical quantum chemistry with a data-augmented approach †

Rapid and accurate prediction of reactivity descriptors of transition metal (TM) complexes is a major challenge for contemporary quantum chemistry. The recently-developed GFN2- x TB method based on the density functional tight-binding theory (DFT-B) is suitable for high-throughput calculation of geometries and thermochemistry for TM complexes albeit with moderate accuracy. Herein we present a data-augmented approach to improve substantially the accuracy of the GFN2- x TB method for the prediction of thermochemical properties using p K a values of TM hydrides as a representative model example. We constructed a comprehensive database for ca. 200 TM hydride complexes featuring the experimentally measured p K a values as well as the GFN2- x TB-optimized geometries and various computed electronic and energetic descriptors. The GFN2- x TB results were further refined and validated by DFT calculations with the hybrid PBE0 functional. Our results show that although the GFN2- x TB performs well in most cases, it fails to adequately describe TM complexes featuring multicarbonyl and multihydride ligand environments. The dataset was analyzed with the ordinary least squares (OLS) fitting and was used to construct an automated machine learning (AutoML) approach for the rapid estimation of p K a of TM hydride complexes. The results obtained show a high predictive power of the very fast AutoML model (RMSE B 2.7) comparable to that of the much slower DFT calculations (RMSE B 3). The presented data-augmented quantum chemistry-based approach is promising for high-throughput computational screening workflows of homogeneous TM-based catalysts.


Introduction
Proton transfer reactions are ubiquitous in chemistry. The propensity of proton transfer from a chemical species is related to its acidity constant (pK a ). In the context of the transition metal (TM) complexes, pK a has a direct relevance to their (bio)chemical activity and stability. In homogeneously catalysed (de)hydrogenation reactions such as the hydrogenation of CO 2 to formates/formic acid 1 and dehydrogenation of aqueous methanol, 2 the pK a of TM-based catalysts has been recognized as an important design parameter. For example, the pK a of a TM hydride determines the strength of an acid necessary for the H 2 evolution. 3 Loss or gain of protons can open up undesirable conversion paths or even initiate the decomposition and/or deactivation of the catalyst.
Accurate estimation of the thermodynamic properties such as the pK a of TM complexes is a major challenge for quantum theoretical methods. Computational methods for rapid and accurate screening of such thermodynamic properties are highly desirable. Density functional theory (DFT) has been extensively applied to estimate thermodynamic properties of TM complexes. 4,5 However, the DFT based prediction workflows commonly face major challenges with respect to the accuracy of the calculations (basis set; XC functional; solvation model) and the computational costs. The accuracy of the method in DFT towards prediction of thermochemical properties can be addressed by validation against the experimental data. However, the computational cost for predicting molecular geometries and thermochemical properties remains an important challenge; in particular, when the applications in high-throughput computational screening are sought for. Despite the advances in software and hardware architectures, DFT-based calculations for moderately sized TM complexes (450 atoms) can take several hours to complete in most cases on a modern supercomputer. Furthermore, the electronic structure of TM complexes, particularly for the 3d metals, is a major challenge for DFT. 6 The cost and accuracy of DFT makes it challenging for its direct use in high throughput (HT) computational screening of TM complexes.
Data-driven or semiempirical quantum chemical approaches can be used to circumvent the low throughput of DFT for predicting geometries and thermochemical properties. [7][8][9][10][11] Recently a GFN2-xTB method (the latest one from the GFN(n) family), based on the density functional tight-binding approach has been introduced for the rapid prediction of geometry and thermochemical properties of TM complexes. 12 However, because of its semiempirical nature, the accuracy of the GFN2-xTB is fundamentally limited by the thermochemical span of the training set of molecules and the level of theory used in the parametrization. We propose that the accuracy of the GFN2-xTB method can be improved using machine learning of a target chemical property such as the pK a values (Fig. 1).
Density functional theory (DFT) calculations have been successfully applied to estimate the pK a of diverse classes of molecules. 2,[13][14][15][16] However, fewer studies have been carried out to compute the pK a of TM complexes. Previously, DFT was successfully used to compute the pK a of hexa-aqua TM complexes because of its relevance to the biochemical activity of TM cations. [17][18][19][20] Qi and co-workers used an ONIOM-based approach to estimate the experimental pK a of TM hydrides. 21 Muñoz and co-workers reported a theoretical approach to estimate the pK a of biologically relevant pyridoxamine-Cu(II) complexes. 22 Recently Cundari and co-workers 23 applied DFT calculations to estimate the pK a of methane adducts of 3d TM complexes.
Accurate treatment of solvent effects, especially in a protic and hydrogen bonding environment often pose a major challenge for the reliable computation of pK a values. Ab initio molecular dynamics (AIMD) simulations with a fully explicit solvent have been used to address solvation effects in computation of pK a of TM complexes in protic environments. 13,14,24 The reader is referred to a review by Luber and co-workers 25 for a comprehensive overview of AIMD-based protocols for computing pK a .
DFT-based methods typically require geometry optimization and calculation of the Hessian matrix to estimate the Gibbs free energy of protonated and deprotonated complexes. Even for relatively small complexes (B50 atoms) with a single TM center, DFT calculations can take several hours to converge. AIMD simulations typically require several days to be able to compute a single pK a value of a TM complex. A model based on additive ligand acidity constants (LACs) was proposed by Morris and co-workers, which avoids DFT calculations and can compute the pK a of TM hydrides. 26 The additive LAC method uses the ligand acidity constants of ligands coordinated to the metal centre, the charge of the conjugate base form of the metal complex, the location of TM metal in the periodic table and a correction related to the stability and geometry of the metal centre. While simple, reasonably accurate and motivated by physical principles, the additive LAC model requires knowledge of acidity constants of coordinating ligands, which makes it difficult to use directly in high throughput screening workflows. Cundari and co-workers recently reported ML-based methods for the estimation of pK a of methane adducts of TM complexes and demonstrated the potential of ML for catalyst design via rapid property prediction. 27 The potential of GFNn-xTB methods towards rapid and accurate prediction of pK a was recently demonstrated in the SAMPL6 challenge by Grimme and co-workers. 28 They demonstrated that the workflows based on GFN1-xTB and GFN2-xTB methods resulted in rapid and accurate prediction of experimental pK a of 24 drug-like molecules. The performance of GFN2-xTB has also been tested upon a large number of TM complexes taken from the Cambridge structural database. 28,29 However, the performance of GFN2-xTB towards prediction of thermochemical properties of TM complexes has not been extensively validated against experimental and/or DFT computed data.
Therefore, research objectives in this study are two-fold: (1) systematically improve the accuracy of the GFN2-xTB method for prediction of experimental pK a of TM hydrides via a dataaugmented approach, and (2) assess the suitability of GFN2-xTB and DFT//GFN2-xTB (i.e. DFT energy refinement on GFN2-xTB optimized geometries) for predicting pK a as compared with the conventional full DFT computational protocol. The presented dataaugmented approach leads to a systematic improvement in the accuracy of the GFN2-xTB method for predicting the experimental pK a of TM hydrides at negligible additional computational cost. As a final test we use our data-augmented approach to predict the ligand pK a of TM complexes and estimate the pK a of TM hydrides for which ambiguous values have been reported in the literature.

Semiempirical tight-binding calculations
Semiempirical tight-binding calculations were carried out using the xTB code. 12,30 We applied the GFN2-xTB method, 31,32 recently developed by the Grimme group. Molecular geometries were subject to geometry optimization using the verytight criteria. The Hessian matrix calculations were performed for all optimized geometries to verify the absence of imaginary frequencies and that each geometry corresponds to a local minimum on its respective potential energy surface (PES). Solvent effects were implicitly accounted for using the GBSA solvation mode 33,34 as implemented in xTB. ‡

Density functional theory calculations
DFT calculations were carried out using the Gaussian 16 C.01 program package. 35 Geometry optimizations were carried out using the PBE0 (also denoted as the PBE1PBE) 36 exchange-correlation  functional, def2-SVP 37 basis set and the Grimme's dispersion corrections (D3 version). 38 The choice of the PBE0 functional is motivated by our previous experience for prediction of reliable results for TM complexes in good agreement with experimental data. Furthermore, our initial test showed the superior performance of the PBE0 method for the prediction of experimentally determined ligand pK a (N-H function) of several representative TM complexes. The DFT computed-energies were additionally refined by single point (SP) calculations using the PBE0-D3 method together with the SMD variation of the IEFPCM implicit solvent model of Truhlar and co-workers 39 and the combination of the LANL08 basis set 40 on the metal center and the cc-pvtz basis set for all other atoms. 41 We note two limitations of our current approach. All complexes were computed at their lowest spin state and higher spin states were not considered in our calculations. Furthermore, we did not make a full exploration of the conformation space of the TM complexes, which could contribute towards the observed pK a . Conformation searching is computationally expensive at the DFT level of theory. At the GFN2-xTB level of theory one could use CREST calculations, which use a metadynamics-based approach to make a robust exploration of various possible conformations of a given complex. [42][43][44] However, such a focused investigation of the contribution of conformational freedom to the pK a of M complexes is outside the scope of the current study. All geometries were pre-optimized first at the GFN2-xTB level and then subject to full DFT-based optimization. DFT-based geometry optimizations were carried out in the gas phase. We also performed geometry optimization in the solvated phase (using the SMD solvation model). The gas phase and solvated optimized geometries were found to be in good agreement with each other (see Fig. S23, ESI †). We however note that highly flexible molecular geometries with multiple conformations can show different global minimum in the gas and solvated phases.

Machine learning methods
We performed DFT-B-based geometry optimizations for 177 complexes in the gas phase and in solution. The output files were analysed to extract 17 descriptors (see Table 2) based on the computed electronic structure, geometry and energetics, which were stored in the dataset along with the experimentally measured pK a values obtained from the literature. For ML modelling we took two approaches: (1) linear regression via the ordinary least squares (OLS) fitting using sklearn library in python. 45 (2) Automated ML using auto-sklearn library in python. 46,47 Auto-sklearn allows a rigorous search of a number of ML regression models and hyperparameters. The Pearson correlation coefficient (r2_score) was used as a metric for the optimization of the ML model. The dataset was split into the training and test sets (80-20 split). The ML model was trained on the training set and its performance was tested on a test set, which it has not seen before. Cross-validation (CV) set via k-fold (5 folds) sampling was used, while optimizing the ML model.
The ML models were further tested for their accuracy on an additional dataset of ligand pK a of TM complexes. ML models have been solely trained on TM hydrides and their performance on ligand pK a demonstrates their general applicability. We further use the ML models to assign pK a values of complexes, for which multiple pK a values have been reported in the literature and compare the results with the DFT-based predictions. The accuracy of ML models is assessed via calculation of root mean squared error (RMSE), wherever labelled data are available and compared against the performance of the DFT-based methods, wherever possible.

Determination of pK a
Several methods have been proposed in the literature for estimating pK a by means of DFT calculations. 15 For an acid-base titration between an acid AH and base B to produce the conjugate base A and conjugate acid BH, one can express the pK a of AH as B is a reference base with a known pK a of its conjugate acid form, BH. The quantities G(BH) À G(B) and pK a (BH) are constants for a given solvent at a fixed temperature. Therefore, for a given solvent and temperature pK a (AH) is a linear function of G AH ð ÞÀG A ð Þ. We define proton affinity (PA) as the difference between the Gibbs free energies of protonated (G(AH)) and deprotonated species (G(A)) i.e.
Equivalently one can also express the pK a (AH) using the PA and solvated Gibbs free energy of a proton.
It can be shown that eqn (1) and (4) are equivalent and pK a can be expressed as a linear function of PA at fixed temperatures and for a given solvent.
For both the DFT and GFN2-xTB calculations, we discovered that the PA computed using Gibbs free energy (eqn (3)) correlates perfectly with the PA computed using electronic energies only (i.e. PA = E(A) À E(AH)). We therefore use electronic energy to compute the PA (EE(A) À E(AH)) in our ML models. For further details please refer to Fig. S3 and S4 in the ESI. † Results and discussion pK a MH dataset A data-augmented approach requires reliable data. Generation of data by experimentation and simulation is one of the key propelling factors towards the rise of data-driven chemical sciences. However, well-curated experimental datasets on TM complexes with measured/computed thermodynamic properties are rare. The Cambridge structural database (CSD) partly serves this purpose by providing geometries and measured/calculated properties of TM complexes but lacks experimentally measured or computed pK a values of TM complexes. In fact, to the best of our knowledge no open datasets on experimentally measured pK a values of TM complexes along with geometric information are available. To address this we curated experimental pK a data for over 200 TM complexes from the literature (Fig. 2).
Most of these complexes are transition metal hydrides where the pK a of the M-H bond has been measured. The dataset is provided with 3D coordinates of the TM complexes (acid and conjugate base form) computed using GFN2-xTB. The dataset, referred to as pK a MH is provided as a .csv file and includes the DOI of original references and review papers that cite the measured pK a . pK a MH consists of 201 TM complexes in 6 different solvents and 14 metal centers (Fig. 2).
In the process of curating the dataset we observed that a uniform experimental method was not always used in determination of the pK a of TM hydride complexes. On many occasions the pK a was indirectly determined e.g. using linear correlations with a reduction potential or via thermodynamic cycles. In some cases where the conjugate base complex was unstable, pK a was determined by indirect methods. 3,[48][49][50][51] The pK a data are therefore also expected to contain errors related to the measurement/estimation method.

GFN2-xTB and DFT calculations
We computed the solvated PA for all complexes using the GBSA implicit solvation method as implemented in xTB. The outlier complexes mainly consisted of complexes with multiple carbonyl (CO) groups (see Fig. S22, ESI †) with the exception of complex 159 ([HFe(Py 2 Tstacn)] +2 ). The pK a of complex 159 was experimentally determined in a solvent mixture of acetonitrile and water but computed in pure acetonitrile. Our calculations suggest that GFN2-xTB may have limited accuracy in describing the M-CO bonds. This aspect is discussed later in the manuscript.
The correlation between the PA and the experimental pK a is rather surprising taking into account that different solvents are involved and the solvation free energy of proton or the PA and pK a of a reference base were not considered (eqn (1) and (4)). We speculate that this is related to a small variation in the solvation free energy of H + across the range of solvents considered. These results indicate that PA is a good descriptor of pK a of TM complexes. 52 Having computed the PA using the GFN2-xTB and DFT// GFN2-xTB methods, we turn to estimating the pK a using a full DFT approach. The DFT computed PA correlates well with the experimental pK a (R 2 = 0.84, and RMSE = 4. To further assess the performance of GFN2-xTB, we compared the accuracy of DFT, DFT//GFN2-xTB and GFN2-xTB for predicting the experimental pK a of TM hydride complexes in our database in acetonitrile solvent. We have chosen acetonitrile solvent for comparison since it has the largest share in the database and it is parametrized both in Gaussian and xTB packages. We identified 69 TM complexes for which both DFT and GFN2-xTB calculations were found to converge without errors. The resulting plot is shown in Fig. 4. Going from GFN2-xTB (R 2 = 0.76; RMSE = 4.3) to DFT// GFN2-xTB (R 2 = 0.51; RMSE = 6.1) leads to a drastic deterioration of the predictive capability for experimental pK a . Full DFT-based Fig. 2 A summary of the pK a MH dataset reported in this work used for the data-augmented prediction of experimental pK a using the GFN2-xTB method. and acid type complexes. The mean and median values of e(A) are À35.2 kcal mol À1 and À30.6 kcal mol À1 , and for e(AH) are À38.2 kcal mol À1 and À32.0 kcal mol À1 , respectively. Complex 45 has a high DPA = À66 kcal mol À1 , with e(A) = À25 kcal mol À1 and e(AH) = À91 kcal mol À1 . Therefore, the conjugate base form of complex 45 can be considered to have an above-average stability, while the acid form has a high error. On the other hand, complex 43, which has a low DPA = À1.4 kcal mol À1 has e(A) = À45.6 kcal mol À1 and e(AH) = À47 kcal mol À1 . Therefore, both the acid and conjugate base forms for complex 43 have high error.
Complex 43 therefore has a lower overall error in PA due to favourable error cancellation on the conjugate acid and base forms. The mean and median of the absolute DPA were found to be 6.2 and 3.3 kcal mol À1 , respectively, with a rather large standard deviation of 10.7 kcal mol À1 indicating an overall good agreement between DFT and GFN2-xTB with some highly skewed cases of large disagreement. The sign of DPA determines whether the acid (AH) or the base (A) form of the complex has a larger error as compared to DFT. DPA o 0 indicates a larger error in the acid form (AH) of the complex, while DPA 4 0 denotes that the conjugate base form (A) contributes to the overall error. Complexes (in acetonitrile) that featured a |DPA| 4 5 kcal mol À1 have been tabulated in Table 1. The majority of complexes have a negative DPA indicating the higher instability of the AH forms of geometries computed by GFN2-xTB as compared to DFT.
A cursory analysis of entries in Table 1 reveals that the complexes with |DPA| 4 5 kcal mol À1 either contain phosphine-based ligands or multiple CO ligands or both. To compare the DFT and GFN2-xTB predicted geometries we made structure overlay plots of the acid Fig. 4 Comparison of DFT-, DFT//GFN2-xTB-and GFN2-xTB (inset)-computed pK a as the estimator of experimental pK a for 69 TM complexes in acetonitrile. Table 1 The TM complexes with a computed |DPA| 4 5 kcal mol À1 and the respective index of the complex in pK a MH, name (conjugate base), DPA (in kcal mol À1 ) as well as the DFT-GFN2-xTB-errors in A and AH forms (in kcal mol À1 ) and conjugate base forms of selected complexes, which are presented in Fig. 5. The Pd-based PNP complex (25) shows a moderate DPA of 8.9 kcal mol À1 . The structure overlay figure (Fig. 5) reveals that the GFN2-xTB optimization results in hemi-labile PNP coordination in the [Pd(PNP) 2 ] complex, which probably contributes to a higher error for the conjugate base form. Contrastingly such hemilabile coordination was not observed for the acid form of the complex [HPd(PNP) 2 ] + . For complex 30, the planarity of the phenathroline ring is misrepresented in the acid form of the complex leading to a negative DPA of À11.4 kcal mol À1 . Similarly, the mismatch in orientation of benzene rings between GFN2-xTB and DFT in the acid form of complex 32 leads to a DPA of À15.9 kcal mol À1 . Interestingly for the dicarbonyl W complex 62 with a PMe 3 ligand, the energy difference between acid and base forms are both relatively low. However, the acid form is more destabilized due to an underestimated C CO -W-C CO angle leading to a DPA of À14.4 kcal mol À1 . In comparison, complex 7 has a DPA of À5.1 kcal mol À1 but energy differences in excess of 60 kcal mol À1 for the acid and base forms ( Table 1 (Table 1). Interestingly the C CO -M-C CO does not have a large deviation in the acid forms of complexes 75 and 78 (also see Fig. S21, ESI †). In contrast, the acid form of complex 62 has an underestimated C CO -M-C CO angle (B231), and complex 102 features largely (4301) underestimated C CO -M-C CO angles in both acid and conjugate base forms. Therefore, the erroneous representation of the C CO -M-C CO angle or M-CO bonding in general is not systematically present in all complexes. The tricarbonyl complexes 45-47 and 56 all feature a very large and negative DPA stemming from highly destabilized acid forms of these complexes (Fig. 6). The structure plots of complexes 45-47 and 56 in their acid forms reveal the inaccurate description of M-CO bonding in GFN2-xTB.
The C CO -M-C CO angles between adjacent CO moieties are largely underestimated, for example by B351 in complex 45 (acid form). It can therefore be inferred that the GFN2-xTB-optimized geometries are less reliable for metal carbonyl complexes. The unsystematic nature of the error makes prediction of pK a unreliable using the GFN2-xTB-optimized geometries. The prediction of PA involves taking an energy difference between the A and AH type conjugate base-acid complexes. This results in scenarios where favourable cancellation of error is possible. For example, complexes 45-47 and 56 have absolute errors (difference between predicted pK a and experimental pK a ) of 30-38 pK a units, respectively, at the DFT//GFN2-xTB level of theory. However, the absolute errors on pK a predicted by the stand-alone GFN2-xTB calculations on the same complexes are between 1.0-3.8 pK a units indicating a favourable error cancellation at the GFN2-xTB level of theory. Therefore, despite inaccurate representation of M-CO bonding the GFN2-xTB method gives consistent results when used as a standalone demonstrating its robust thermochemical predictive power. Moreover, examining the geometric overlays in Fig. 5 the overlap of GFN2-xTB-predicted and DFT-predicted geometries have  a good agreement in general despite significant mismatch for CO ligands. For example, the Cp/Cp* ligands seem to overlap very well between the geometries predicted by two methods. Apart from the poor description of M-CO type complexes a notable challenge for the GFN2-xTB method was identified to be its convergence failure for complexes with multiple hydrides. We found 16 TM complexes for which at least one or both of the base and acid forms did not converge. With the exception of the dinitrogen complex [HCr(N 2 )((P(Ph)) 3 (N(Bn)) 3 )(dmpe)] + (index 97) for which the reason for convergence failure is not understood, all of these complexes have multiple M-H bonds indicating that the GFN2-xTB method faces problems with such systems.

Machine learning experimental pK a using GFN2-xTB
Given the stand-alone performance of the GFN2-xTB methods in predicting experimental pK a , it can be considered robust and a good starting point for thermochemical property calculations. We seek to improve the predictive capability of GFN2-xTB using a dataaugmented approach. Our hypothesis is that GFN2-xTB already provides good geometric and energetic predictions. These predictions when used as features in an ML model, can be used to learn the experimental pK a . We therefore use GFN2-xTB-computed molecular geometries and energetic features to learn the experimental pK a of TM complexes. The choice of features is driven by intuition and physical reasoning in the present work. A more rigorous and automated approach towards construction and identification of relevant features from DFT-B calculations is an ongoing effort in our group.
We selected a set of 17 features, which include the HOMO and LUMO energies of AH and A, DFT-B computed partial charges on metal (AH and A) and hydrogen (which is to be deprotonated), atomic number, coordination number and coordination environments of metal centre in AH and A, dielectric constant of the solvent, solvated and gas phase PA, M-H bond length and total charge on AH complex ( Table 2). Note that the total size of the dataset used for ML (168) is smaller than the dataset, for which experimental pK a values have been curated. For 16 TM complexes DFT-B calculations did not converge (vide supra). We excluded complexes with multiple metal centres from our analysis (7 entries). Moreover, some complexes had ambiguous pK a or pK a values that were later revised in the literature (5 + 3 entries), and 3 entries are actually those of ligand pK a . We applied an ordinary least squares fitting on 80% of the dataset to learn the experimental pK a and use 20% of the dataset for testing the prediction learnt by the model. The results are presented in Fig. 7.
The OLS model leads to a significant improvement in the predicting power of the DFT-B method for the pK a of TM complexes in the database resulting in an R 2 of B0.87 and an RMSE of B4.1 pK a units (Fig. 7). Next, we explored the AutoML method provided by the auto-sklearn library in python. The details of the model are described in the Computational methods section. The AutoML model found that the K nearest neighbour (k-NN) algorithm performed the best on our dataset. The complete ensemble of the learned ML algorithms is presented in the ESI. † The AutoML model resulted in an R 2 = 0.94 and an RMSE = 2.7 for the test set (Fig. 8). The AutoML model therefore outperforms OLS and has similar accuracy to that of pure DFT.
A particularly notable case is the WH(CO) 3 (C 5 H 4 COO À ) complex (index 99 in pK a MH), for which an experimental pK a of 5.8 has been reported in water. 53 The OLS model predicted a pK a of 21.1 for this complex. Consistently a pK a of 18.0 is predicted by the LAC method. 26 The AutoML model predicted a pK a of 17.0 for this complex. This is the only anionic acid in the database, which could be the reason for erroneous predictions by various models. This complex is therefore considered an outlier and it is excluded from training/test sets and is not plotted in Fig. 7 and 8.
We used the DFT, GFN2-xTB, OLS and autoML models to estimate the pK a of complexes with multiple/revised pK a in the literature and ligand pK a . We further added 7 additional complexes, for which ligand pK a were reported in the literature. Note that the ML models are purely trained on the pK a of metal  hydrides and have never encountered ligand pK a . The assignment of ligand pK a tests the generality and transferability of our ML models. Furthermore, these test cases allow us to compare the accuracy of DFT, GFN2-xTB, OLS and AutoML models on an equal footing (Table 3). Ligand pK a for complexes 66, 73 and 74 proved difficult to predict for all the methods. GFN2-xTB performed worse (RMSE = 6.9), followed by OLS (RMSE = 6.0), AutoML (RMSE = 5.8) and DFT (RMSE = 4.8). For the Trop 2 family of complexes, which are not a part of pK a MH, the AutoML model performs well with an RMSE of 4.0, while OLS showed a high RMSE of 7.5 indicating poor transferability of the OLS model. An estimated pK a o À5 was established for complex 124 based on its reactivity with HOTf (aqueous pK a = À5). 58 Using the correlation of the DFT-computed PA with exp. pK a , we estimated a pK a of À0.5 for HOTf. Therefore, the pK a of complex 124 is expected to be oÀ0.5 in contrast to À5 as reported earlier. 58 OLS predicts a highly negative pK a of À13.4. AutoML, DFT (using linear scaling relation) and GFN2-xTB predict similar pK a values of À1.9, À4.6 and À3.8, respectively. DFT calculations using a reference base predicts a more negative value of À9.5. While all values are oÀ0.5, the variation in predictions make it difficult to assign a particular value to the pK a of complex 124.
For complexes 136, 137, 139 and 155 pK a values have not been measured in the literature, but rather an acidity scale was set up in CD 2 Cl 2 . 57 Both the DFT and GFN2-xTB methods consistently predict higher pK a values for these complexes in contrast to OLS and AutoML, which predict smaller values. If we consider the relative acidities as per the acidity scale, pK a should follow 137 o 136 o 139 o 155. Only AutoML and GFN2-xTB predicted pK a to follow this trend. For complexes 75, 76 and 78 the literature values were erroneously reported earlier and were corrected in subsequent studies. 59 All four approaches work well with low RMSE values in predicting the pK a of complexes 75, 76 and 78.
Experimentally measured pK a values typically have an error in the order of 1 pK a units. To the best of our knowledge, a comprehensive benchmark for the performance of computational methods towards prediction of experimental pK a of TM complexes does not exist. AIMD simulations have been applied to compute the aqueous phase ligand pK a of TM complexes and are reported to have an error of 1-2 pK a units. 13 Qi and coworkers performed CCSD(T) and DFT calculations using the

Summary and conclusions
In this manuscript we identify and address some of the key challenges for accurate and rapid prediction of thermochemical properties of TM complexes using quantum chemical approaches. We applied and compared two quantum chemical methods: semiempirical GFN2-xTB and hybrid DFT. Using pK a as a model thermochemical problem we first curated a novel dataset pK a MH composed of pK a of B200 TM hydride complexes. Our calculations revealed that PA is a good descriptor of experimental pK a . We further discovered that the computationally expensive Hessian calculations can be avoided when using PA to estimate experimental pK a values. Comparison of DFT and DFT//GFN2-xTB calculations revealed that while GFN2-xTBpredicted geometries are close to DFT-predicted geometries, significant errors can occur in the case of metal carbonyl complexes due to inaccurate representation of chemical bonding of M-CO functions. We further found out that despite such inaccurate geometric representations the GFN2-xTB method is robust for thermochemical property predictions when used as a standalone. However, direct use of GFN2-xTB-optimized geometries for DFT-based single-point calculations is not recommended due to the unsystematic nature of errors posed by the GFN2-xTB-optimized geometries. The GFN2-xTB method faced convergence issues for multi-hydride TM complexes. Using a data-augmented approach we computed features from GFN2-xTB and trained two different ML models to learn experimental pK a values. The OLS method resulted in a reasonable accuracy (R 2 = 0.87, and RMSE = 4.1), which is comparable albeit inferior to DFT-based predictions. The autoML approach using auto-sklearn library improved the performance of the GFN2-xTB approach to near DFT accuracy with an R 2 of 0.94 and an RMSE of 2.7 on the test set. We further tested the ML models to predict the pK a of TM complexes, which underwent deprotonation at the ligands. Even though the ML models were trained on TM-hydrides the AutoML model performed reasonably well for predicting ligand pK a values showing its transferability.
Our calculations identify challenging cases for predicting geometry and thermochemical properties of TM complexes using GFN2-xTB methods. We further demonstrate the promise Table 3 Experimental and predicted pK a for ligand pK a of TM complexes, and TM complexes with multiple/revised pK a reported in the literature. Estimate of pK a values based on DFT calculations are also given. The DFT-based pK a was estimated using the equation for linear correlation of PA estimated using DFT vs. exp. pK a (0.6388x À 171.41 (MeCN); 0.4974x À 136.09 (CH 2 Cl 2 ); 0.4903x À 132.69; x is PA in kcal mol À1 ). Values in parentheses denote estimated pK a values via a reference base using eqn (1). GFN2-xTB-based pK a were estimated using the linear correlation of PA with exp. pK a (0.3385x À 29.405 (CH 2 Cl 2 ); 0.3867x À 30.71 (MeCN); 0.9972x À 116.05 (THF))