Ajnabiul Hoque (a), Taiwei Chang (b), Jin-Quan Yu (b)* and Raghavan B. Sunoj (a,c)*
(a) Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India. E-mail: sunoj@chem.iitb.ac.in
(b) Department of Chemistry, The Scripps Research Institute, La Jolla, California 92037, USA. E-mail: yu200@scripps.edu
(c) Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
First published on 10th June 2025
Molecular machine learning (ML) has gained considerable attention in recent years. Developing ML algorithms for chemical reaction prediction is a formidable task, because reaction data sets are often small, sparse, and skewed in distribution. While previous ML studies offered effective predictions on known reactions, efforts to use deep generative models to guide new reactions, followed by prospective validation, are rare. We harness both the predictive and the explorative abilities of deep learning on an important catalytic asymmetric β-C(sp3)–H activation reaction, comprising 220 experimentally reported examples that differ primarily in the substrate, catalyst, and coupling partner. We adopt a transfer learning approach in which a chemical language model (CLM), pretrained on 1 million unlabeled molecules, is fine-tuned on this reaction data set. Our ensemble prediction (EnP) model, in which 30 fine-tuned CLMs concurrently predict the %ee of test set reactions, is highly reliable. Another language model, fine-tuned on the 77 known chiral ligands used in the above reactions, is employed to generate novel ligands with high validity and novelty. A proof-of-concept wet-lab experimental validation reveals that most of the ML-generated reactions are in excellent agreement with the EnP predictions. The results also counsel caution about the prospects of ML-driven reaction development for ligand design and emphasize the importance of domain experts in key decisions.
A good number of molecular ML models for chemical reactions are already available, many of them offering impressive performance.18,19 These studies tend to recommend their best-trained model for the different scenarios encountered in reaction outcome prediction tasks.20–22 Predictions of yield, selectivity, prospective target identification in reactions, etc., have become feasible and affordable. However, relying on a single fully trained ML model might limit model generalizability when predicting unseen reactions. It is worth noting that the idea of weak and strong learners is effectively incorporated in ML models such as the random forest (RF). RF models, which aggregate many decision trees, as well as extended versions such as ensemble RF, have been used.23,24 These methods provide multiple predicted values for every sample in the test set. In DL, by contrast, a single fully trained model is generally used for predicting unseen samples. We envisaged using multiple independent DL models built on different training sets, and we denote this as the ensemble prediction model, EnP (vide infra).25,26 The proposed ensemble prediction method assumes additional significance in this work because it is also used to evaluate reactions proposed by a generative ML model.
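The ensemble idea can be illustrated with a minimal sketch: several regressors, each trained on a different random subset of the data, predict every test sample, and the mean and standard deviation of their predictions are reported. Here a closed-form ridge regressor on synthetic feature vectors is a hypothetical stand-in for each fine-tuned CLM; the random-subset scheme, `n_models=30`, and `train_frac=0.8` are assumptions for illustration, not the exact splitting protocol used in this work.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression (hypothetical stand-in for one fine-tuned CLM regressor)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict_ridge(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def ensemble_predict(X_pool, y_pool, X_test, n_models=30, train_frac=0.8, seed=0):
    """Train n_models regressors, each on a different random subset of the pool,
    and return the mean and standard deviation of their test-set predictions."""
    rng = np.random.default_rng(seed)
    n = X_pool.shape[0]
    preds = []
    for _ in range(n_models):
        idx = rng.choice(n, size=int(train_frac * n), replace=False)  # a distinct training set per model
        w = fit_ridge(X_pool[idx], y_pool[idx])
        preds.append(predict_ridge(w, X_test))
    preds = np.stack(preds)  # shape (n_models, n_test)
    return preds.mean(axis=0), preds.std(axis=0)

# Synthetic stand-in data: 220 "reactions" with 5 random features each.
rng = np.random.default_rng(1)
X = rng.normal(size=(220, 5))
y = X @ np.array([5.0, -3.0, 2.0, 0.0, 1.0]) + 70 + rng.normal(scale=2.0, size=220)
mean_ee, std_ee = ensemble_predict(X[:200], y[:200], X[200:])
print(mean_ee.shape, std_ee.shape)  # (20,) (20,)
```

The per-sample standard deviation is what gives each prediction its "±" spread; samples on which the ensemble members disagree receive a larger spread.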
It is timely that the capabilities of ML are put to immediate use for emerging classes of reactions. For example, the unprecedented popularity of catalytic C–H bond activation reactions makes them an ideal research problem for examining the efficacy of ML.27–29 A broad array of applications of C–H bond activation reactions in obtaining high-value target compounds, such as drugs and biologically active compounds, is known (Fig. 1, panel a).30–33 It should be acknowledged that the real experimental data accrued over decades of work in this domain are sparse and imbalanced.34,35 Such data sets are likely to have more samples (reactions) in the low or high enantiomeric excess/yield regime, giving rise to class imbalance.36 Similarly, many instances may cluster around a few of the most frequently used reactants or catalysts, rendering an overall sparse distribution in the data set.37 Besides the complex and non-linear relationship between the labels (i.e., %ee) and the feature space representing the samples, these distribution characteristics are likely to make the development of ML models a challenging task. While DL models generally require a large pool of data, generating such data can be resource-intensive and time-consuming. In this context, transfer learning (TL), which transfers knowledge from related tasks to a data-scarce target task, could be an effective approach in both generative and predictive settings.38
One of the vital questions at this juncture is whether the prospects of DL-based methods can be effectively scrutinized. In other words, would DL models for reaction outcome prediction hold up when subjected to prospective wet-lab validation? Addressing such questions in the small-data regime is highly significant, as it represents the real-world situation encountered in reaction development. For DL-driven reaction development to become viable, the model should be able to learn from limited, sparse, and imbalanced data. These very aspects constitute the major objectives of this work.
Given our continued research efforts in C–H bond activation reactions39–43 as well as the need for a timely evaluation of DL-driven reaction development, we became interested in (i) developing robust DL models for enantioselectivity prediction in a catalytic asymmetric β-C(sp3)–H bond activation reaction, (ii) demonstrating the capabilities of generative AI in discovering novel reactions within this class, (iii) subjecting them to prospective experimental validation, and (iv) analyzing the prospects of our approach in a self-critical manner to evaluate the role of domain experts in such endeavors.
In view of the above-mentioned expectations on the DL model and the nature of the data set, we have employed a ULMFiT-based chemical language model (CLM) in this work. We trained an RNN-based ULMFiT language model48 to learn molecular representations from the SMILES (simplified molecular input line entry system) input of reactions, given in the form of concatenated SMILES of the individual reactants.49 During training, the model learns to predict the probability distribution of the next character given the preceding sequence of characters, similar to language modeling in natural language processing. SMILES strings encode atomic connectivity as well as atom and bond types, thus offering a comprehensive representation of all participating molecules in the reaction. Since DL models such as ULMFiT require large amounts of data for effective learning, we first pretrained the model on a large library of unlabeled molecules drawn from the ChEMBL database (Fig. 1b).
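To make the input format concrete, the sketch below joins reactant SMILES into one string and builds next-character (input, target) pairs of the kind a character-level language model is trained on. The "." separator, the character-level tokenization, and the example SMILES (cyclopropanecarboxylic acid and iodobenzene, chosen as hypothetical substrate and coupling partner) are illustrative assumptions, not the exact preprocessing used with ULMFiT in this work.

```python
def reaction_to_string(smiles_list, sep="."):
    """Concatenate the SMILES of individual reaction components.
    Using '.' as the separator is an assumption (it is the standard SMILES
    symbol for disconnected components)."""
    return sep.join(smiles_list)

def build_vocab(strings):
    """Map each distinct character to an integer index (character-level tokens assumed)."""
    chars = sorted(set("".join(strings)))
    return {c: i for i, c in enumerate(chars)}

def next_char_pairs(text, vocab):
    """Return (input, target) index sequences for next-character prediction:
    target[t] is the character that follows input[t]."""
    ids = [vocab[c] for c in text]
    return ids[:-1], ids[1:]

rxn = reaction_to_string(["OC(=O)C1CC1", "Ic1ccccc1"])  # hypothetical substrate + coupling partner
vocab = build_vocab([rxn])
x, y = next_char_pairs(rxn, vocab)
print(rxn)  # OC(=O)C1CC1.Ic1ccccc1
```

The target sequence is simply the input shifted by one position, which is why pretraining needs no labels: any collection of SMILES strings supplies its own supervision.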
It is important to note that the pretrained language model in this work is utilized for two major downstream tasks. In a TL setting, the pretrained weights and biases are first fine-tuned on the target-task reaction data set to predict the %ee as the output of our regression model, termed the EnP model.49–51 In a separate task, we fine-tuned the pretrained model on a target-task data set consisting of the 77 chiral (amino acid) ligands previously used in the asymmetric β-C(sp3)–H bond activation reaction. The idea here is to exploit the model for subsequent generative tasks (vide infra) (Fig. 1c). This fine-tuned generator is denoted as FnG.52 Here, we endeavor to generate novel chiral ligands suitable for this class of reaction, predict their efficacy using the EnP model, and subject them to prospective wet-lab experimental validation.
The quality of predictions can also be gleaned from both the parity plot and the pie chart provided in Fig. 2c and d. About 94% of predictions across all 30 runs remain within 10 units of the previously reported experimental %ee values. Similarly, the parity plot conveys a very good correlation between the %ee predicted by the EnP model and the corresponding experimental values, with a coefficient of determination (R2) of 0.89. More importantly, our EnP model performs better than other regression models such as RF, deep neural networks (DNN), and AttentiveFP.57 All these are good indicators of the effectiveness and reliability of our model in the %ee prediction task.58 A more complex transformer-based architecture, T5Chem, gives a test RMSE of 9.95±1.81, which is inferior to the RMSE of 7.57±1.31 obtained with our EnP regressor.59 This is particularly interesting given that T5Chem is pretrained on ∼97 million molecules and fine-tuned for the %ee prediction task. Comprehensive details of the model architecture, hyperparameter selection, and validation procedures for all these baseline models are documented in Section 10 of the ESI.†
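The two headline metrics in this paragraph, R2 and the share of predictions within 10 %ee units, can be computed as in the minimal sketch below. The toy values are made up for illustration; pooling the predictions of all 30 runs into one array before computing the 10-unit share is an assumption about how the 94% figure was aggregated.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fraction_within(y_true, y_pred, tol=10.0):
    """Fraction of predictions whose absolute error is at most `tol` %ee units."""
    err = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    return float(np.mean(err <= tol))

# Made-up experimental and predicted %ee values (errors: 2, 5, 11, 3).
y_true = [90, 80, 60, 30]
y_pred = [92, 75, 71, 33]
r2 = r2_score(y_true, y_pred)
frac = fraction_within(y_true, y_pred)
print(round(r2, 3), frac)  # 0.924 0.75
```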
After developing the TL-based EnP model, we became interested in probing what the model could learn from the input data. To facilitate visualization of the complex, high-dimensional encoding vector, we used UMAP (uniform manifold approximation and projection) plots, which project it onto a reduced space (see Section 5 in the ESI† for more details of the UMAP plots).60,61 Seven distinct clusters (labeled 0 to 6) with high silhouette scores are discernible in Fig. 3. Interestingly, most of these clusters could be characterized as belonging to different substrates and chiral ligands. For instance, clusters 0, 1, 2, and 4 represent reactions involving the chiral ligands LAPAO, LMPAAM, LMPAHA, and LMPAA, respectively. Although clusters 1 and 3 share the same ligand (LMPAAM), they involve different substrates (cyclopropanes and cyclobutanes). Similarly, clusters 0 and 6 contain different coupling partners/solvents, while the chiral ligand belongs to the LAPAO family in both. The identification of these chemically meaningful clusters in the latent space of the encoding vector gives us considerable confidence that our model learns from the given representation of chemical reactions. We plan to utilize this knowledge acquired by the DL model in our generative tasks, wherein we sample new chiral ligands from disparate regions of the latent space (vide infra). This would ensure sufficient diversity among the generated reactions employing such chiral ligands.
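Silhouette scores summarize how well separated the clusters are: values near 1 mean a point sits much closer to its own cluster than to any other. A minimal NumPy sketch of the standard per-sample definition follows (scikit-learn's `silhouette_score` computes the same average). The toy coordinates and labels are hypothetical; in this work they would come from the UMAP projection and a clustering step, which are not reproduced here.

```python
import numpy as np

def silhouette_scores(X, labels):
    """Per-sample silhouette s = (b - a) / max(a, b), with Euclidean distances.
    a: mean distance to the other members of the sample's own cluster.
    b: smallest mean distance to the members of any other cluster.
    Samples in singleton clusters get s = 0 by convention. Needs >= 2 clusters."""
    X = np.asarray(X, float)
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum() - 1  # exclude the sample itself
        if n_same == 0:
            continue
        a = D[i, same].sum() / n_same  # D[i, i] = 0, so including i in the sum is harmless
        b = min(D[i, labels == k].mean() for k in np.unique(labels) if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

# Two well-separated hypothetical clusters in a 2-D projection.
X = np.array([[0, 0], [0, 1], [10, 0], [10, 1]], float)
labels = np.array([0, 0, 1, 1])
s = silhouette_scores(X, labels)
print(round(float(s.mean()), 2))  # 0.9
```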
We were pleased to note that the FnG could generate chiral (amino acid) ligands with a high validity of 99%, besides excellent uniqueness (98%) and novelty (98%). While these ligands are all valid molecules, our main goal is to expand the ligand library in such a way that they are useful, to the extent possible, when deployed in real-world wet-lab validation. We therefore applied certain chemically relevant criteria to choose from the pool of 490 generated molecules. These filters mandate the presence of (i) at least one, but not more than two, chiral center(s), (ii) N- and O-donor sites for binding to transition metals, to enhance the likelihood of the ligand serving in a catalyst for our reaction, and (iii) an NH(CO) moiety near the N donor to facilitate β-C(sp3)–H bond activation.62,63 With these mechanistically informed filters in place, we identified 73 chiral amino acid ligands from among the 490 candidates generated by the model. We consider it important to sample novel chiral ligands from the neighborhood of the experimentally known ligands. Such ligands are more likely to follow a mechanism similar to that of the known reactions, which acts as an implicit safeguard toward keeping the predictions realistic.
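The three selection criteria translate directly into a boolean filter. The sketch below assumes the descriptors (number of chiral centers, presence of N- and O-donor sites, an NH(CO) moiety near the N donor) have already been computed by a cheminformatics toolkit; that perception step is not shown, and the `LigandDescriptors` record and the ligand names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LigandDescriptors:
    """Hypothetical precomputed descriptors for one generated ligand."""
    name: str
    n_chiral_centers: int
    has_n_donor: bool
    has_o_donor: bool
    has_nhco_near_n: bool

def passes_filters(d: LigandDescriptors) -> bool:
    """(i) one or two chiral centers, (ii) both N- and O-donor sites,
    (iii) an NH(CO) moiety near the N donor."""
    return (1 <= d.n_chiral_centers <= 2
            and d.has_n_donor and d.has_o_donor
            and d.has_nhco_near_n)

def select_ligands(pool):
    """Keep only the generated ligands that satisfy all three criteria."""
    return [d for d in pool if passes_filters(d)]

pool = [
    LigandDescriptors("gen_001", 1, True, True, True),
    LigandDescriptors("gen_002", 0, True, True, True),   # no chiral center
    LigandDescriptors("gen_003", 3, True, True, True),   # too many chiral centers
    LigandDescriptors("gen_004", 2, True, False, True),  # no O-donor site
]
selected = select_ligands(pool)
print([d.name for d in selected])  # ['gen_001']
```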
We were further pleased to learn that our FnG model offered better performance than other state-of-the-art (SOTA) approaches to molecular generation, including a genetic algorithm,64 graph-based generative models,65 and virtual screening (VS) for the generation/filtering of chiral ligands. The TL-based ULMFiT method emerged as the top-performing model, as indicated by the percentage of novel and practically useful molecules generated, besides their Fréchet ChemNet Distance (FCD).66 A detailed comparison between these generative models is provided in Section 11 of the ESI.† Specifically, our FnG model achieved a significantly lower FCD of 4.1, compared with 32.3 for VS. This highlights the superior ability of our model to generate ligands that are chemically similar to those in the training set, which implicitly suggests a higher likelihood of a similar catalytic mechanism. The success of this approach also indicates the complementary role that generative models can play in catalyst discovery. By integrating domain expertise into the training data selection (in the present case, a smaller target data set is manually curated) and using TL to guide the generation of chiral ligands (as illustrated in Fig. 4), the FnG model effectively reduces the search space for identifying promising catalysts. Generative models should be regarded as tools that augment, rather than replace, domain knowledge. This TL-based model serves as a potent platform for combining human expertise with data-driven techniques, allowing efficient navigation of realistic chemical spaces and addressing the limitations of small data sets.
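FCD is the Fréchet distance between two Gaussians fitted to ChemNet activations of two molecule sets, ||μa − μb||² + Tr(Σa + Σb − 2(ΣaΣb)^(1/2)). The minimal sketch below implements that distance; random vectors stand in for the ChemNet activations (an assumption, since the ChemNet model itself is not included), and the trace of the matrix square root is taken as the sum of square roots of the eigenvalues of ΣaΣb.

```python
import numpy as np

def frechet_distance(emb_a, emb_b):
    """Frechet distance between Gaussians fitted to two sets of embeddings:
    ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2}).
    Tr((S_a S_b)^{1/2}) is the sum of square roots of the eigenvalues of S_a S_b,
    which are real and non-negative for covariance matrices."""
    emb_a, emb_b = np.asarray(emb_a, float), np.asarray(emb_b, float)
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    S_a = np.cov(emb_a, rowvar=False)
    S_b = np.cov(emb_b, rowvar=False)
    eig = np.linalg.eigvals(S_a @ S_b)
    tr_sqrt = np.sum(np.sqrt(np.clip(eig.real, 0.0, None)))  # clip tiny negative round-off
    return float(np.sum((mu_a - mu_b) ** 2) + np.trace(S_a) + np.trace(S_b) - 2.0 * tr_sqrt)

# Random stand-ins for ChemNet activations of a training set and a generated set.
rng = np.random.default_rng(0)
a = rng.normal(size=(2000, 4))
b = rng.normal(size=(2000, 4)) + 3.0  # shifted distribution
fd_same = frechet_distance(a, a)
fd_shift = frechet_distance(a, b)
print(round(fd_same, 6), round(fd_shift, 1))
```

A lower value means the generated set's activation statistics sit closer to those of the reference set, which is how a score of 4.1 versus 32.3 should be read.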
Fig. 4 Representative examples of the generated chiral ligands, along with their closest experimentally known analogues, are shown along the periphery of the square box. The training and generated sets are shown as orange and blue dots, respectively, to convey their structural similarities. The fingerprints in TMAP (Tree MAP) undergo MinHashing with a weighted scheme to ensure compatibility with the LSH (locality-sensitive hashing) forest (see Section 8 in the ESI† for more details).
The TMAP plot in Fig. 4 helps in assessing the similarities between the generated chiral amino acid ligands and those in the training set (experimentally known ligands).67 The spread of the orange and blue dots in the plot and the proximity between them indicate a couple of chemically interesting aspects. The generated chiral ligands span wider regions of the chemical space while maintaining structural similarity to the training examples, both of which suggest efficient exploration of the latent space of the DL model. One of the generated APAO ligands, shown in the upper right region, is similar to the corresponding training set analogue; however, the critical substituent at its chiral center is a tert-butyl group as opposed to an isopropyl group in the training set. Similarly, one of the generated MPAHA ligands shown on the lower left bears a –CF3 group in place of –H in the training set. These changes explored by the model in the generative task are chemically reasonable.
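The MinHashing mentioned in the Fig. 4 caption can be illustrated with its simplest, unweighted form: for each of many seeded hash functions, keep the minimum hash over a fingerprint's on-bits, and the fraction of agreeing signature slots estimates the Jaccard (for binary fingerprints, Tanimoto) similarity. This sketch is not TMAP's weighted scheme or its LSH forest; the fingerprints are hypothetical sets of on-bit indices.

```python
import hashlib

def _hash(seed, feature):
    """64-bit hash of a feature under a given seed (one of n_perm hash functions)."""
    digest = hashlib.blake2b(f"{seed}:{feature}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash_signature(features, n_perm=256):
    """Unweighted MinHash: keep the minimum hash over the feature set for each seed."""
    return [min(_hash(seed, f) for f in features) for seed in range(n_perm)]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature slots estimates the Jaccard (Tanimoto) similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical on-bit indices of two binary fingerprints (true Jaccard = 50/150).
fp_a = set(range(0, 100))
fp_b = set(range(50, 150))
est = estimated_jaccard(minhash_signature(fp_a), minhash_signature(fp_b))
print(round(est, 2))
```

Fixed-length signatures are what make approximate nearest-neighbor indexing (such as an LSH forest) cheap, because similar fingerprints tend to share many signature slots.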
Since many new chiral ligands are generated, a one-by-one comparison of each of them with those in the training set would be impractical. Hence, a widely used metric, the Tanimoto coefficient, is employed for quantitative comparisons68 (see Section 8 in the ESI†). A mean similarity score of 0.52 between the generated and training set chiral amino acid ligands indicates a close structural resemblance. At the same time, the diversity of 0.42 among the generated ligands suggests a reasonable spread in chemical space. The plots shown in Fig. 5 convey similarity, diversity, and a few other desirable molecular properties relevant to catalysis. Physicochemical properties such as the number of H-bond donors, molecular volume, and steric characteristics around the donor sites can influence how chiral ligands bind to the transition metal.69 It can be discerned from Fig. 5 that most of these properties of the generated set remain similar to those of the training set. We consider this an advantage of our FnG, as it keeps the novel design space within chemically manageable regions.
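The section does not spell out how the similarity and diversity scores are aggregated, so the sketch below uses two common conventions as assumptions: similarity as the mean, over generated ligands, of the highest Tanimoto coefficient to any training ligand, and diversity as one minus the mean pairwise Tanimoto coefficient within the generated set. Fingerprints are represented as hypothetical sets of on-bit indices.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def mean_nn_similarity(generated, training):
    """Mean over generated ligands of the highest Tanimoto similarity to any training ligand."""
    return sum(max(tanimoto(g, t) for t in training) for g in generated) / len(generated)

def internal_diversity(fps):
    """1 - mean pairwise Tanimoto similarity within one set (needs at least two members)."""
    pairs = list(combinations(fps, 2))
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

training = [{1, 2, 3, 4}, {5, 6, 7, 8}]
generated = [{1, 2, 3, 9}, {5, 6, 10, 11}]
sim = mean_nn_similarity(generated, training)  # (3/5 + 2/6) / 2
div = internal_diversity(generated)            # the two generated sets share no bits
print(round(sim, 3), div)  # 0.467 1.0
```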
Additional and more valuable metrics for evaluating the likely utility of the generated chiral ligands are the synthetic accessibility score (SAS) and the synthetic complexity score (SCS).70,71 The SAS, ranging from 1 (easy) to 10 (difficult), considers molecular size, substructures, and complexity, while the SCS (ranging from 1 to 5) measures structural complexity, including functional groups, ring systems, and stereocenters. Given that a lower SA score implies easier synthesis, one could employ these scores to choose suitable candidates for prospective experimental validation (vide infra). It can be gathered from panels d and e in Fig. 5 that the mean SA (2.78) and SC (2.69) scores of the generated chiral ligands are comparable to the corresponding values of the training set, which are 2.64 and 2.80, respectively. While these values provide initial confidence in the synthetic feasibility of the generated ligands, real use-case scenarios can be tricky, as exemplified in the later section where we discuss our experimental validation efforts.
With the ML-generated chiral amino acid ligands in hand, we became interested in evaluating their efficacy in the asymmetric β-C(sp3)–H bond activation reaction. We use our EnP regression model to obtain %ee predictions for every new reaction involving these novel ligands. It is known from our training set of 220 experimentally known reactions that the %ee depends predominantly on the nature of the chiral ligand used in conjunction with a transition metal. Each reaction in the training set involves a catalyst consisting of a chiral ligand bound to a transition metal (Fig. 1b). There are 77 unique chiral ligands in the training set, giving rise to a total of 220 known reactions. Any one of these chiral ligands could be replaced with a generated ligand while keeping the other reaction components, such as the cycloalkane and coupling partner, the same. Through such replacements, we obtain 9855 possible reactions (73 new chiral ligands multiplied by 135 known combinations of reaction partners other than the chiral ligand) and their predicted %ee, facilitating quick identification of promising ligands (or even the choice of substrate/coupling partner) from among the generated set (see Section 9 in the ESI† for more details).
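The replacement scheme is a Cartesian product of the generated ligands with the known non-ligand reaction contexts, giving 73 × 135 = 9855 virtual reactions. In the minimal sketch below, the ligand names and the dictionary representation of a context are hypothetical placeholders; in practice each record would be converted into the concatenated SMILES input and scored by the EnP model.

```python
from itertools import product

def enumerate_virtual_reactions(new_ligands, contexts):
    """Pair every generated ligand with every known combination of the other
    reaction components (substrate, coupling partner, conditions)."""
    return [{"ligand": lig, **ctx} for lig, ctx in product(new_ligands, contexts)]

ligands = [f"gen_{i}" for i in range(73)]         # hypothetical generated-ligand IDs
contexts = [{"context_id": j} for j in range(135)]  # hypothetical reaction contexts
reactions = enumerate_virtual_reactions(ligands, contexts)
print(len(reactions))  # 9855
```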
It is important to recognize that the DL model has learned chemically significant characteristics from the training set (Fig. 4) and has also been able to generate new chiral ligands for the reaction of interest. Motivated by the fact that our DL model effectively learns from the reaction encoding (Fig. 3), we sought to make ML-based recommendations for new reactions. This can be done either by directly predicting the %ee for any reaction involving a newly generated chiral ligand or by choosing such reactions from the t-SNE projections of their latent vectors. For instance, a prospective high-%ee reaction can be located from Fig. 6a/b or from the heat map in Fig. 6c. Alternatively, a simpler and approximate indication of whether the expected outcome is higher or lower than the mean %ee of 67 can be gathered from the t-SNE plot in Fig. 6b.72 Reactions above the mean %ee can be readily identified in the region between -40 and 20 along the t-SNE2 axis. Similarly, one can also choose several high-%ee reactions from the central region enclosed between (20, -20) in t-SNE2 and (-40, 60) in t-SNE1 of Fig. 6a.73
The heat map depiction of the predicted %ee of new reactions (Fig. 6c) can be analyzed in different ways. For example, it can help make an informed choice as to which among the new reactions would be of greater interest. Each color pixel in this plot represents the predicted %ee for one of the generated chiral ligands in a reaction, and each row conveys the performance of the same ligand across all 135 unique combinations of reacting partners. Some of the pixels are shown expanded to the right to explicitly display a set of representative reactions belonging to both high and low %ee ranges. It is important to recollect that the training data is skewed toward the higher %ee, leaving very few training samples in the lower %ee region. Hence, it is important to consider the generated reactions even from the lower end of the output for wet-lab validations. This would serve as a good demonstration of the robustness of our model across all likely outcomes.
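Reading candidates off both ends of the heat map amounts to sorting a ligand × context matrix of predicted %ee (73 × 135 here). The minimal sketch below returns the cells with the highest and lowest predictions; the tiny matrix is made up for illustration, and the actual choice of reactions for wet-lab testing also involved chemical judgment that this ranking does not capture.

```python
import numpy as np

def pick_extremes(pred, k=3):
    """pred: (n_ligands, n_contexts) matrix of predicted %ee (one heat-map row per ligand).
    Return (ligand_index, context_index) pairs for the k highest and k lowest predictions."""
    flat = np.argsort(pred, axis=None)  # ascending order over all cells
    low = [tuple(int(v) for v in np.unravel_index(i, pred.shape)) for i in flat[:k]]
    high = [tuple(int(v) for v in np.unravel_index(i, pred.shape)) for i in flat[::-1][:k]]
    return high, low

pred = np.array([[90.0, 20.0],
                 [55.0, 95.0]])  # made-up 2 x 2 heat map
high, low = pick_extremes(pred, k=1)
print(high, low)  # [(1, 1)] [(0, 1)]
```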
entry | substrate (a) | coupling partner | ligand | reaction condition (b) | pred. %ee (c) | exp. %ee |
---|---|---|---|---|---|---|
1 | [structure] | [structure] | L5 | RC3 | 94±3 | 94 |
2 | [structure] | [structure] | L5 | RC3 | 94±3 | 94 |
3 | [structure] | [structure] | L5 | RC4 | 93±3 | 86 |
4 | [structure] | [structure] | L5 | RC4 | 93±3 | 85 |
5 | [structure] | [structure] | L4 | RC3 | 90±3 | 86 |
6 | [structure] | [structure] | L4 | RC3 | 90±3 | 85 |
7 | [structure] | [structure] | L4 | RC3 | 90±3 | 86 |
8 | [structure] | [structure] | L6 | RC1 (Li3PO4) | 85±2 | 80 |
9 | [structure] | [structure] | L6 | RC1 (Na3PO4) | 84±3 | 81 |
10 | [structure] | [structure] | L7 | RC2 (Pd(OPiv)2) | 46±6 | 30 |
11 | [structure] | [structure] | L8 | RC2 (Pd(OAc)2) | 35±6 | 23 |
12 | [structure] | [structure] | L(Ac-Phe-OH) | RC3 | 90±4 | 90 |
13 | [structure] | [structure] | L(Ac-Phe-OH) | RC3 | 90±3 | 91 |
14 | [structure] | [structure] | L(Ac-Phe-OH) | RC3 | 87±3 | 87 |
15 | [structure] | [structure] | L(Ac-Phe-OH) | RC3 | 86±3 | 89 |
(a) Ar1 = –p-CF3C6F4; Ar2 = –p-CNC6F4.
(b) RC1 = Pd(MeCN)2Cl2 (10 mol%), ligand (10 mol%), Ag2CO3 (2.0 equiv.), base (2.0 equiv.), CHCl3 (1.0 mL), 80 °C, and 24 h.
RC2 = Pd(OAc)2 (10 mol%), ligand (11 mol%), Ag2CO3 (1.5 equiv.), Na2CO3 (2.0 equiv.), BQ (0.5 equiv.), H2O (5.0 equiv.), t-AmylOH (0.5 mL), 70 °C, and 24 h.
RC3 = Pd(OAc)2 (10 mol%), ligand (20 mol%), Ag2CO3 (1.5 equiv.), K2HPO4 (1.5 equiv.), t-BuOH (1.0 mL), H2O (10.0 equiv.), BQ (0.5 equiv.), 80 °C, and 12 h.
RC4 = Pd(OAc)2 (10 mol%), ligand (20 mol%), Ag2CO3 (1.5 equiv.), Na2CO3 (2.0 equiv.), HFIP (0.25 mL), 80 °C, and 16 h.
(c) The standard deviation in the predicted values stems from the use of our EnP regressor, where each reaction is predicted by multiple regressors (see Fig. 2a).
It can be seen from Table 1 that the agreement between the ML-predicted %ee for the generated reactions and those obtained from our wet-lab experiments is very good. Predictions for both cycloalkane carboxylic acids (entries 1 through 7) and N-aryl amides (entries 8 through 11) can be regarded as excellent on pragmatic grounds, as most entries are within 10 units of the values obtained in our prospective validation. Gratifyingly, even the generated reactions in the lower %ee range (entries 10 and 11) are in good agreement with the experimental values. The higher standard deviation in the predicted values for these reactions reflects greater disagreement among the regressors in our EnP model, which are built on fewer training samples in the low %ee regime.
In addition to proposing new reactions based on the generated chiral ligands, we have also evaluated the quality of our EnP regressor on another, smaller set of unseen reactions. Here, we conducted new wet-lab experiments using L(Ac-Phe-OH), a previously reported chiral ligand,67 but with substrates and/or coupling partners that had not previously been used with this ligand. We denote these as ‘complementary reactions’ (or unexplored reactions), as their products could have been obtained by employing the known chiral ligands with a suitable choice of substrate and coupling partner. The experimentally obtained %ee values for this family of reactions from the complementary space, shown in entries 12 through 15, also show an excellent match with those predicted by the EnP regressor. Moreover, the good correlation between the predicted and experimental %ee for all 15 of these out-of-bag reactions is evidenced by the low RMSE of 6.42 and the high R2 of 0.93 shown in Fig. 7. In contrast, DNN, RF, AttentiveFP, and T5Chem exhibited lower R2 values (0.90, 0.84, 0.88, and 0.83, respectively) and higher RMSEs, reflecting their reduced reliability in capturing experimental trends (see Table S24 in the ESI† for more details). These results highlight the robustness and potential of our TL-based ensemble approach as a reliable tool for enantioselectivity prediction in asymmetric catalysis.
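The error statistics can be checked against Table 1 directly. The sketch below recomputes the RMSE and the number of entries within 10 units from the rounded predicted (ensemble mean) and experimental values of entries 1 to 15. It gives an RMSE of about 6.40, close to the 6.42 reported in Fig. 7; the small gap presumably arises because the table lists rounded predictions.

```python
import numpy as np

# Predicted (EnP ensemble mean, rounded) and experimental %ee, entries 1-15 of Table 1.
pred = np.array([94, 94, 93, 93, 90, 90, 90, 85, 84, 46, 35, 90, 90, 87, 86], float)
exp_ = np.array([94, 94, 86, 85, 86, 85, 86, 80, 81, 30, 23, 90, 91, 87, 89], float)

err = pred - exp_
rmse = float(np.sqrt(np.mean(err ** 2)))
n_within_10 = int(np.sum(np.abs(err) <= 10))
print(f"RMSE = {rmse:.2f}; {n_within_10}/15 within 10 units")  # RMSE = 6.40; 13/15 within 10 units
```

Only entries 10 and 11, the two low-%ee reactions, fall outside the 10-unit window, consistent with the larger ensemble spread reported for them.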
Now, we shift focus from our successful validation experiments to another set of generated reactions, shown in Fig. 8. We wish to convey that guarded optimism would perhaps be more appropriate when it comes to the prospects of ML-driven experiments, and that the involvement of domain experts in critical decisions remains all the more important. First, consider one of the generated ligands, L5 (MPAAM class), bearing a tertiary –N(Me)(Et) group, with a predicted %ee of 94±3 (see entry 1 in Table 1) and an experimental value of 94 obtained in this study. However, another related ligand, L1, with a –(C=O)NH(OMe) group, shown as category (iii) in Fig. 8, failed to yield any product under the chosen reaction conditions. This points to certain considerations worth weighing before a generated ligand is approved for wet-lab validation. As described earlier (also see Section 9 in the ESI†), upon generation of a new chiral ligand, it is combined with other species such as the transition metal (leading to a chiral transition metal catalyst), substrate, and coupling partner, as well as entities that contribute to the reaction (solvent, base, and additive), before passing them through the regressors for %ee predictions. In the case of L5, the ML-based regressor identified RC3 or RC4 as the better reaction condition, with a predicted %ee of 94±3 or 93±3, respectively. Note that the efficacy of each of the generated chiral ligands is evaluated across all the other participating species (transition metal precursor, substrate, coupling partner, additive, base, etc.). Our wet-lab efforts with L1, a structural analog of L5 (differing in the presence of –(C=O)NH(OMe) in place of –CH2N(Et)Me on one donor site), however, did not give any product, prompting us to rethink the rather liberal approval of candidates adopted here. Interestingly, chiral ligands bearing the –(C=O)NH(OMe) moiety, although found to be effective in other reactions with different substrates and reaction conditions,44 failed to give the product here. It is possible that the ligands L1–L3 themselves are not suitable for catalyst formation under the chosen reaction conditions, or that the specific substrate used does not interact optimally with the catalytic pocket provided by the chiral ligand.
Another way to lend additional credence to the generative tasks is to evaluate the generated ligands for their synthetic feasibility using metrics such as SAS and SCS.63,64 Values of SAS ≤ 6 and SCS ≤ 3 are generally considered good for amenable laboratory synthesis. The generated ligands in the first row of Fig. 8 have SAS in the range of 2.8 to 3.2 and SCS in the 3.0 to 3.2 window. Despite these low scores, synthesis of some of these ligands was not feasible because their precursors were unavailable or unstable. Other examples, hydroxamic acid ligands derived from Boc-O-benzyl-L-serine as well as Boc-O-methyl-L-serine, decomposed during our attempts to synthesize them, despite their low SAS and SCS values (second row of Fig. 8). This might be due to Boc decomposition under the reaction conditions employed. In contrast, some MPAHA ligands (L1–L3) could be synthesized within about six steps despite their relatively higher SCS (3.4–3.6) (see Section 3.1 in the ESI†). More importantly, their low SAS did not align with the experimental observations. Among the ligands that could be synthesized and did serve as effective catalysts in this reaction, the three MPAA ligands (L4, L7, and L8) possessed low SAS (2.0–2.2) and SCS (1.8–2.7). For ligands such as L5 and L6, requiring 6 and 3 steps, respectively, from commercially available precursors, the SCS (4.4 and 3.1) indeed captured the difficulty of their synthesis. We believe that how well SAS and SCS reflect synthetic difficulty depends on the family of compounds to which the molecules belong. The SCS appears to be a slightly more reliable indicator of the ease of synthesis, at least in the present case of chiral mono-protected amino acid ligands.
All the above-mentioned factors highlight the importance of expert opinion in the initial selection from among the generated chiral ligands prior to prospective experimental validation. Domain experts can help identify and exclude ligands that, despite promising ML predictions, are synthetically challenging or unlikely to succeed under the experimental conditions. For example, ligands with unstable functional groups or borderline SC scores could be discarded. Furthermore, a domain expert could also consider the latent structural features of the successful ligands from known experiments to make an informed decision. This might involve an assessment of favorable steric and/or electronic characteristics, and of reactive moieties suited to the reaction conditions, to guide the selection of promising candidates.
The use of another CLM (FnG), fine-tuned on 77 known chiral ligands, is found to be effective in generating novel chiral ligands for catalytic β-C(sp3)–H bond activation reactions. Exploration of the latent space of this FnG helped us identify several interesting and realistic chiral ligands in the neighborhood of the training samples. Several of these generated ligands have good synthetic accessibility and synthetic complexity scores, in addition to exhibiting high novelty and uniqueness. Motivated by these results, we subjected a handful of the generated reactions to prospective wet-lab validation. The %ee values predicted by our EnP model for the new reactions employing these ML-generated chiral ligands are found to be in very good agreement with the actual experimental values. The predictions on such unseen reactions, in both the high and low %ee regions, are in concert with the ground truths, suggesting that our EnP model has high practical utility. Given that, in the reaction development phase, the number of reactions with measured %ee is often small, an ML intervention might be beneficial.
While most of our prospective validation experiments were highly successful, conveying that ML could guide reaction development, the results also point to the importance of keeping domain experts in the loop. A few of the ML-generated ligands considered for experimental validation, despite their promising synthetic accessibility scores and high predicted %ee, proved not to be viable owing to a lack of readily available precursors and/or a failure to form the product. As with any predictive science, we expect that one might encounter collateral issues beyond the scope of current ML implementations when predictions are subjected to wet-lab validation.
Footnote
† Electronic supplementary information (ESI) available: Details of reaction data sets, generative and predictive model setups with their hyper-parameter tuning, wet-lab experiments, and full characterization of new compounds including 1H and 13C NMR spectra and HRMS data. See DOI: https://doi.org/10.1039/d5sc01098e
This journal is © The Royal Society of Chemistry 2025