Taline Kerackian*ab, Clément Wespiserb, Matthieu Danielb, Eric Pasquinetb and Eugénie Romero*a
a Département Médicaments et Technologies pour la Santé (DMTS), SCBM, Université Paris Saclay, CEA, INRAE, 91191 Gif-sur-Yvette, France. E-mail: eugenie.romero@cea.fr
b CEA, DAM, Le Ripault, F-37260 Monts, France
First published on 24th June 2025
Access to the nitro functional group is a widespread and longstanding transformation of interest in many fields of chemistry. However, the robustness and specificity of this transformation can remain challenging, particularly in the case of heteroarene nitration. Based on this observation, a comprehensive investigation was initiated to screen nitration conditions on various arenes and heteroarenes. A systematic and diverse study of both nitrating agents and activating reagents was conducted using high-throughput experimentation to afford high-quantity and high-quality data generation. General trends were identified and correlated with the electronic properties of the heteroarenes; notably, the difficult nitration of electron-poor heteroarenes was highlighted. Original combinations of reagents were found to perform well in nitration reactions. The obtained data were also used to design a predictive tool relying on machine learning in order to provide the best nitration reaction conditions depending on the targeted substrate. The limited predictive efficiency obtained pointed out the importance of diversification and chemically relevant encoding of the data set.
Together, these advantages support the generation of high-quality experimental data, which serve as excellent input for machine learning, in contrast to the classical bench-scale data typically described in the literature.34–36 This point was recently emphasized: predictive tools developed using standard experimental data from the literature have shown limited efficiency, largely due to a lack of data standardization and the absence of negative results, issues that HTE can address. Machine learning and artificial intelligence have been applied in chemistry, notably to generate predictive tools.37 Most reported HTE campaigns study well-known and generally high-yielding transformations.38–45 During our exploration of heteroarene nitration, we found that the available literature is limited and faced significant challenges in reproducing reported reaction conditions. Hence, we chose to study the challenging nitration reaction, which is especially low-yielding on electron-poor, nitrogen-containing heteroarenes. Our goal was to test combinations of various substrates and reagents in order to identify optimal reaction conditions and to uncover reactivity trends depending on the selected scaffolds.
Based on the existing literature,3,23 we designed a 96-well HTE plate to test 12 different nitrating agents and 8 different activating reagents. This plate was systematically evaluated on: (i) arenes, (ii) electron-rich heteroarenes, and (iii) electron-poor heteroarenes. Each class of substrate was evaluated under three reactivity modes: (i) direct C–H functionalization of the non-functionalized substrate, (ii) ipso-functionalization from the corresponding carboxylic acid, and (iii) ipso-functionalization from the corresponding halogenated substrate. The overall HTE campaign comprised 864 different reactions. The data collected will be used to develop a predictive model based on machine learning. Different types of molecular encoding will be tested, and the ability of the model to accurately predict nitration outcomes on new substrates will be evaluated.
Then, ipso-functionalization was evaluated using 1-naphthoic acid, picolinic acid, benzofuran-2-carboxylic acid, 1-bromonaphthalene, 2-bromopyridine, and 2-bromobenzofuran. Each of these nine different scaffolds was screened on a 96-well plate designed to study nitration reaction parameters. Most nitration reactions can be regarded as involving a combination of a nitrating agent and an activating reagent. Our goal was to test various combinations of these two species (Fig. 1b). Such variation, based on literature precedent, would allow us to reproduce reported conditions while also testing original combinations of reagents, opening the door to serendipitous discoveries.
Since the variety of reported nitrating agents is tremendous, we chose to select it as the parameter with the highest number of screened candidates (Fig. 1c). Twelve nitrating agents were selected. Nitric acid and tert-butyl nitrite, two of the main nitration reagents, were picked. Since nitronium tetrafluoroborate provides a solid and stable source of the reactive nitronium ion, many modern methodologies employ it as a nitrating agent.46,47 We naturally picked it as a nitrating agent of interest to screen. Then, both nitrate and nitrite alkali metal salts were selected with sodium as the cationic species. To screen a different cationic counterpart, potassium nitrite was also picked. A soluble nitrate salt, tetrabutylammonium nitrate, was selected. Bismuth(III), silver(I), and iron(III) were chosen, as they are the most commonly used metallic nitrate salts. It has to be noted that both bismuth and iron nitrate salts come as hydrated metal complexes: bismuth(III) nitrate pentahydrate and iron(III) nitrate nonahydrate. Finally, the two most reported N-nitro compounds were selected, namely N-nitrosuccinimide (Succ-NO2) and N-nitrosaccharin (Sacc-NO2).19,48–50 The number of equivalents of nitrating agent was set at two equivalents across the plate.
On the other hand, seven different activating reagents were selected (Fig. 1c). In addition, one line was set to be free from any activating reagent, allowing evaluation of nitrating agents on their own. Persulfates, generally activated thermally, are the most common activating reagents used in nitration reactions. They are readily available radical precursors; here, potassium persulfate was selected. Another common radical precursor is 2,2′-azobis(2-methylpropionitrile) (AIBN). Notably, it was used in catalytic amounts with nitric acid as a nitrating agent to perform nitration under mild conditions.51 Silver species have been used in nitration reactions involving a carboxylic acid species to assist decarboxylation.47,52 Following the same activation pathway, a Lewis acid magnesium salt (magnesium perchlorate hydrate) was also picked. Indeed, Lewis acid species were reported to suit nitration reactions.53 Consequently, two different copper sources were selected: copper(II) trifluoromethanesulfonate54 and copper(I) iodide. N,N′-Dimethylethylenediamine (DMEDA) was also added as a ligand with copper(I) iodide.55 Tris(dibenzylideneacetone)dipalladium(0) together with tBuBrettPhos was screened as a potent catalyst for nitration reactions.56,57 The number of equivalents of activating reagents followed the closest paper of reference.
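The plate design described above (12 nitrating agents crossed with 7 activating reagents plus one activator-free row) can be sketched as a simple enumeration. This is a minimal illustration, not the authors' design software: the reagent abbreviations, the assignment of activators to rows and nitrating agents to columns, and the ordering within each list are assumptions made for the example.

```python
from itertools import product

# Reagents named in the text; abbreviations and ordering are illustrative.
NITRATING_AGENTS = [
    "HNO3", "tBuONO", "NO2BF4", "NaNO3", "NaNO2", "KNO2",
    "Bu4N-NO3", "Bi(NO3)3.5H2O", "AgNO3", "Fe(NO3)3.9H2O",
    "Succ-NO2", "Sacc-NO2",
]
ACTIVATORS = [
    "none", "K2S2O8", "AIBN", "Ag2CO3", "Mg(ClO4)2 hydrate",
    "Cu(OTf)2", "CuI/DMEDA", "Pd2(dba)3/tBuBrettPhos",
]


def build_plate(nitrating_agents, activators):
    """Map well labels (A1..H12) to (activator, nitrating agent) pairs.

    Assumption: activators run along rows A-H, nitrating agents along
    columns 1-12. The actual layout used in the study may differ.
    """
    rows = "ABCDEFGH"
    plate = {}
    for (r, activator), (c, agent) in product(
        enumerate(activators), enumerate(nitrating_agents)
    ):
        plate[f"{rows[r]}{c + 1}"] = (activator, agent)
    return plate


plate = build_plate(NITRATING_AGENTS, ACTIVATORS)
N_SUBSTRATES = 9  # one plate per scaffold
total_reactions = len(plate) * N_SUBSTRATES
print(len(plate), total_reactions)
```

Running the same 96-condition grid on each of the nine scaffolds gives the 864 reactions quoted for the campaign.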
Acetonitrile was chosen as the most commonly used solvent in nitration reactions. An average concentration of 0.1 M was selected at a reaction scale of 10 μmol, and the reaction was run at 100 °C for 24 hours under an air atmosphere. Each plate was prepared and worked up after reaction by addition of an internal standard, and the crude reactions were analyzed by UHPLC-UV-MS (see details in the ESI†). The general HTE workflow applied in this study, including home-made software for design and visualization, was assessed in a previously reported publication and is detailed in the ESI.†58
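As a quick check of the quantities implied by these conditions, the sketch below converts the 10 μmol scale and 0.1 M concentration into a per-well volume, and the two equivalents of nitrating agent into micromoles. The function names are hypothetical and only serve the illustration.

```python
def well_volume_uL(scale_umol, concentration_M):
    """Reaction volume in microlitres: umol / (mol/L) = uL."""
    return scale_umol / concentration_M


def reagent_amount_umol(scale_umol, equivalents):
    """Amount of a reagent in micromoles for a given number of equivalents."""
    return scale_umol * equivalents


volume = well_volume_uL(10, 0.1)          # about 100 uL per well
nitrating = reagent_amount_umol(10, 2)    # 20 umol of nitrating agent
print(volume, nitrating)
```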
Results are presented in Fig. 2 and detailed for the nitration of 1-naphthoic acid in Fig. 2a. Product formation was quantified as the ratio between the area under the peak (AUP) of the nitration product and that of the internal standard (biphenyl). The results can only be compared within a single plate using heat maps (Fig. 2b) due to substrate-dependent UV response, but trends can be observed between plates. A total of nine HTE plates and 864 reactions were performed. A large number of unsuccessful results were obtained: 487 reactions gave no quantifiable product formation, representing 56% of the overall 864 reactions conducted. These unfruitful results are still of major importance. As previously mentioned, predictive algorithms developed with experimental data from the literature have recently shown limited efficiency, partly due to the lack of negative results reported in the literature to train machine learning models.59–61
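The quantification described above, and the share of negative results, can be expressed in a few lines. This is a minimal sketch: the peak areas in `toy_wells` are invented for illustration and are not data from the study, and treating a ratio of exactly zero as "no quantifiable product" is an assumption about how negatives were flagged.

```python
def response_ratio(product_aup, standard_aup):
    """Product area under peak normalised by the internal-standard area."""
    if standard_aup <= 0:
        raise ValueError("internal standard peak is missing")
    return product_aup / standard_aup


def negative_share(ratios):
    """Fraction of wells with no quantifiable product (ratio of zero)."""
    return sum(1 for r in ratios if r == 0) / len(ratios)


# Invented (product AUP, biphenyl AUP) pairs for four hypothetical wells.
toy_wells = [(0.0, 200.0), (50.0, 200.0), (0.0, 180.0), (90.0, 180.0)]
toy_ratios = [response_ratio(p, s) for p, s in toy_wells]
print(toy_ratios, negative_share(toy_ratios))

# Share of negatives reported for the campaign: 487 of 864 reactions.
reported_share = 487 / 864
print(f"{reported_share:.1%}")  # → 56.4%
```

Because the UV response of each nitration product differs, these ratios rank conditions within one plate but are not yields and should not be compared directly across substrates.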
In the context of this project, the negative data will address this drawback and hopefully help produce more accurate predictions of the chances of success for nitration reactions. The large percentage of negative results obtained during this HTE campaign also confirms the challenging nature of nitration reactions. Thanks to this study, clear trends in the activity of nitrating agents or activating reagents can be observed across the plate, depending on the substrate (Fig. 2b and 3). Notably, only potassium persulfate is a versatile activating reagent, usually giving rise to better results than in the absence of an activating reagent, except for benzofuran, where no reactivity enhancement is observed. Nonetheless, less common activities were observed from other agents. For example, AIBN displayed significant activation performance on several scaffolds (picolinic acid, 2-bromopyridine, and 2-bromobenzofuran). All other activating reagents failed to show clear activity. Notably, silver carbonate and magnesium perchlorate hydrate did not allow better performance with carboxylic acid derivatives, thus showing no specific decarboxylation enhancement. Additionally, copper and palladium catalysts did not perform better with bromide derivatives, displaying no specific catalytic activity.
Regarding nitrating agents, as expected, nitric acid is a consistently effective nitro source. tert-Butyl nitrite (tBuONO) also demonstrated good activity across the plates, although diminished compared to nitric acid. Interestingly, both iron and bismuth nitrate salts showed significant activity on several substrates. Finally, N-nitrosaccharin (Sacc-NO2) exhibited high activity with most of the substrates, confirming the strong interest in newly developed N-nitro reagents.
As expected, N-nitrosuccinimide (Succ-NO2) showed no significant activity under thermal activation, since this reagent generally requires light activation.62 Other nitrate and nitrite sources (silver nitrate, sodium nitrite, sodium nitrate, potassium nitrite, and tetrabutylammonium nitrate), as well as nitronium tetrafluoroborate, showed no significant reactivity in the global results. Interestingly, in the absence of an activating reagent, a significant number of nitrating sources do not show diminished activity, thus indicating no requirement for activation. The overall profiles confirmed the high difference in reactivity between arenes and heteroarenes (Fig. 2b and 3b). Pyridine and benzofuran moieties display a disparity in reactivity toward the nitration reaction. This confirms the difficulty of designing versatile nitration conditions across different arenes and heteroarenes.
To confirm the obtained results, batch reactions were performed by selecting high-yielding entries for each compound (yields are displayed below each corresponding heat map in Fig. 2b). The obtained yields corroborate the challenge that comes with nitration reactions. The generally lower yields observed with pyridine derivatives confirm the reactivity trend in the nitration of aromatic rings: arenes > electron-rich heteroarenes ≫ electron-poor heteroarenes. Significantly lower yields are obtained from the ipso-functionalization of carboxylic acids with naphthyl and benzofuran moieties. For benzofuran, the ipso-functionalization of the bromo derivative also gave the best result on the overall plate. Ipso-functionalization of carboxylic acid, when applied to the pyridine moiety, gave the highest ratios. For arenes, electrophilic substitution seems to be the preferred transformation. However, it should be noted that for both naphthalene and benzofuran,63 regioisomers were observed. In the case of naphthalene, several dinitronaphthalene compounds were also formed,64 pointing out the lack of selectivity of this methodology.65 The selected high-yielding entries present a large variety of nitrating agents, highlighting the value of a broad HTE campaign studying nitration conditions depending on the reacting scaffolds. However, activating reagents are less diverse, with potassium persulfate being overrepresented. Notably, the high-yielding entries selected for compound isolation were mostly original reaction conditions (marked with a “c” in Fig. 2). Remarkably, the best-yielding entry for the benzofuran moiety—obtained from the reaction of 2-bromobenzofuran with bismuth nitrate pentahydrate in the presence of AIBN—was, to the best of our knowledge, never reported. Additionally, the reactions selected for isolation were all different, emphasizing the relevance of the conducted study. 
This large HTE campaign thus allowed for the identification of unusual combinations of nitrating and activating reagents for the nitration of arenes and heteroarenes.
Next, with MorganFP-2-1024 selected as the substrate featurization method, we evaluated the ability of the classification models to predict reaction success on an unseen substrate, mimicking a real-world scenario in which the outcome on a new substrate must be estimated before running the experiment. A leave-one-out strategy at the substrate level was used for this purpose: every experiment related to one specific substrate was removed from the dataset, and classification models were trained on the remaining experiments. The left-out experiments were then used as the test set, and classification accuracies were calculated for each model. This operation was repeated for each substrate.
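The leave-one-out procedure described above, applied at the substrate level, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the records are invented toy data, and a trivial majority-vote baseline over (nitrating agent, activator) pairs stands in for the fingerprint-based classifiers such as Gradient Boosting used in the study.

```python
from collections import Counter, defaultdict


def leave_one_substrate_out(records):
    """Yield (held_out, train, test), holding out all runs of one substrate."""
    for held_out in sorted({r["substrate"] for r in records}):
        train = [r for r in records if r["substrate"] != held_out]
        test = [r for r in records if r["substrate"] == held_out]
        yield held_out, train, test


def fit_reagent_baseline(train):
    """Toy classifier: majority outcome per (agent, activator) pair."""
    votes = defaultdict(Counter)
    for r in train:
        votes[(r["agent"], r["activator"])][r["success"]] += 1
    overall = Counter(r["success"] for r in train).most_common(1)[0][0]

    def predict(r):
        key = (r["agent"], r["activator"])
        # Fall back to the overall majority for unseen reagent pairs.
        return votes[key].most_common(1)[0][0] if key in votes else overall

    return predict


def accuracy(predict, test):
    """Fraction of test records whose outcome is predicted correctly."""
    return sum(predict(r) == r["success"] for r in test) / len(test)


# Invented outcomes, for illustration only (not results from the study).
toy = [
    {"substrate": s, "agent": a, "activator": act, "success": ok}
    for s, a, act, ok in [
        ("naphthalene", "HNO3", "K2S2O8", True),
        ("naphthalene", "NaNO2", "none", False),
        ("1-bromonaphthalene", "HNO3", "K2S2O8", True),
        ("1-bromonaphthalene", "NaNO2", "none", False),
        ("benzofuran", "HNO3", "K2S2O8", True),
        ("benzofuran", "NaNO2", "none", False),
        ("pyridine", "HNO3", "K2S2O8", False),
        ("pyridine", "NaNO2", "none", False),
    ]
]

results = {
    held_out: accuracy(fit_reagent_baseline(train), test)
    for held_out, train, test in leave_one_substrate_out(toy)
}
print(results)
```

In this toy set the held-out substrate whose behaviour differs from the others is the one predicted worst, which is the kind of substrate-dependent accuracy the procedure is meant to expose.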
Unfortunately, the accuracy of the predictive model varies significantly depending on the unseen substrate evaluated (Fig. 5a). As an example, when Gradient Boosting is used, the accuracy varies widely depending on the left-out substrate (Fig. 5b). In addition, accuracies tend to diminish compared to the result obtained with the entire data set. Only bromonaphthalene, when not present in the training set and evaluated by the trained model, gives better accuracy than the one obtained with the entire data set. Benzofuran moieties give especially reduced accuracies. This result could reflect the difficulty of predicting heteroarene reactivity. Based on this hypothesis, we decided to select only one class of substrates to train the model, in the hope of obtaining more accurate predictions. Electron-poor pyridine moieties displayed a significantly different reactivity from the other two studied scaffolds. They were thus selected and used to train an “electron-poor” model (Fig. 6). However, when compared to the results obtained with the entire data set, no significant improvement is observed.
Fig. 6 Graph of the results expressed as the calculated accuracies depending on the model and the data set examined.
Overall, the capacity of the model to accurately predict product formation on an unseen molecule is limited. We hypothesize that this limitation comes from the limited set of evaluated molecules62,72 and incomplete feature-engineering of the reactive system. Indeed, nitration and activation agents were only categorically encoded in this study, providing no chemically-relevant information about these reagents, whereas some of their physico-chemical properties are likely to be important for reactivity prediction. The same goes for the substrates, although to a lesser extent, for which no descriptors stemming from electronic structure calculations were used. These machine-learning considerations and insufficient chemical diversity of substrates could both contribute to restraining the model from accurately classifying the reactivity of substrates in nitration reaction, thus explaining the diminished accuracy of the models on unseen molecules.
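The limitation of purely categorical encoding mentioned above can be made concrete with a small sketch. Assuming a one-hot scheme (the exact categorical encoding used in the study is not specified here), every pair of distinct reagents ends up at the same distance, so the encoding cannot express that two nitrate salts resemble each other more than a nitrate salt and an N-nitro reagent. The helper names and the four-reagent vocabulary are illustrative.

```python
def one_hot(label, vocabulary):
    """Return a 0/1 vector with a single 1 at the position of `label`."""
    return [1 if label == v else 0 for v in vocabulary]


def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


agents = ["HNO3", "NaNO3", "NaNO2", "Sacc-NO2"]
codes = {a: one_hot(a, agents) for a in agents}

# Pairwise distances between all distinct reagents.
distances = {
    (a, b): hamming(codes[a], codes[b])
    for i, a in enumerate(agents)
    for b in agents[i + 1:]
}
print(distances)
```

All six pairwise distances are identical, which is why descriptors carrying physico-chemical information would be needed for a model to transfer knowledge from one reagent to a related one.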
To further explore how negative results contribute to the performance of machine learning models, the experimental dataset produced in this study was split into different training sets with varying proportions of successful and unsuccessful reactions. These proportions varied between 10% and 95% of successful experiments, and the accuracy of the best model is reported accordingly in Fig. 7. The same results in terms of balanced accuracy are reported in the ESI.†
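The two ingredients of this experiment, a training set drawn with a prescribed share of successful reactions and the distinction between accuracy and balanced accuracy, can be sketched as below. This is an assumption-laden illustration: `make_training_set` and its sampling scheme are hypothetical, and the labels are synthetic rather than taken from the study.

```python
import random


def make_training_set(records, size, success_fraction, rng):
    """Sample `size` records with the requested fraction of successes."""
    positives = [r for r in records if r["success"]]
    negatives = [r for r in records if not r["success"]]
    n_pos = round(size * success_fraction)
    return rng.sample(positives, n_pos) + rng.sample(negatives, size - n_pos)


def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, insensitive to class proportions."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Synthetic pool: 20 successful and 20 unsuccessful reactions.
pool = [{"id": i, "success": i < 20} for i in range(40)]
train = make_training_set(pool, size=10, success_fraction=0.3,
                          rng=random.Random(0))

# A model that always predicts "failure" on a test set with 10% successes.
y_true = [False] * 9 + [True]
y_pred = [False] * 10
print(accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

The always-negative predictor scores 0.9 in accuracy but only 0.5 in balanced accuracy, which illustrates why both metrics are worth reporting when the class proportions of the test set vary.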
For highly unbalanced training sets, the best models are clearly obtained when the test set is split in the same way. Otherwise, the classification accuracy decreases dramatically. On the other hand, balanced training sets, containing around 40 to 60% of successful reactions, always give reasonable accuracy, which is much less dependent on the test set split. Because the test set distribution is typically unknown in real-world scenarios, this provides further evidence that reactions typically considered unworthy of publication are in fact valuable for developing robust data-driven models. A complementary study was performed using only two out of the three types of descriptors to train the algorithm (see Fig. 27 in the ESI†). The balanced accuracies obtained for each split revealed that the nitration agent is the most important descriptor to take into account to correctly predict the outcome of a reaction. On the other hand, ignoring the activation agent or the substrate itself does not significantly affect the model's performance. This might originate from strong similarities between the left-out substrate's reactivity and the training set's reactivity towards the same pairs of activation/nitration agents. Further exploration of these questions will be the subject of a future study.
The high diversity of nitrating agents among the best-yielding results demonstrated the value of performing such a broad HTE campaign. It also highlighted original reaction conditions. Across the 9 HTE plates, 5 high-yielding reaction conditions were previously unreported combinations of nitrating and activating reagents. Finally, the high-quality and high-quantity data were used to develop a predictive tool relying on machine learning to evaluate the best nitration conditions depending on the targeted substrate. Although the model gave satisfactory metrics when trained on the overall dataset, it revealed limited generalization capability on unseen substrates. A higher chemical diversity of targeted substrates and a more thorough featurization of the whole reactive system could allow for improved accuracy. Together, HTE and machine learning allowed for an extensive exploration of the nitration reaction, paving the way for a new methodology to address this challenging transformation.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5dd00086f
This journal is © The Royal Society of Chemistry 2025