Open Access Article
Maxime Ferrer
*,
Bowen Deng,
Javier E. Alfonso-Ramos
and
Thijs Stuyver
*
Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, 75005 Paris, France. E-mail: maxime.ferrer@chimieparistech.psl.eu; thijs.stuyver@chimieparistech.psl.eu
First published on 20th October 2025
Plastics are essential in modern society, but their susceptibility to damage limits their lifespan and performance, and results in unsustainable waste production. Self-healing polymers based on thermally reversible Diels–Alder (DA) reactions offer a potential solution by enabling heating controlled repair through bond-breaking and reformation. However, discovering new suitable DA monomer combinations has largely relied on intuition and trial-and-error so far. Here, we present a hierarchical workflow that integrates machine learning (ML) with automated reaction profile calculations to efficiently screen DA reactions for self-healing polymer applications. Using our in-house TS-tools software, we generate high-throughput profiles at the semi-empirical xTB level. Refining only a small fraction with DFT, we are able to train a robust ML model that predicts reaction characteristics with excellent accuracy. Adding a graph-based ML model to the workflow for pre-screening enables expansion to reaction spaces of hundreds of thousands of reactions, at a marginal cost. We first leverage our models to screen a comprehensive reaction space of synthetic diene–dienophile pairs, and subsequently use them to mine a database of commercially available natural products. Overall, this hybrid ML-computational chemistry approach enables data-efficient discovery of thermally responsive DA reactions, advancing the rational design of self-healing polymers with tunable properties.
Self-healing materials can be classified into two main categories: autonomous and stimuli-responsive systems.7 As the name suggests, autonomous systems comprise materials that release a healing agent automatically upon detection of mechanical damage, i.e., no external intervention is required.11,12 In contrast, stimuli-responsive materials require an external factor to initiate the healing process.13,14 These systems typically rely on bond-breaking and bond-(re)forming mechanisms, which require well-characterized, reversible reactions that can be efficiently controlled through external stimuli, such as temperature variations. Given its practicality and broad applicability, thermal activation is particularly advantageous, making heat-induced reversible reactions a highly attractive approach for self-healing plastics.7,15
Because of their inherently tunable thermodynamic and kinetic properties, Diels–Alder (DA) reactions in particular have been extensively investigated as a potential candidate for the development of thermally responsive self-healing polymers.16,17 The DA reaction is classified as a [4 + 2] cycloaddition reaction.18,19 It describes the concerted reaction between two unsaturated molecules, referred to as diene and dienophile (Fig. 1).
![]() | ||
| Fig. 1 Schematic representation of a prototypical self-healing polymer system based on a DA moiety.20 The wavy bonds indicate the attachment points to the polymer backbone; TS stands for transition state. | ||
What makes the DA reaction particularly appealing for self-healing material applications is that the carbon–carbon bonds formed during the reaction are generally weaker than the other bonds within the system.21–23 These weaker and more flexible bonds are generally the first to be affected by mechanical deformation, i.e., when stress is applied. However, for reversible DA reactions, these bonds can be re-formed, enabling the restoration of the polymer network.
Up to this point, the development of self-healing polymers has remained, by and large, a serendipitous process, whereby new suitable DA monomer combinations are discovered primarily through intuition, in combination with a trial-and-error approach. To enable rational design, relationships between monomer chemical structure and macroscopic properties of the plastic need to be established. While the reaction characteristics of the DA monomers do not exclusively determine the properties of the resulting self-healing polymer (e.g., the backbone, as well as the stoichiometric ratios, and their respective roles in determining the glass transition temperature, also play a major role),21 they clearly are essential design parameters that can be used to filter out unfavorable combinations, and identify promising ones for further investigation.
From a reactivity standpoint, a suitable DA reaction for thermally responsive self-healing applications must be (slightly) exothermic, or more precisely, must yield a product that is thermally unstable. At room temperature, the equilibrium should lie toward the product side, yet heating the system must allow the reaction to reverse—regenerating the precursors via the retro-DA reaction. This implies that the entropic contribution to the free energy must be sufficient to counteract the enthalpic drive toward product formation at elevated temperatures. Beyond thermodynamics, kinetics also play a crucial role: the forward and reverse activation barriers determine the rate and temperature at which free monomers can be (re)generated at the site of damage, as well as the conditions under which repolymerization—and thus self-healing—can occur.
It should be noted that the ability to fine-tune the healing temperature through reaction barrier modulation is highly desirable, as ideal operating temperatures are strongly dependent on the use case of the specific self-healing plastic. For daily-life applications, e.g., soft robotics, low-temperature healing may be preferable from a safety perspective,24 whereas plastics that may get exposed to more extreme conditions, e.g., in aerospace applications, will ideally be more robust, i.e., they should be able to retain their full integrity up to a higher temperature.25
In principle, the reactive properties of a given diene–dienophile pair can be simulated with the help of quantum chemical methods, such as Density Functional Theory (DFT).21,26,27 However, locating transition states (TS) on potential energy surfaces (PES) is a non-trivial task, and achieving accurate results that align with experimental observations necessitates the use of high-level computational methods, which incur a significant computational cost. As such, traditional DFT studies on DA reactivity have focused primarily on a limited number of model reactions, from which qualitative trends are extracted.21 Systematic exploration of DA reaction libraries, especially in the context of self-healing polymers, has consequently largely remained out of reach.
The advent of machine learning (ML) has made the screening of large chemical reaction spaces feasible in principle, and this has recently enabled the development of protocols for high-throughput reaction discovery.28 In general terms, two main families of ML models for reactivity prediction can be distinguished: graph-based and geometry-based ones. The former are extremely fast, as they can generate predictions based on a simple reaction SMILES input, focusing only on connectivity, without the need for any conformational search.29–31 This means that predictions can be performed in milliseconds. While graph-based models can typically reach accuracies up to a couple of kcal mol−1, they can, in principle, be outperformed by models that operate on geometries.32–34 Given enough training data, the latter can reach accuracies close to, or even below, chemical accuracy (±1 kcal mol−1) with respect to the reference method, though at a significant computational cost, since accurate geometries need to be determined as input first.35
Effectively applying ML to explore chemical reaction spaces relies heavily on the availability of high-quality training data. Training data is typically generated with the help of DFT as well, and hence, the same issues as mentioned above are still encountered. Most reactivity screening studies in the literature so far indicate that several thousand reaction profiles are usually needed to build an accurate model, which corresponds to several million CPU hours of compute time.36–46
Here, we propose a synergistic reaction screening approach, combining hierarchical ML-based reaction filtering with robust, and ultra-fast, reaction profile computation, facilitated by our in-house developed software tool for TS and reaction profile computation, TS-tools47 (see Fig. 2). Focusing on a synthetically realistic reaction space of DA reactions, we demonstrate how the TS-tools package can be used to compute tens of thousands of reaction profiles at semi-empirical GFN2-xTB level of theory (hereafter referred to as xTB), on the fly, at a rate of over 100 reactions per hour on a single 72 CPU core workstation. Refining a small fraction of the so-generated reaction profiles at – a thoroughly benchmarked – DFT level subsequently enables the training of an (in-house designed) deep learning model, operating on the TS-tools generated xTB profiles (Fig. 4). The resulting model – DeepReaction – can reproduce the DFT quality reaction characteristics at near chemical accuracy.
![]() | ||
| Fig. 2 A schematic representation of the proposed hierarchical workflow for data-efficient DA reaction screening for self-healing polymer applications. | ||
Finally, we demonstrate how a rapid (yet less accurate and robust) secondary graph-based ML model can subsequently be used to expand the reaction search space beyond its original limits at marginal additional cost, by identifying promising patches of reaction space through reaction SMILES analysis, which can subsequently be refined and validated with more expensive methods.
By combining different model architectures, leveraging their respective strengths, into a hierarchical model, we obtain an extremely data-efficient QM-augmented ML workflow28 for DA reaction discovery, that can potentially be expanded towards the discovery of other types of reactions as well.
To underscore the realism of the diene and dienophile libraries generated in this manner, synthetic accessibility (SA) scores were computed for all compounds, and as can be observed from (Fig. 3C), most fall in the range 1–4, and none exhibit scores above 6, indicating that they should be readily synthesizable.
By systematically combining components, we generated 7208 unique diene–dienophile pairs. Considering all potential regio- and stereoisomers, this led to a total of 28
560 unique reaction SMILES.37,50
As indicated in the Introduction, at xTB level-of-theory, TS-tools can process reactions at an almost negligible computational cost: 23
327 reaction profiles were successfully computed (success rate of more than 81%) on a single 72 CPU core workstation in 9 days, totaling less than 15
000 CPU hours in overall compute time, i.e., less than 1 h per reaction profile. Benchmarking the xTB results against both high-level computational, as well as experimental, datasets (consisting of 19 and 100 activation energy values, respectively) reveals – as expected for semi-empirical calculations – significant quantitative disagreement for the barrier heights. More specifically, the MAE for the experimental dataset amounts to 9.11 kcal mol−1 and 13.83 kcal mol−1 for the computational benchmarking dataset. Nevertheless, it is important to note that the main reactivity trends are nicely recovered (R2 of 0.80 and 0.89 for the respective datasets), implying that the xTB labels nonetheless have significant predictive value.
Next to the xTB dataset covering all the reactions present in the full reaction space, we also computed a DFT refinement on a subset of 1580 reactions, which were sampled based on a diversity criterion (see Section S4 and Fig. S6 of the SI for additional details).28,52 We used the M06-2X functional53 (in combination with the def2-TZVP basis set54) in our calculations,55,56 both because of its robust performance on the experimental and computational benchmarking datasets (R2 of 0.90 and 0.97 respectively – though with an apparent tendency to slightly overestimate the barriers systematically cf. Sections S2.1 and S2.2), and because M06-2X performs exceptionally well in most kinetics benchmarking studies, particularly those on cycloaddition reactions.21,57 Overall, all DFT refinements took close to 200
000 CPU hours in total, implying that refining xTB reaction profiles across the entire reaction space would have cost on the order of 3
000
000 CPU hours.
A limited hyperparameter search was performed for this model; see SI Section S6 for details, as well as an overview of the final hyperparameters selected. A 10-fold cross-validation was used to estimate the performance of the model, with an 80/10/10 split for the training, validation, and test sets, in each fold. To avoid data leakage, all reactions involving a given diene–dienophile pair were kept grouped in a single subset in every fold.
In the first instance, the model was trained on the small, DFT-level dataset. Additionally, we also trained a second version of the model on the larger 23
327 reaction dataset, where the (close to) DFT-quality labels, outputted by the geometry-based model, were used as the target. Details regarding hyperparameter search and selection can be found in Section S7 of the SI. The final performance of the model was again evaluated through 10-fold cross-validation, and similar precautions to avoid data leakage as for the geometry-based model were taken.
327 xTB level reaction profiles. Corresponding plots for the 1580 DFT-refined reactions can be found in Fig. S5 in the SI; similar distributions are obtained at both levels of theory, albeit with a shifted mean. In panel F, the correlation between the activation and reaction free energy is presented. With an R2 value of 0.59, it is clear that the Bell–Evans–Polanyi principle60,61 is only partially recovered across the considered reaction space (for the DFT refined reactions, the R2 value amounts to 0.58). The lack of a strong correlation suggests that kinetics and thermodynamics can be tuned fairly independently for DA reactions. As such, it should be possible to identify reactions that are thermoreversible and initiate self-healing over a broad temperature range within the designed chemical reaction space.
Another interesting preliminary observation is that computed reaction entropies,
, i.e., the entropy differences (under standard conditions) between reactants and product, are distributed in a very narrow range at both xTB and DFT level of theory. This implies that fairly little control can be exerted over the entropy through molecular modulation across the defined reaction space. This is significant, since reaction entropy is one of the main determinants of whether a reaction will be thermoreversible or not, as it indicates the susceptibility of ΔrG towards temperature changes.
Of course, the reaction entropy in an actual polymer environment will not be exactly the same as in the gas-phase environment assumed in our DFT simulations, and neither will this quantity stay perfectly constant across typical temperature ranges associated with self-healing plastics. Nonetheless, one can use the average
to define a very crude rule-of-thumb to distinguish DA reactions that stand a reasonable chance of being thermoreversible from those that are highly unlikely to exhibit this behavior. A detailed discussion is provided in Section S5 of the SI. Our back-of-the-envelope analysis suggests that, focusing on temperature changes below 200 °C, reactions that are exothermic by more than 15–20 kcal mol−1 ought to stand no realistic chance of being thermoreversible.
In the remainder of our analysis, this rule-of-thumb will be used as a guiding principle to gauge what fraction of the DA reactions screened are of potential interest for self-healing polymer applications.
The underlying reason for the significantly superior accuracy of our model is the fact that we also provide the crude TS geometry, as well as the xTB level targets, as model inputs (see Section S6 for a concise ablation study). Generating these inputs doesn't come for free, of course, but since our in-house developed TS-tools code enables computation of full xTB-level reaction profiles at a cost of less than 1 CPU hour per reaction, we considered this a relatively small price to pay for the markedly enhanced accuracy. Especially when keeping in mind that computing an additional DFT quality training point – to expand the dataset and hence improve model accuracy of EquiReact/ChemProp – is two to three orders of magnitude more expensive on average (vide supra).
With our accurate geometry-based model, we predicted DFT quality ΔG‡ and ΔrG across our designated reaction space. In Fig. 5A, summarizing plots of the resulting data distribution are provided.
Interestingly, we observe that reaction asynchronicity is a major driver of the observed deviations from the Bell–Evans–Polanyi principle, in line with previous, qualitative, observations.63 Indeed, focusing exclusively on the most synchronous reactions, i.e., those for which the bond distance ratio for the formed bonds in the TS is confined between 0.95 and 1.05, the correlation between ΔG‡ and ΔrG improves markedly, achieving an R2 of 0.70 (Fig. S12 in the SI).
Additionally, we took a closer look at the predictive power of conceptual DFT reactivity descriptors (M06-2X/def2-TZVP).64–66 In Fig. 5B, a colormap of ΔG‡ as a function of the dienophile electrophilicity (
; μ = (εLUMO + εHOMO)/2; η = (εLUMO − εHOMO)/2) and diene nucleophilicity (N = εHOMO − εHOMO,ref; the reference being tetracyanoethylene21,67) is plotted. From this plot, one can observe, in line with previous observations derived from a handful of model reactions,21 that the electrophilicity of the dienophile appears to be correlated with the height of the reaction barrier. In contrast, no clear correlation is observed between the nucleophilicity of the diene and the activation barrier.
Interestingly, for ΔrG, diametrically opposite conclusions can be drawn (cf. Fig. S13 in the SI for the corresponding plot). For this quantity, an (inverse) correlation with the nucleophilicity of the diene is observed, and no clear relationship with the dienophile electrophilicity emerges. In line with the conclusions from the Bell–Evans–Polanyi plot, we find that the reactions with the lowest barrier, i.e., those in the upper right corner of Fig. 5B, are often times slightly endothermic, and many promising reactions that have optimal reactive properties for self-healing polymer applications correspond to diene–dienophile combinations that do not exhibit maximized nucleo- and electrophilicity values, respectively. This underscores the merit of our ML based approach to identify suitable reactant combinations.
Finally, we also aimed to validate the rule of thumb that the formation of endo-stereomers tends to be kinetically preferred (due to favorable through-space orbital interactions between the approaching reactants), whereas the formation of exo-stereomers tends to be thermodynamically preferred (due to more favorable steric interactions in the product). In Fig. 5C, a confusion matrix is shown in which every diene–dienophile pair is classified based on these two criteria, i.e., whether the reaction pathway with the lowest activation energy is endo
, and whether the most exothermic pathway is exo (ΔrGexo < ΔrGendo). It can be observed that this heuristic rule has only moderate predictive power: both criteria are fulfilled a mere 42% of the time; 6% of diene–dienophile combinations violate both criteria. Note that the confusion matrix obtained for the entire search space, based on the predicted values, matches almost perfectly the matrix obtained exclusively for the subset explicitly computed at the DFT level (Fig. S14).
480 reactions in total, exhibits a ΔrG between 0 and −20 kcal mol−1. The corresponding distribution of ΔG‡ is shown in Fig. 6A. Clearly, the range of activation free energies is broad, with the bulk of them falling between 30 and 40 kcal mol−1. To zoom in on the discovery of diene–dienophile pairs that could result in polymers that self-heal at relatively low temperatures, we imposed an initial cut-off at 30 kcal mol−1. This corresponds to the computed ΔG‡ of the furane–maleimide pair, the diene–dienophile pair for which self-healing properties were first observed, and for which the retro-DA reaction was triggered at around 150 °C.20,21 2892 reactions are below this threshold, corresponding to 1160 unique diene–dienophile pairs, representing 16% of the total search space.
Refining 100 randomly sampled examples from the set of retained reactions at DFT level confirms the excellent performance of our model, even when applied to reactions towards the edge of the training data distribution: for ΔG‡, an MAE of 1.36 kcal mol−1 and an RMSE of 2.04 kcal mol−1 is obtained between computed and predicted values; for ΔrG, an MAE of 2.06 kcal mol−1 and an RMSE of 2.80 kcal mol−1 is achieved.
Reducing the activation energy cut-off threshold rapidly reduces the number of retained reactions: at 25 kcal mol−1, the number of reactions is reduced to 1357 (corresponding to 551 pairs); at 20 kcal mol−1, only 476 reactions, i.e., 209 pairs, are retained. This indicates that only a tiny fraction of the reaction search space involves diene–dienophile pairs that could give rise to polymers able to self-heal at relatively low temperatures.
To increase the odds of synthetic feasibility further, we subsequently applied an additional round of filtering to the reactions that fell within the suitable activation and reaction energy windows. Specifically, we removed dienes or dienophiles that could plausibly undergo ketone–enol tautomerism, that contained potentially hydrolysable motifs, or that exhibited competitive intramolecular DA reactions (see Fig. 6B; cf. Section S.5.3 for more information). We emphasize that these filters were applied in a deliberately conservative manner: in practice, the extent of tautomerization or hydrolytic instability is to some extent context- and substitution-dependent, and not all of the excluded motifs would necessarily be problematic under experimental conditions. However, by discarding the gray-zone cases, we ensured that the remaining subset represents a stringent lower bound of reactions that are unlikely to pose issues in a synthetic setting. After this filtering step, 1288 reactions survive; a sample of these promising candidate reactions is shown in Fig. 6C.
An alternative version of this model was developed as well, where the more expansive 23
327 reaction dataset, with (close to) DFT quality labels predicted by the DeepReaction model, was used for training. On this extended dataset, the model achieves an (interpolation) accuracy of 1.05 and 1.40 kcal mol−1, respectively. This result suggests that the graph-based ChemProp model can, in principle, reach accuracies similar to the ones of our geometry-based model, despite not having access to the geometric details of the reaction pathways. The main difference between the model architectures, however, lies in the data efficiency: ChemProp needs around an order of magnitude more – computationally expensive – reference data to reach a similar accuracy as DeepReaction (see Fig. S15 in the SI for the respective learning curves).
It is important to underscore that, in practice, the second version of the ChemProp model developed is not necessarily going to be more accurate than the first one, since the labels for the former inherently contain noise due to the prediction error of DeepReaction. At the same time, one might expect the latter to exhibit a better generalization potential, as it is trained on a broader reaction space. To test whether dataset expansion based on predicted labels can indeed be a viable tactic to increase model robustness, we will compare the performance of both ChemProp versions in the final sections below.
889 reactions were generated, by exhaustively combining 87 new dienes with the previously considered dienophiles (see Fig. 7A for an overview of the new dienes probed).
Since the original synthetic reaction space, enumerated and analyzed before, contained fairly few diene–dienophile pairs that could potentially give rise to polymers that self-heal at mild temperatures, we focus in the first instance on discovering more of those. As such, we applied (the first version of) our ChemProp model, and selected all the reactions in the newly generated space predicted to be exothermic by up to 20 kcal mol−1, and to exhibit barriers below 25 kcal mol−1. For the resulting 197 reactions, a geometry-based validation, followed by DFT confirmation, was performed. Comparing the computed labels with the ones predicted by the different models, immediately reveals that the accuracies of all models deteriorate notably – which is not entirely surprising, as we are now unequivocally extrapolating well beyond the original training domain (cf. Section S12 of the SI). For DeepReaction, the deterioration in performance is limited in absolute terms (MAE and RMSE for ΔG‡ amount to 1.9 and 2.5 kcal mol−1, respectively; for ΔrG, 2.8 and 3.6 kcal mol−1 are obtained). The ChemProp models, on the other hand, become markedly less reliable in extrapolation mode (MAEs for both targets fall in a range between 4 and 5.5 kcal mol−1).
It is nonetheless noteworthy that the second version of the ChemProp model, i.e., the one trained on all 23
327 labelled data points, fares significantly better than the first version. For ΔrG, the improvement in MAE amounts to 1 kcal mol−1, for ΔG‡ 0.5 kcal mol−1. The superior performance of the second version is however most accentuated in terms of the correlation coefficient between predicted and computed ΔG‡: whereas the first model fails to capture any fine-grained trend and simply clusters the barriers for all selected reactions around 20 kcal mol−1 (R2 = 0.03), the second ChemProp model does manage to differentiate between relatively high- and low-barriers within the selection window reasonably well (R2 = 0.48).
These observations nicely confirm our expectation regarding the relative generalization potential of these models, and demonstrate – at least for this first test case – that the gains from expanding the training set size trump the uncertainty induced by using (noisy) DeepReaction predicted labels.
In total, 107 reactions were validated with DFT to adhere to the criteria indicated above. Some representative examples are shown in Fig. 7B. Clearly, we can expand the pool of plausible thermoreversible DA reactions with low barriers with our hierarchical approach, and this at a marginal computational cost.
615 commercially available natural products, for suitable DA reactions for self-healing polymer applications. A full overview of the filtering steps taken to obtain a list of plausible dienes and dienophiles can be found in Section S12 of the SI. In total, 84 potential dienes and 423 potential dienophiles are identified, which leads to 308
261 DA reactions in total. We emphasize that, in the present context, ‘natural products' are primarily used to probe chemical diversity, not to imply direct feasibility as biomass-derived feedstocks; economic and production-scale considerations are a separate matter that would require further scrutiny.
As a first step, we applied the same strategy as above, i.e., the first ChemProp model was used to pre-select promising reactions, after which DeepReaction validations, and DFT confirmations, were performed (see Section S12 in the SI for a detailed discussion). Similar trends are obtained as before, i.e., DeepReaction retains its robustness the best, still reproducing the DFT labels with reasonable accuracy, and the second ChemProp model significantly outperforms the first one. Particularly noteworthy here is the extremely poor extrapolation performance of the first ChemProp version: for ΔG‡, its MAE amounts to a whopping 11 kcal mol−1 (almost 4 kcal mol−1 worse than the second version); for ΔrG, the MAE amounts to a slightly less dramatic 5.3 kcal mol−1 (which is still almost 1 kcal mol−1 worse than the second model).
Based on these results, we decided to perform the final pre-screening of the ZINC NP database with the second ChemProp model. Across the enumerated reaction space, 4245 reactions are identified as potentially suitable. Grouping these reactions by diene–dienophile pair, and selecting for each pair the reaction with the lowest predicted activation energy, 1277 reactions were selected for xTB reaction profile computation, followed by DeepReaction refinement. Based on these final predictions, the 32 most promising reactions were selected for a final DFT validation. Gratifyingly, all these reactions are computed to adhere to the first criterion listed above, i.e., ΔrG between −20 and 0 kcal mol−1; the second criteria, i.e. ΔG‡ < 30 kcal mol−1, is respected by 28 of them, with the highest computed barrier amounting to 32.95 kcal mol−1 (see Fig. 7C for some representative examples of validated reactions).
It is noteworthy that even for the reactions, emerging as the most promising from our analysis, the computed barriers are all clustered within the 25–30 kcal mol−1 window. This suggests that the potential to discover purely bio-based reactions for ultra-low temperature self-healing plastics may be limited, and that synthetic reaction spaces like the ones outlined above provide more opportunities to identify DA reactions with optimal reactive properties.
Nonetheless, the analysis above clearly demonstrates that, in general, our hierarchical workflow also enables identification of promising bio-based candidate reactions for self-healing polymer applications, even though we did not set up our models explicitly for this purpose. Due to the limited generalization ability of ChemProp, plenty of “false positive” reactions tend to survive the initial filtering step this time around, but the secondary filtering with our geometry-based model readily identifies those, so that computational resources can be focused on DFT validation of the reactions that are the most likely to have desirable properties.
Supplementary information is available. See DOI: https://doi.org/10.1039/d5dd00340g.
| This journal is © The Royal Society of Chemistry 2025 |