Alan Kai Hassen*(a,d), Helen Lai(b), Samuel Genheden(c), Mike Preuss(d) and Djork-Arné Clevert(a)
(a) Machine Learning Research, Pfizer Research and Development, Berlin, Germany. E-mail: AlanKai.Hassen@pfizer.com
(b) Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
(c) Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
(d) Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
First published on 25th February 2026
Computer-aided synthesis planning aims to identify viable synthetic routes from a target compound to readily available building blocks by iteratively decomposing molecules into smaller precursors. Self-play search algorithms, trained with simulated experience, reach state-of-the-art performance. However, these methods typically plan in the molecular rather than the reaction space, leading to redundant or near-duplicate reaction outcomes in the search tree. In this work, we introduce a reaction-centric planning approach that measures the novelty of proposed reactions to analyze the synthesis planning search problem, constraining the search problem to genuinely unexplored disconnection ideas, i.e., unique ways of decomposing molecules using reactions. Our results show that the overall synthesis planning search space is much smaller than expected due to the absence of diverse disconnection ideas within the underlying template-based retrosynthesis model. Surprisingly, we also find that, under a reasonable time budget of less than an hour, online search algorithms outperform state-of-the-art self-play methods and are more robust to environmental changes, such as minor modifications to the available purchasable building blocks. Finally, we show that the diversity of the synthesis route solution space saturates when combining the results of different search algorithms, highlighting the importance of the single-step model in providing novel and chemically valid disconnections.
Modern CASP systems6–8 consist of two core components:3 first, a single-step model that encapsulates the backward reaction logic of how to disconnect a molecule into potential reactants with a neural network in a supervised learning task (e.g. ref. 9–11). Second, a search algorithm whose objective is to find a synthesis route to building blocks within the possible search space provided by the single-step model. Inevitably, these search algorithms must balance exploration and exploitation of the potential state space, as the space is too big for an exhaustive search, given that real-world synthesis routes can involve multiple steps and that many possible disconnection alternatives can be added to the search tree for each molecule. There are currently three major research branches that explore the challenge of estimating the expected reward of finding a synthesis route in a search tree for a molecule or for a reaction:
The pioneering approaches in the field rely on online value estimation in the search tree, mirroring the initialization phase of earlier AlphaGo approaches that were pre-trained with human expert games,12 for example with Monte-Carlo Tree Search (MCTS) using policy guidance,1 Depth-First Proof-Number Search,13 holistic hyper-graph exploration strategy,14 or A*-type best-first search of Retro*-0.15 However, these approaches generally underperform in comparison to methods that use offline supervised learning heuristics, which learn to estimate the expected rewards from historical synthesis data, or self-play methods, which approximate the expected rewards through simulated experiences (originating from AlphaGo variants12,16). Prominent examples that rely on historical synthesis routes to learn a guiding policy include Retro*15 or RetroGraph.17 Self-play, achieving state-of-the-art performances in the field, is applied by using simulated experience to train a value network,18 experience-guided MCTS to learn the success of using reactions19 or experience-guided Retro* to learn the success of decomposing a molecule.20 Moreover, self-play methods are used to apply an actor-critic approach,21 re-rank single-step model predictions22 or adjust the underlying reaction templates of the single-step model.23
This divide between well-performing self-play methods and less-effective online value estimation methods is a serious problem for the field of CASP because, unlike Chess or Go, CASP is not a stationary environment, making learned reward estimations less transferable and often requiring constant retraining:
• The underlying building blocks that constitute the winning states of the game can change dramatically depending on the application. For example, modern CASP tasks may rely on a specific small building block set,4 search for specific intermediate molecules24,25 or optimize multiple route characteristics simultaneously.26
• The reaction suggestions of the single-step model, which are used as the basis of the search tree, can change drastically depending on the model architecture27 or based on the reaction data that is used.28,29
• The historical reactions or simulated experiences used to train the tree policy models are generated by “playing” the game of retrosynthesis against a set of predefined molecules taken from a specific chemical space. However, as real-world reaction and molecule modalities are continually changing,30 this learned chemical space may not cover the target molecule for which synthesis planning is conducted.
To address these problems, we use reaction diversity as a guiding principle in online synthesis planning algorithms (see Fig. 1), prioritizing the exploration of novel reaction disconnection ideas. This is inspired by prior approaches that use reaction diversity in a greedy, best-first tree search.8,31 These approaches prune similar reactions to reduce the search space and guide the tree search by minimizing the distance to a provided external reaction database of literature synthesis routes,31 or expand the diversity of disconnection ideas before filtering for validity.8 Our work, beyond improving diversity in candidate synthesis route solution spaces, enables an in-depth analysis of the synthesis planning search problem. With this:
• We show that the synthesis planning search problem is computationally much more straightforward than previously assumed when using template-based single-step models as the diversity of suggested reaction disconnections is limited, leading to an effective branching factor of roughly 3 after clustering.
• We demonstrate that online value estimation in synthesis planning can match or outperform self-play methods when sufficient single-step model calls are provided within a reasonable time budget, thus allowing adaptation to non-stationary environments (where we show that self-play methods struggle to adapt).
• We highlight that our reaction diversity search yields a diverse set of synthesis routes, allowing a chemist to choose from a manageable solution set by avoiding redundant analog building blocks, and provide general guidelines for successful synthesis planning.
Given a target molecule $m$, synthesis planning searches for a synthesis route by recursively breaking down $m$ into precursor reactants $r$ until either all reactants belong to a (commercially) available building block set $\mathcal{B}$ or the search budget (e.g. time, iterations) is exhausted. The state space $S$ for this search is defined as the power set $\mathcal{P}(\mathcal{M})$ of $\mathcal{M}$:

$$S = \mathcal{P}(\mathcal{M}),$$

where $\mathcal{M}$ is the space of all possible molecules and each state $s \in S$ represents a set of molecules:

$$s \subseteq \mathcal{M}.$$

The synthesis route $sr$ is then formally defined as a mapping

$$sr(m) = (a_1, a_2, \ldots, a_k)$$

of retrosynthetic actions $(a_1, a_2, \ldots, a_k)$, where each action $a_i$ transforms an intermediate molecule $m_i$ into its precursor set $r_i$:

$$a_i(m_i) \to r_i.$$

$sr$ is considered solved if, after applying all actions $(a_1, a_2, \ldots, a_k)$ in order, the resulting set of molecules $s$ is in the available building block set ($s \subseteq \mathcal{B}$). A single-step retrosynthesis model, denoted by $ssm$, provides candidate precursor sets for a given molecule:

$$ssm : \mathcal{M} \to \mathcal{P}(\mathcal{R} \times [0, 1]),$$

where $\mathcal{M}$ is the space of molecules, $\mathcal{R}$ is the space of possible reactant sets, and $\mathcal{P}(\mathcal{R} \times [0, 1])$ represents reactant sets paired with a likelihood score. Given $m$, the model outputs:

$$ssm(m) = \{((r_i)_1, p_1), \ldots, ((r_i)_n, p_n)\},$$

where each $(r_i)_k$ is a set of reactants forming $m$, and $p_k \in [0, 1]$ is the corresponding reaction disconnection probability. Whenever more than one candidate set of reactants is returned for a given molecule, the primary task of synthesis planning is to pick the disconnection that will most likely lead to a solved synthesis route. From a reinforcement learning perspective, the core challenge of the search algorithm is to learn or estimate the expected reward $Q(s, a)$ given a search tree state $s$ and a possible reaction alternative $a$ in a state-based tree representation, encapsulating the entire search problem of a possible synthesis route with all its molecules in a state $s$.1

Synthesis planning approaches can estimate $Q(s, a)$ or $V(m)$ “online” while running a search algorithm (e.g. ref. 1), learn a “heuristic” via supervised learning based on historical synthesis routes (e.g. ref. 15), or use “simulated experience” by repeatedly running synthesis planning against a predefined set of molecules, recording the successful synthesis pathways, and iteratively refining either $Q(s, a)$19 or $V(m)$.22
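The definitions above can be illustrated with a small sketch. Everything named here is a hypothetical placeholder: the toy single-step model `toy_ssm`, the molecule labels, and the stock are invented for illustration, and the search is a simplified state-based best-first search that ranks states by the accumulated −log p of the applied disconnections. It is not the implementation evaluated in this work.

```python
import heapq
import math
from itertools import count

# Hypothetical toy data: labels stand in for real molecular structures.
STOCK = frozenset({"B1", "B2", "B3"})

def toy_ssm(m):
    """Toy single-step model: returns [(reactant_set, probability), ...]."""
    table = {
        "T": [(frozenset({"I1", "B1"}), 0.6), (frozenset({"I2"}), 0.3)],
        "I1": [(frozenset({"B2", "B3"}), 0.9)],
        "I2": [(frozenset({"X"}), 0.8)],  # dead end: "X" has no disconnection
    }
    return table.get(m, [])

def is_solved(state, stock):
    """A state is solved when all of its molecules are building blocks."""
    return state <= stock

def best_first_search(target, ssm, stock, max_calls=100):
    """Expand the cheapest state first; cost = sum of -log p along the route."""
    tie = count()  # tie-breaker so the heap never has to compare frozensets
    frontier = [(0.0, next(tie), frozenset({target}), ())]
    calls = 0
    while frontier and calls < max_calls:
        cost, _, state, route = heapq.heappop(frontier)
        if is_solved(state, stock):
            return route
        # Deterministically pick one molecule that is not yet in stock.
        m = min(state - stock)
        calls += 1  # one single-step model call per expansion
        for reactants, p in ssm(m):
            new_state = (state - {m}) | reactants
            action = (m, tuple(sorted(reactants)))
            heapq.heappush(
                frontier,
                (cost - math.log(p), next(tie), new_state, route + (action,)),
            )
    return None  # budget exhausted or no route exists

route = best_first_search("T", toy_ssm, STOCK)
print(route)
```

The returned route is the ordered tuple of actions, each pairing an intermediate molecule with the precursor set chosen for it, and `max_calls` plays the role of the search budget.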
To address this, we propose quantifying the novelty N(a) of each reaction, measuring how much it differs from other candidate reactions instead of filtering them (e.g. ref. 31). We then incorporate N(a) into the classical exploration–exploitation trade-off in two distinct ways:
We define N(a) as the minimal distance between the current reaction $a_j$ and all previously explored reactions $a_i$:

$$N(a_j) = \min_{i} d(a_j, a_i),$$

where $d$ denotes the reaction distance measure.
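A minimal sketch of this novelty score follows. The distance measure is an assumption: this excerpt does not specify it, so a Jaccard distance over hypothetical reaction feature sets (for example, hashed fingerprint bits) is used as a stand-in, and the value returned for an empty history is likewise a choice made for the sketch.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two feature sets: 1 - |a ∩ b| / |a ∪ b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def novelty(candidate, explored, distance=jaccard_distance):
    """Minimal distance from the candidate to any previously explored reaction.

    Returns 1.0 (maximally novel) when nothing has been explored yet; this
    default is an assumption of the sketch.
    """
    if not explored:
        return 1.0
    return min(distance(candidate, other) for other in explored)

# Hypothetical reaction feature sets.
explored = [frozenset({1, 2, 3, 4}), frozenset({7, 8, 9})]
near_duplicate = frozenset({1, 2, 3, 5})  # shares most features with the first
new_idea = frozenset({20, 21, 22})        # shares nothing with the history

print(novelty(near_duplicate, explored))
print(novelty(new_idea, explored))
```

A near-duplicate of an explored reaction scores low, while a reaction that shares no features with the history scores the maximum of 1.0.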
is the set of resulting clusters. We incorporate $ssm_{\mathrm{clustered}}(m)$ into both the often-used best-first search algorithm Retro*-0,15 without a neural guiding policy, and into MCTS.1
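To make the idea of reducing the single-step model output to one candidate per cluster concrete, here is one possible sketch. The clustering procedure (greedy leader clustering), the distance (Jaccard over hypothetical feature sets), the threshold of 0.5, and the choice of the most probable member as representative are all assumptions for illustration; the clustering method actually used is not described in this excerpt.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two feature sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def cluster_candidates(candidates, threshold=0.5, distance=jaccard_distance):
    """Greedy leader clustering of (features, probability) candidates.

    Candidates are visited from most to least probable. Each one joins the
    first cluster whose leader is closer than `threshold`; otherwise it
    opens a new cluster. Returns a list of clusters (lists of candidates).
    """
    clusters = []
    for cand in sorted(candidates, key=lambda c: c[1], reverse=True):
        for cluster in clusters:
            leader_features = cluster[0][0]
            if distance(cand[0], leader_features) < threshold:
                cluster.append(cand)
                break
        else:
            clusters.append([cand])
    return clusters

def keep_cluster_leaders(candidates, **kwargs):
    """Keep only the most probable member (the leader) of each cluster."""
    return [cluster[0] for cluster in cluster_candidates(candidates, **kwargs)]

# Hypothetical single-step model output: (reaction features, probability).
candidates = [
    (frozenset({1, 2, 3, 4}), 0.40),
    (frozenset({1, 2, 3, 5}), 0.30),  # analog of the first candidate
    (frozenset({1, 2, 3, 6}), 0.10),  # analog of the first candidate
    (frozenset({9, 10, 11}), 0.15),   # a different disconnection idea
]
reduced = keep_cluster_leaders(candidates)
print(len(candidates), "->", len(reduced))
```

In this toy case the three analogs collapse into one cluster, so only two distinct disconnection ideas are passed on to the search.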
P, and TPSA larger than the mean of the respective values in USPTO-190. Unless otherwise noted, we use the eMolecules stock37 (22,876,046 molecules), consistent with prior works.15,22 Any deviations from these default settings are explicitly stated throughout this work. We run all experiments in an embarrassingly parallel fashion on an HPC cluster using only CPUs. The only exceptions are experiments regarding Dual Value Networks,22 for which we run training and inference on multiple Nvidia H100 GPUs.
| Source (ref.) | Search structure | Tree policy | Algorithm | ChEMBL-1000 (%) | GDB17-1000 (%) | USPTO-190 (%) |
|---|---|---|---|---|---|---|
| 15 | Search tree | Online | Retro*-0 | 75.10 | 7.50 | 79.47 |
| 15 | Search tree | Heuristic | Retro* | 76.20 | 9.50 | 85.79 |
| 20 | Search tree | Self-play | Retro*+-0 | 81.10 | 15.00 | 96.32 |
| 20 | Search tree | Self-play | Retro*+ | 81.80 | 15.40 | 90.53 |
| 22 | Search tree | Self-play | PDVN + Retro*-0 | 83.50 | 26.90 | 98.95 |
| Ours | Search tree | Online | Cluster-Retro*-0 | 70.70 | 7.30 | 71.58 |
| Ours | Search tree | Online | Cluster-MCTS | 76.60 | 12.30 | 75.26 |
| Ours | Search tree | Online | Distance-MCTS | 85.30 | 25.90 | 95.26 |
| 17 | Reaction network | Heuristic | RetroGraph | 85.20 | 21.50 | 99.47 |
| 22 | Reaction network | Self-play | PDVN + RetroGraph | 86.00 | 37.10 | 99.47 |
15 as baselines, but increased the number of iterations to 25,000 for all Retro* variants (default & cluster) to match the single-step model calls used by the MCTS-based methods. We then compared these results against the self-play performance of Dual Value Networks.22
Fig. 2 shows how varying the number of single-step model calls affects the synthesis planning success rate for different search algorithms. There is a clear relationship between more single-step model calls and a better solved rate for all three datasets and all online search algorithms. These results indicate that self-play variants (e.g. ref. 22) learn to find a synthesis route faster but do not perform better, unlike their motivating AlphaGo variants12,16 that outperform online search algorithms. Furthermore, the search times for online search algorithms are surprisingly short, even when using cheap CPU inference (see Fig. S1). Finally, the search space for each dataset seems to have a natural performance limit that all well-performing algorithms converge upon over time. When looking at the different algorithms, the difference between MCTS and Distance-MCTS is relatively small, even though Distance-MCTS performs slightly better. Both algorithms outperform self-play on ChEMBL-1000 and reach comparable performance on USPTO-190 and GDB17-1000. Clustering the ssm reactions decreases the performance for all algorithms by ∼20% in total, compared to the non-clustered variant, yet it retains a surprisingly high performance given that the effective search space is drastically reduced in the reaction space, from an average of 25.37 alternatives added per expansion call to 3.02 for ChEMBL-1000 and MCTS (Retro* on ChEMBL-1000: 22.54 → 2.80). This relatively low drop in performance on a much smaller clustered search space shows that the average search space for a synthesis planning problem is, in principle, much smaller than previously expected, as the diversity in suggested distinct reaction ideas is lacking, and the search space mainly consists of reactions with slightly varying reactant outcomes. Note that the width usually never reaches the ideal 50 alternatives, as direct duplicates and erroneous reactions are removed from the predictions. 
Finally, picking the alternative ranked highest by the single-step model ssm in a best-first search is surprisingly effective for Retro*-0 on the USPTO-190 benchmark dataset. Here, the 95% solved rate is reached within 6000 single-step model calls, and it matches previously reported state-of-the-art performance at the 20,000 mark. These results raise the question of how reliable the evaluation dataset is (and, consequently, the reported performance of search algorithms on it) if picking the best alternative suggested by the ssm can achieve these performance levels as quickly as our results indicate. Our results further indicate that USPTO-190 may be very much in-domain of the ssm as it is created from USPTO patent data15 and thus requires little search effort. These results are supported by the Retro*-0 performance on the GDB17-1000 dataset, where picking the best-first alternative does not work well because the enumerated molecular space is not as well known to the single-step model. In this setting, the search success rate increases only linearly after initially solving the more familiar molecules. To summarize, the computational search problem of synthesis planning is more straightforward than expected, as a best-first search with enough single-step model calls is very competitive with self-play algorithms, and the effective search space after clustering is much smaller than the expected $50^{30}$ (ref. 1).
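The scale of this reduction can be checked with a few lines of arithmetic. The sketch below compares the nominal estimate of 50 alternatives over a depth of 30 with the average branching factors reported above for MCTS on ChEMBL-1000 (25.37 unclustered, 3.02 clustered). Raising an average branching factor to a fixed depth is only a rough proxy for the true tree size, and the helper name is hypothetical.

```python
import math

def log10_tree_size(branching_factor, depth):
    """log10 of branching_factor ** depth, a rough proxy for tree size."""
    return depth * math.log10(branching_factor)

DEPTH = 30  # depth used in the nominal 50^30 estimate

settings = [
    ("nominal top-50", 50.0),
    ("observed, unclustered (ChEMBL-1000, MCTS)", 25.37),
    ("observed, clustered (ChEMBL-1000, MCTS)", 3.02),
]
for label, b in settings:
    print(f"{label}: ~10^{log10_tree_size(b, DEPTH):.1f}")
```

Under these assumptions the clustered search space is smaller than the nominal one by more than 35 orders of magnitude at the same depth.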
874 building blocks (0.0126%) from our initial stock dataset of 22,876,046, leaving 22,873,172 building blocks. As this amounts to only 0.0126% of the building blocks, the challenge of the search becomes to substitute these specific best-first building blocks within a large space of possible alternatives. This task could be challenging if building blocks without suitable analogs are removed or alternative reaction pathways that use entirely different building blocks must be found. We then repeat the prior experiment to measure the relationship between synthesis planning success and single-step model calls with this new building block set on USPTO-190, ChEMBL-1000 and GDB17-1000. Furthermore, we evaluate Dual Value Networks22 on this reduced set without retraining, meaning the algorithm remains pre-trained on the full eMolecules dataset, as this approach allows us to assess how well self-play methods generalize.
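The size of this perturbation follows directly from the two stock sizes quoted above, and the filtering step itself is a set difference. In the sketch below, the helper `reduce_stock`, the toy stock, and the toy routes are hypothetical; only the two stock sizes are taken from the text.

```python
FULL_STOCK_SIZE = 22_876_046     # eMolecules stock size quoted in the text
REDUCED_STOCK_SIZE = 22_873_172  # stock size after removal, quoted in the text

removed = FULL_STOCK_SIZE - REDUCED_STOCK_SIZE
removed_pct = 100 * removed / FULL_STOCK_SIZE
print(removed, f"{removed_pct:.4f}%")

def reduce_stock(stock, best_first_routes):
    """Remove every building block that ends a best-first route.

    Each route is represented only by its set of leaf building blocks.
    """
    used = set().union(*best_first_routes) if best_first_routes else set()
    return stock - used

# Hypothetical toy stock and best-first routes.
stock = {"B1", "B2", "B3", "B4", "B5"}
routes = [{"B1", "B2"}, {"B2", "B4"}]
print(sorted(reduce_stock(stock, routes)))
```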
Surprisingly, removing the best-first synthesis route building blocks, which make up only 0.0126% of the stock, leads to a large drop in synthesis planning performance for all datasets (see Fig. 3). Performance on USPTO-190 and GDB17-1000 decreases across all algorithms to less than half the initial best performance, while the drop in ChEMBL-1000 is considerably smaller. In terms of algorithms, using a best-first algorithm with Retro*-0 outperforms self-play and all MCTS variants. The generalizability of self-play might be worse than expected, as these small changes in the search environment result in considerable performance drops, and the best-first search algorithm outperforms self-play in earlier iterations compared to the complete building block set. Given that we removed the building blocks of the best-first route, the task here is to find analog molecules that substitute the missing building blocks, a task that a best-first search can handle most effectively, as analogs should be highly ranked on the best-first exploration frontier. MCTS variants, however, tend to discourage the entire reaction pathway because the negative reward of the best-first route is backpropagated through the search tree and discourages the search for analogs once the original best-first route returns a negative reward. Consequently, the difference between Distance-MCTS and normal MCTS is rather small, although the default implementation of MCTS seems to perform slightly better as it has access to direct reaction analogs that might be discouraged by Distance-MCTS. All clustering variants that operate in reaction disconnection search space require that the newly found synthesis route differs from the best-first route; that is, the new route would need to use at least one reaction idea within the synthesis route that is not part of the best-first route, implying structurally different pathways for synthesizing the molecule beyond the best-first approach.
Note that such a structurally different pathway that leads to different building blocks could be as simple as switching the order of molecule decompositions in a synthesis route or further decomposing missing building blocks (see SI Fig. S4 for an example). This usage of different reaction ideas appears not to be possible, as the rates of found synthesis routes decrease drastically compared to the complete building block set for USPTO-190 (Cluster-Retro*-0: 81.58% → 8.95%, Cluster-MCTS: 75.26% → 7.37%) and GDB17-1000 (Cluster-Retro*-0: 14.60% → 1.00%, Cluster-MCTS: 12.30% → 1.10%), where it seems that a specific best-first reaction route must be found. For ChEMBL-1000, however, finding alternative synthesis routes is possible (Cluster-Retro*-0: 77.60% → 39.10%, Cluster-MCTS: 76.60% → 35.00%), indicating that, in principle, there are structurally different ways of synthesizing these molecules. Notably, these algorithms are not just solving molecules that lie beyond the 500-iteration boundary for Retro*-0, as the solved rate is substantially higher than the increase in solved routes beyond 500 iterations in the first experiment (compare Fig. 2). To summarize, the ability to adapt to changes in the search environment is surprisingly limited. While a best-first search algorithm should, in principle, find solutions faster than MCTS and be more robust than a self-play algorithm, all algorithms struggle to find “meaningful” reaction alternatives when the ssm does not provide sufficiently diverse reaction disconnections, as no reaction pathway can be found that is not suggested by the ssm.
000 single-step model calls for the Retro* variants. We sample 100 molecules from the ChEMBL-1000 dataset as an evaluation dataset from the set of molecules for which all online search algorithms found a synthesis route in our first experiment. We focus exclusively on solved molecules here because we are interested in how the solution space changes once a valid solution can be found, rather than whether an algorithm can solve a molecule in the first place—hence, we exclude unsolved cases from this analysis. We first evaluate the average number of found synthesis routes across evaluated molecules during the search to measure the size of the overall solution space. Additionally, we examine the average shortest route found by each algorithm. While a high number of discovered synthesis routes may indicate broad exploration of the search space, many of these routes can be close variants differing by minor modifications through the use of analog building blocks instead of genuinely different reaction pathways. Thus, the total route count alone could be misleading. Therefore, we compute the average pair-wise distance33 between the top-100 returned routes across all molecules, measuring the necessary steps to transform one synthesis route into another. A higher average distance implies that the found set of routes is more diverse, aligning with our goal of offering chemists more than one principal route rather than minor variations of essentially the same route. We do not cluster the entire route space due to computational constraints and because each algorithm should ideally return a diverse set of alternatives within a manageable top-n.
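The diversity measure described above averages a route-to-route distance over all pairs in the top-n set. The sketch below shows only that averaging step. The distance function is a stand-in and an assumption: a Jaccard distance over each route's set of reaction identifiers replaces the route distance of ref. 33, and the route contents are hypothetical.

```python
from itertools import combinations
from statistics import mean

def route_distance(route_a, route_b):
    """Stand-in distance: Jaccard distance between the routes' reaction sets."""
    union = route_a | route_b
    return 1.0 - len(route_a & route_b) / len(union) if union else 0.0

def mean_pairwise_distance(routes, top_n=100, distance=route_distance):
    """Average distance over all unordered pairs among the top-n routes."""
    top = routes[:top_n]
    if len(top) < 2:
        return 0.0
    return mean(distance(a, b) for a, b in combinations(top, 2))

# Hypothetical route sets: minor variations versus distinct pathways.
analog_heavy = [
    frozenset({"r1", "r2", "r3"}),
    frozenset({"r1", "r2", "r4"}),
    frozenset({"r1", "r2", "r5"}),
]
diverse = [
    frozenset({"r1", "r2", "r3"}),
    frozenset({"r6", "r7"}),
    frozenset({"r8", "r9", "r10"}),
]
print(mean_pairwise_distance(analog_heavy))
print(mean_pairwise_distance(diverse))
```

A set of routes that differ in a single step yields a low mean distance, whereas routes built from disjoint reactions reach the maximum, which mirrors the distinction between analog variants and principally different routes.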
All molecules remain solvable across all tested algorithms (see Fig. 4 and SI Table S3). Among these algorithms, the MCTS variants (default and distance-based) return the highest number of routes, averaging between 2400 and 2700. Retro*-0 and Cluster-Retro*-0 follow with around 1700 routes. The only outlier is Cluster-MCTS, which returns an average of roughly 260 routes. Nevertheless, every algorithm provides numerous routes to choose from in its respective solution space. Regarding the shortest route found per target, most algorithms produce routes that average three to four reactions. However, novelty-based algorithms tend to produce slightly longer synthesis routes; in particular, Cluster-Retro*-0 returns routes that are on average more than one reaction longer. Looking more closely at the diversity of these routes, we observe that non-novelty-based algorithms tend to produce only minor variations, reflected in low distances within the top-100 returned routes and consequently suggesting a less diverse set of solutions. In contrast, Distance-MCTS produces slightly more distinct alternatives in the top-100. Finally, algorithms that rely on clustering to search in the reaction space (Cluster-Retro*-0 and Cluster-MCTS) return the most structurally distinct synthesis routes, substantially increasing the mean route distance.
In a second step, we analyze the building blocks that the respective algorithms use as end-points for their synthesis routes, as these define the possible fragmentations of the target molecules. For this purpose, we compare the set of unique building blocks found for each molecule by each algorithm against the molecule results of all other algorithms (see Fig. 5 and SI Table S4). We first assess the Building Block Coverage, defined as the percentage of a single algorithm's unique building blocks for a given molecule relative to the total set of unique building blocks found by all algorithms for that same molecule. Here, non-clustering MCTS-based methods cover the most building block space, with an average coverage of around 55%. In contrast, best-first search strategies perform worse on average (31% to 39%), and Cluster-MCTS performs the worst with roughly only 18%. These results align with the route analysis, supporting the observation that clustering methods actively remove analog building blocks from the search space. We also measured the contribution of Unique Building Blocks, which are specific to a single algorithm. Generally, most algorithms only contribute a small fraction of unique building blocks, averaging around 10% to 13%. The cluster-based methods yield the fewest unique building blocks with roughly 2% for Cluster-MCTS and 9% for Cluster-Retro*. We can also compare the building block space to the search setting in Experiment 2, where the “best-first” building blocks were removed. For each algorithm, around 36% to 40% of all unique building blocks for each molecule became unavailable, thereby reducing the average percentage of previously solved routes per molecule to below 2% for all algorithms. This decrease demonstrates that the majority of the routes identified in the standard setting rely heavily on these “best-first” building blocks. At the molecular level, this filtering results in a large decrease in the number of solved molecules.
Solved rates drop from initially finding a synthesis route for all molecules to solving roughly 50% of the molecules for non-clustered algorithms, and only around 15% for the clustered algorithms. This disparity strongly highlights the importance of analog building blocks for successful synthesis planning in this setting, as these are precisely what the clustering methods are designed to prune. Finally, we compare the rate of molecules for which a synthesis route is found under two conditions: (1) post-run filtering of routes based on the reduced building block dataset and (2) rerunning the entire search algorithm on the reduced building block set from the start (100 ChEMBL molecule results from Experiment 2). This comparison highlights the importance of algorithm robustness to changes in the search environment. Across all search algorithms, rerunning the search on the changed environment improves the success rate by 20% to 50% in total. This improvement underscores the importance of online search algorithms in synthesis planning, which can adapt to the available building block space rather than relying on a static set of building blocks.
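The two per-molecule metrics used in this analysis can be sketched as set operations. Building Block Coverage follows the definition given above. For Unique Building Blocks the denominator is not stated in this excerpt, so the sketch assumes it is also the total set found by all algorithms; the function name and the toy data are hypothetical.

```python
def building_block_metrics(per_algorithm):
    """Coverage and uniqueness per algorithm for a single target molecule.

    `per_algorithm` maps an algorithm name to the set of building blocks
    appearing in any of its routes for that molecule.
    """
    total = set().union(*per_algorithm.values())
    metrics = {}
    for name, blocks in per_algorithm.items():
        # Building blocks found by any other algorithm.
        others = set().union(
            *(b for other, b in per_algorithm.items() if other != name)
        )
        metrics[name] = {
            "coverage_pct": 100 * len(blocks) / len(total) if total else 0.0,
            # Assumption: uniqueness is reported relative to the total set.
            "unique_pct": 100 * len(blocks - others) / len(total) if total else 0.0,
        }
    return metrics

# Hypothetical building blocks found for one molecule.
found = {
    "MCTS": {"B1", "B2", "B3", "B4"},
    "Retro*-0": {"B1", "B2"},
    "Cluster-MCTS": {"B1", "B5"},
}
for name, m in building_block_metrics(found).items():
    print(name, m)
```

Averaging these values over all evaluated molecules would give per-algorithm figures of the kind reported above.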
Following this analysis, we investigate which combinations of algorithms cover the largest unique building block space when aggregated across all molecules (see Fig. 6) using an UpSet plot.38 For our 100 ChEMBL molecules, the total unique building block space is 28,260. The general trend shows a logarithmic relationship: combining more algorithms increases total coverage of the building block space, but with diminishing returns for each additional algorithm when selecting the best-performing ones, which aligns with their previously discussed per-molecule coverage and uniqueness rates of different algorithms. Individually, the highest coverage is provided by the Distance and standard MCTS algorithms, each covering roughly 57% of all unique building blocks. In contrast, the clustered methods cover a much smaller portion of the space. Combining any two search algorithms, with the exception of Cluster-MCTS, greatly increases unique building block coverage to approximately 70% to 77%. A combination of three such algorithms (excluding Cluster-MCTS) already accounts for 83% to 90% of the total building block space. Finally, complete coverage of the 28,260 building blocks is achieved only when combining results from nearly all algorithms. This finding is consistent with our earlier observation that, while most algorithms overlap, many still contribute a small percentage of unique building blocks.
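The saturation effect described above can be reproduced on toy data by computing, for each number k of combined algorithms, the best union coverage over all k-subsets. This is a hedged sketch of the aggregation behind such an analysis, not the plotting code; the function name, algorithm labels, and building block sets are hypothetical.

```python
from itertools import combinations

def best_combination_coverage(per_algorithm):
    """For each k, the best coverage (%) achieved by any k algorithms."""
    total = set().union(*per_algorithm.values())
    if not total:
        return {}
    best = {}
    for k in range(1, len(per_algorithm) + 1):
        best_names, best_cov = None, -1.0
        for names in combinations(sorted(per_algorithm), k):
            covered = set().union(*(per_algorithm[n] for n in names))
            cov = 100 * len(covered) / len(total)
            if cov > best_cov:  # ties keep the first combination found
                best_names, best_cov = names, cov
        best[k] = (best_names, best_cov)
    return best

# Hypothetical building block sets aggregated over all molecules.
aggregated = {
    "A": {1, 2, 3, 4, 5, 6},
    "B": {4, 5, 6, 7, 8},
    "C": {1, 2, 9},
    "D": {10},
}
for k, (names, cov) in best_combination_coverage(aggregated).items():
    print(k, names, f"{cov:.0f}%")
```

In the toy data each additional algorithm adds fewer new building blocks than the previous one, and full coverage requires all four, which matches the qualitative pattern of diminishing returns.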
As an additional case study, we applied our diversity search methodology to two products (A and B) previously used to evaluate guidance by an external literature database.31 Notably, we conducted this evaluation without relying on a proprietary route database or closed-source single-step retrosynthesis model, as these are publicly unavailable. Consistent with the ChEMBL100 pattern, the maximum unique building block space discovered generally follows a logarithmic scale, reaching saturation when different algorithms are combined (see SI Fig. S6 and S7). Quantitatively, we found between 500 and 1300 routes with unique building blocks for each algorithm for Product A. In contrast, Product B proved more challenging, as Distance-MCTS yielded 14 routes with unique building blocks, while Cluster-Retro* failed to find any route (see SI Table S5). Qualitatively, we conceptually reproduced the reported route for Product A using Distance-MCTS (see SI Fig. S5). However, our route is one step shorter, having eliminated a protection step by identifying better building blocks in our synthesis space exploration, a result that previously required literature guidance.31 While our approach identifies the same conceptual reaction steps, it utilizes different building blocks. For Product B, the route we identified corresponds conceptually to the slightly longer, non-literature-guided synthesis route, albeit with a premature deprotection step. Here, none of our synthesis routes utilize a [3 + 2] cycloaddition ring-forming reaction identified by literature guidance.31 This is surprising given that the building blocks required for the cycloaddition route are present in our building block set. Considering that online search algorithms achieved success rates comparable to self-play methods in solvable cases (see Fig. 2) and that coverage of the unique solution building block space generally saturates after combining different algorithms (see Fig. 6), our inability to find a route utilizing the cycloaddition is unexpected. This strongly suggests that the template-based single-step model failed to predict the specific disconnection required, highlighting the importance of valid reaction predictions from the single-step model as the limiting factor, rather than the search algorithm's coverage of the synthesis route solution space.
In summary, all algorithms return numerous potential synthesis routes, where the building block solution space can be improved when combining different algorithm results. Nevertheless, cluster-based approaches find a more manageable solution space at the cost of potentially missing a shorter route, as not all analog building blocks are available during the search. The breadth of synthesis route solutions, potentially combining results from different algorithms, enables subsequent route optimization based on desired objectives (e.g. cost, availability, safety), ensuring that chemists can choose the optimal route that meets their specific project requirements from among many route alternatives.
Our results show that the synthesis planning search problem of successfully finding a synthesis route may be computationally more straightforward than previously expected, as the template-based single-step model tends to suggest only a limited number of distinct reaction disconnection ideas, consequently reducing the search space greatly from the assumed $50^{\text{tree depth}}$ (ref. 1) to roughly $3^{\text{tree depth}}$ when clustering, or $25^{\text{tree depth}}$ when not clustering. We also demonstrated that the main driver for success in synthesis planning is the number of single-step model calls an algorithm can use. Specifically, we observed a logarithmic relationship between increasing the number of single-step calls and the fraction of molecules for which a valid route is found. Beyond a certain threshold of single-step calls, the choice of search algorithm becomes less critical, as most methods eventually discover a possible synthesis route, indicating a natural performance plateau that all search algorithms reach over time. To our surprise, online estimation of Q(s, a) and V(m) performed competitively with state-of-the-art self-play variants.22 Those self-play methods mainly gain faster inference (in terms of single-step calls) but do not achieve better synthesis planning performance. This finding deviates from what is observed in the game of Go, which inspired most self-play variants in synthesis planning, where self-play dramatically boosts performance compared to an online search algorithm.16 Unlike Go, synthesis planning is not a stationary environment with fixed rules and unchanging winning states. We show that self-play implicitly assumes a stationary environment to function well in synthesis planning.
Nevertheless, our findings do not render self-play variants entirely ineffective. Self-play can be advantageous if the environment is effectively fixed, for example, when using a never-changing set of public reactions like USPTO39 or a small set of fixed in-house building blocks.4 It might also be required when a fast synthesis planning result is necessary, such as when synthesis planning is used as an objective in de novo drug design.4,5 However, the substantial GPU resources required by self-play raise questions about its practicality for more general applications. From a purely practical standpoint, there is little difference between finding a solution in 500 or 25,000 iterations using only CPU infrastructure as long as the total inference time is under an hour. Furthermore, fixing the environment offers fewer benefits when applying synthesis planning to novel targets, reactions, or building blocks.
We also observe that benchmarking synthesis planning experiments exclusively on the USPTO-190 dataset can be problematic. Because USPTO-190 is in-domain for the publicly available USPTO-based single-step model, it requires no substantial search effort beyond a best-first search. In contrast, we show that planning performance on more “unknown” enumerated spaces (e.g. GDB17-1000) can be surprisingly poor, raising concerns about out-of-distribution performance for truly novel targets. We want to highlight this divergence between well-known chemical spaces (e.g. USPTO-190, ChEMBL-1000) and more unexplored ones (e.g. GDB17-1000), where current methods perform remarkably worse.
Regarding the robustness of synthesis planning, it is notable how strongly synthesis planning depends on the availability of key building blocks. We observe marked declines in route planning success when only a few key building blocks are missing, even though millions of alternative building blocks are still available. Furthermore, no alternative disconnection reaction can be found if the disconnection idea in the best-first route is unavailable. This lack of alternative reactions is problematic because a perfect synthesis planning algorithm would provide chemists with a structurally diverse set of synthesis routes, allowing them to choose the appropriate ones based on project requirements. Here, the main advantage of our clustering and diversity-oriented search approaches (Cluster-Retro*-0, Cluster-MCTS, Distance-MCTS) is that they decrease the synthesis planning solution space by directly operating in reaction disconnection space and potentially exploring a wider range of reaction ideas. Such algorithms can reduce the overall solution space by providing diverse synthesis routes for chemists to choose from, rather than slight variations of the same core building blocks when returning all found synthesis routes.
Based on our insights, we recommend (i) a best-first search with a large number of iterations (e.g. 25 000) for well-known chemical spaces (e.g. USPTO-190, ChEMBL-1000) and (ii) an MCTS variant in other cases as synthesis planning search settings. In both cases, an hour of search time should suffice, provided single-step model inference is fast enough. For maximum coverage of the synthesis route solution space, we recommend combining the results of multiple search algorithms on the molecule-specific search problem defined by the project's available building blocks, rather than relying on a fixed general search environment with post-run route filtering.
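The recommendation to combine the results of several search algorithms can be sketched as a simple union with de-duplication. The snippet below is a minimal illustration, not the procedure used in this work: it assumes each route is represented as a frozen set of reaction identifiers (a hypothetical representation), and the function name `combine_routes`, the algorithm labels, and the placeholder reaction labels are all invented for the example.

```python
from typing import Iterable

# Hypothetical representation: a route is the set of its reaction identifiers
# (for example reaction SMILES). Two routes with the same reactions are equal.
Route = frozenset[str]


def combine_routes(
    results: dict[str, Iterable[Route]],
) -> tuple[set[Route], dict[str, int]]:
    """Merge the routes found by several search algorithms.

    Returns the de-duplicated set of routes and, for each algorithm in
    insertion order, how many previously unseen routes it contributed.
    """
    combined: set[Route] = set()
    new_routes_per_algorithm: dict[str, int] = {}
    for algorithm, routes in results.items():
        size_before = len(combined)
        combined.update(routes)
        new_routes_per_algorithm[algorithm] = len(combined) - size_before
    return combined, new_routes_per_algorithm


if __name__ == "__main__":
    # Placeholder labels stand in for real reactions; this is not real data.
    toy_results = {
        "best-first": [frozenset({"rxn_A", "rxn_B"})],
        "MCTS": [frozenset({"rxn_A", "rxn_B"}), frozenset({"rxn_C"})],
    }
    routes, contributed = combine_routes(toy_results)
    print(len(routes), contributed)
```

Tracking the number of new routes contributed by each additional algorithm is one simple way to see whether the combined solution space is still growing or has saturated.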
Naturally, our work has certain limitations. First, we emphasize that we used the default search algorithm benchmark with a fixed, template-based single-step model. Different single-step models,27,29 such as a template-free model11,40,41 or an ensemble of different single-step models,42 could yield different reaction distributions, potentially altering the results. Here, Transformer-based methods in particular could increase the diversity of the effective branching factor of the single-step model (e.g. ref. 31), although they are typically limited by their beam size of 10 to 20 alternative candidate reactions.41 Second, we only evaluate synthesis planning search algorithms on their route-finding ability, but do not evaluate follow-up questions regarding the quality of the produced routes. Instead, we mainly focus on the search success problem, as it is a requirement for further route optimization. We treat the single-step model as a benchmark-defined black box to allow comparability with other search approaches under the same evaluation environment. However, it might be interesting to change the evaluation environment by altering the single-step model to return only reactions that satisfy the round-trip prediction, i.e., for which the correct product is predicted given the retrosynthesis reactants (e.g. ref. 8 and 31), or to evaluate the routes found by different algorithms with a chemist to verify their overall validity, as algorithmically found synthesis routes must ultimately work in a real-world laboratory. Third, our reaction clustering remains an approximation of reaction diversity, as we rely on heuristic reaction fingerprints instead of using the underlying reaction mechanisms. Future work could improve reaction representations by testing alternative reaction fingerprints43,44 or clustering methods and their parameterization to enhance our approach further.
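The round-trip check mentioned above can be illustrated with a short sketch. It assumes a forward reaction predictor is available as a plain callable that maps a reactant string to a predicted product string; that callable, the function name `round_trip_filter`, and the toy lookup table are hypothetical stand-ins, and a real setup would also need to canonicalize molecule strings before comparing them.

```python
from typing import Callable, Iterable


def round_trip_filter(
    product: str,
    candidate_reactants: Iterable[str],
    forward_model: Callable[[str], str],
) -> list[str]:
    """Keep only the candidate reactant sets whose forward prediction
    reproduces the product that was disconnected.

    Strings are compared exactly, so inputs are assumed to be canonicalized.
    """
    return [
        reactants
        for reactants in candidate_reactants
        if forward_model(reactants) == product
    ]


if __name__ == "__main__":
    # Toy forward model: a lookup table with placeholder labels, not chemistry.
    toy_forward = {"A.B": "P", "C.D": "Q", "E.F": "P"}

    def predict(reactants: str) -> str:
        # Unknown inputs map to an empty string, which never matches a product.
        return toy_forward.get(reactants, "")

    kept = round_trip_filter("P", ["A.B", "C.D", "E.F", "G.H"], predict)
    print(kept)
```

Applied inside the single-step model, such a filter would trade a smaller effective branching factor for candidates that are more likely to be chemically consistent, which is the change of evaluation environment suggested in the text.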
Our results suggest that key advances in the synthesis planning field will originate from improving the single-step model, particularly in generating a more diverse set of reaction disconnections. Such diversity would expand the overall reaction search space and enable more innovative synthesis route solutions.
Supplementary information (SI): algorithm configuration parameters, search performance statistics, reaction clustering examples, and further case study information. See DOI: https://doi.org/10.1039/d5dd00280j.
This journal is © The Royal Society of Chemistry 2026