Open Access Article
Guzel Minibaeva
a,
Haolin Du
b,
Finlay Clark
c,
Julien Michel
b and
Pavel Polishchuk
*a
aInstitute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Hněvotínská 1333/5, 779 00 Olomouc, Czech Republic. E-mail: pavlo.polishchuk@upol.cz
bEaStCHEM School of Chemistry, University of Edinburgh, Edinburgh, EH9 3FJ, UK
cSchool of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
First published on 28th April 2026
The de novo generation of chemical compounds represents a compelling strategy for the exploration of a significantly broader chemical space compared to traditional virtual screening methods. However, fragment-based approaches often encounter challenges related to the low synthetic accessibility of generated structures. In this study, we integrated the previously established fragment-based structure generator, CReM, with molecular docking techniques implemented in EasyDock to facilitate the exploration of chemical space. This novel methodology enables indirect control over the synthetic accessibility and diversity of the generated structures, enhances an objective function to yield compounds with desirable physicochemical properties, and maintains key protein–ligand interactions and ligand conformations. The structures generated through this approach exhibited a high degree of novelty and demonstrated competitiveness with those produced by leading methodologies in the field, as demonstrated by Absolute Binding Free Energy calculations. We illustrated the versatility of the developed approach through several case studies, showcasing its applicability to both de novo generation and fragment expansion tasks. The tool developed in this study is open-source and can be accessed at https://github.com/ci-lab-cz/crem-dock.
Numerous approaches combine de novo structure generation with molecular docking to identify novel bioactive compounds. Reaction-based methodologies enumerate molecules using predefined reaction rules to improve synthetic plausibility. For example, AutoCouple19 generated ∼70
000 ligands for the CBP bromodomain from three reactions, leading to 53 synthesized compounds and a highly potent and selective inhibitor (IC50 = 35 nM). Similarly, Chevillard et al.20 expanded fragment hits for the β2-adrenergic receptor via reaction-based linking, yielding compounds with up to 40-fold affinity improvements. Other tools such as NAOMInext21 and AutoGrow4
22 iteratively grow or recombine molecules using reaction rules and docking-based evaluation, with AutoGrow4 demonstrating docking scores exceeding those of known ligands. However, while reaction-based approaches are intended to enhance the likelihood of synthetic accessibility of the designed structures, they do not guarantee it. For example, the top-scoring structures generated by AutoGrow4 have been found to lack synthetic feasibility.22 Another limitation pertains to the coverage of chemical space, which depends on the selected reaction rules and the libraries of building blocks.
Fragment-based methods offer broader chemical space exploration but often at the cost of reduced synthetic accessibility. Strategies to mitigate this include post hoc filtering or explicit biasing during optimization. LigBuilder23 applies retrosynthetic filtering after fragment growth, whereas Graph-GA24 integrates a synthetic accessibility (SA)25 score directly into the objective function, markedly increasing the proportion of synthetically feasible top-ranked compounds. Other tools, such as OpenGrowth26 and FragExplorer,27 introduce implicit synthetic bias by restricting allowable fragment connections based on observed chemical patterns. While these constraints improve chemical plausibility, they provide limited control and do not consistently yield drug-like or synthetically accessible molecules.
Machine learning-based generative models represent a third major category. OptiMol28 and SampleDock29 employ variational autoencoders guided by iterative docking, while DockStream30 provides a docking wrapper that integrates with recurrent neural network-based reinforcement learning frameworks such as REINVENT SBMolGen31 combines recurrent neural networks with Monte Carlo tree search and docking. These approaches can generate molecules with docking scores comparable to or exceeding those of known actives and often maintain reasonable drug-likeness and SA scores. Nevertheless, issues such as declining novelty, limited explicit control over synthetic feasibility, and high computational cost remain. Reaction-aware neural models, such as SynFlowNet32 and RGFN,33 further integrate docking and retrosynthesis signals but are computationally demanding and tend to generate highly aromatic structures with suboptimal drug-likeness and limited capacity for further optimization.
Overall, fragment-based and machine learning-driven approaches provide superior chemical space coverage relative to reaction-based methods, but synthetic feasibility remains a central challenge. Synthetic biasing can be introduced explicitly through objective functions or implicitly through structural constraints; however, explicit methods require careful scoring and weighting, while implicit strategies are often overly simplistic and lack fine-grained control.
Previously, we introduced CReM,34 a fragment-based framework that indirectly biases generation toward synthetically accessible molecules by restricting bond formation to chemical contexts observed in synthesized compounds. CReM supports fragment growth, mutation, and linking, and its synthetic relevance has been validated using SA scores and retrosynthetic analysis.35 Building on CReM, Moldrug36 was developed as a structure-based optimization tool for hit-to-lead design. Moldrug employs a genetic algorithm to optimize docking, drug-likeness, and synthetic accessibility. While Moldrug supports multi-objective optimization and interactive analysis, it is currently limited to AutoDock Vina and does not incorporate protein–ligand interaction fingerprints into its objective function; however, it may be extended with additional objectives.
In this study, we developed and implemented a de novo generation pipeline, named CReM-dock, which employs a fragment growing strategy informed by the CReM generative approach34 and is guided by molecular docking techniques integrated within the EasyDock module.37 The development of EasyDock was largely driven by the lack of accessible, fully automated open-source docking pipelines capable of seamless integration with third-party software. Consequently, its development proceeded concurrently with that of CReM-dock. The modular architecture of CReM-dock facilitates the modification and enhancement of individual components. While CReM offers fine-tuning control over the synthetic accessibility of the generated molecules,35 the integration of EasyDock provides a unified interface for a fully automated docking pipeline, encompassing ligand preparation and various molecular docking methodologies. This integration significantly streamlines the execution, processing, and analysis of docking results. Currently, EasyDock is compatible with AutoDock Vina,38 Smina,39 Gnina40 programs and other derivatives of Vina: QVina2,41 QVina-W42 and their GPU implementations.43 Additional program support can be incorporated into EasyDock, thereby making them readily accessible to CReM-dock.
Current de novo design pipelines guided by molecular docking often incorporate proprietary components, limiting their accessibility to the broader research community. Among these, protonation tools are the most commonly used proprietary modules within existing docking workflows. While some pipelines neglect ligand protonation altogether, others employ OpenBabel, which utilizes a simple rule-based approach. To address these limitations, we have integrated multiple open-source protonation tools into EasyDock and validated them. Notably, the protonation tool based on the Uni-pKa model44 exhibited the highest agreement with the proprietary software from Chemaxon. The incorporation of open-source protonation tools has made the entire CReM-dock pipeline fully open-source and freely accessible.
CReM-dock is capable of addressing two distinct scenarios: (i) fragment expansion or scaffold decoration, and (ii) de novo design. In the first scenario, the generation process commences from a defined pose of a seed molecule or fragment, which may be derived from the X-ray structure of a protein–ligand complex or obtained through alternative computational modelling techniques, such as molecular dynamics simulations. In the second scenario, the initial fragments can be sourced from any existing library of fragments or building blocks. These fragments undergo sequential expansion, with the most promising candidates advancing to subsequent iterations. A notable feature of CReM-dock is its ability to specify important protein–ligand interactions that must be preserved throughout the generation process, as well as to maintain the coordinates of the seed fragment in the resulting structures, which is particularly pertinent in the context of the fragment expansion scenario.
There are two distinct modes of structure generation: de novo design and fragment/molecule expansion. The de novo design mode requires a set of initial fragments provided in the form of SMILES or 2D structures. The fragment/molecule expansion mode requires 3D structures as input, corresponding to either empirically observed or predicted binding conformations. The primary distinction between these modes lies in the fact that 3D input molecules are directly subjected to the growth phase, thereby bypassing the docking and selection stages.
The implemented generative pipeline consists exclusively of open-source modules and can therefore be used without additional restrictions. This eliminates limitations commonly associated with the ligand-preparation step, for which proprietary software is often employed for tasks such as protonation-state assignment.
A major advance in the development of CReM-dock was the incorporation of several open-source tools for ligand protonation, thereby rendering the entire design pipeline fully open-source. Specifically, we integrated and evaluated the MolGpKa45 and Uni-pKa44 models, comparing their predicted protonation states at pH 7.4 against those generated by the Chemaxon cxcalc utility. This comparison was conducted using molecular structures from the ChEMBL database, restricted to compounds with molecular weights below 500 Da. The concordance between Chemaxon predictions and those of MolGpKa, Uni-pKa, and OpenBabel was 78%, 82%, and 70%, respectively. Based on these results, the Uni-pKa protonation tool was selected for all subsequent experiments in this study.
Additionally, we implemented an augmentation involving the fraction of sp3 carbon atoms within Bemis–Murcko scaffolds (Csp3BM). The Csp3BM values, which range from 0 to 0.3, were linearly scaled to a range of 0 to 1, while values above 0.3 were capped at 1. The scaled Csp3BM values were squared to increase the selection pressure associated with this parameter and subsequently multiplied by the previously normalized docking score. The threshold of 0.3 was chosen based on recommendations from the authors of the initial CACHE challenge,47 in which our participation resulted in one of top-ranking positions.48 Other objective functions were also considered, such as those based on docking efficiency (calculated as the docking score divided by the number of heavy atoms); however, these were not utilized in the studies presented herein.
Additional selection protocols were also implemented, but they were not used in the present study because they did not provide a clear advantage. One such approach is based on Pareto ranking with respect to docking and SA scores; however, in test runs this strategy led to a substantial decrease in docking scores while it did not improve SA scores sufficiently to outperform other selection strategies. Another approach clusters structures according to their scaffolds, rather than whole structures. Although this was expected to increase scaffold diversity in the output set, only a minor effect was observed experimentally.
Users have the option to specify additional criteria prior to the actual selection process. One such criterion is the root-mean-square deviation (RMSD) threshold, which serves to maintain the pose of a parent structure in its derivative structures. If the docking poses of the generated compounds differ from their parent molecule by more than the established RMSD threshold, those compounds will be excluded from consideration. The RMSD value is computed based on the heavy atoms of the maximum common substructure between the parent and successor molecules. This approach is particularly advantageous in fragment expansion studies, where the conformation of the initial fragment is determined experimentally, and it is anticipated that the successor structures will retain this conformation.
Additionally, users may provide a list of important protein–ligand interactions and establish a minimum similarity threshold based on protein–ligand interaction fingerprints. These fingerprints, calculated using ProLIF,49 encompass hydrogen bond donors and acceptors, hydrophobic interactions, aromatic contacts, and charged interactions, as well as metal coordination sites. Compounds that do not meet the specified similarity threshold are excluded from the selection process. For instance, if three contacts are specified and a threshold of 0.5 is set, this implies that at least two of these contacts must be present in the candidate structures (resulting in a similarity score of 0.66); otherwise, they will not qualify for selection. This methodology is particularly beneficial when important contacts are known in advance, ensuring that the designed molecules retain these essential interactions.
During the growth process, only hydrogen atoms that are situated at least 2 Å away from any heavy atom of the protein are replaced. This precaution is taken to prevent growth in spatial directions that lack sufficient volume to accommodate larger fragments. To regulate the number of generated compounds and enhance the predictability of runtime, a maximum number of randomly selected replacements can be specified. In the absence of such a specification, all potential replacements will be executed, which may result in a substantial number of structures, particularly for smaller context radii. Our findings suggest that employing 1000 random replacements yields satisfactory results, and this value has been consistently utilized in all reported studies. Furthermore, the size of the steps within chemical space can be adjusted; by default, fragments containing up to 10 heavy atoms are attached. Given that CReM cannot create new ring systems, it is essential to select a sufficiently large fragment size to allow for the incorporation of rings into the molecules. The ten-atom limit permits the addition of bicyclic fragments that include five- or six-membered rings.
By default, the fragment selection process in CReM utilizes a uniform distribution. However, to impose selective pressure and prioritize fragments with more favourable properties, we incorporated customizable selection functions within CReM, enabling fragment selection based on user-defined criteria. In one of the studies presented in this work, fragment selection was weighted in proportion to the square of the fraction of sp3 carbon atoms within the fragments. This approach is intended to preferentially select fragments with a higher fraction of saturated carbon atoms, thereby enriching the generated structures in sp3 carbon atoms in their frameworks.
We have implemented controls over important physicochemical properties that influence drug-likeness, including molecular weight (MW), topological polar surface area (TPSA), lipophilicity (log
P), and the number of rotatable bonds (RTB), to ensure that the generation process predominantly yields drug-like molecules. All these parameters, except for lipophilicity, either increase or remain constant as the number of atoms in a molecule increases and are primarily additive in nature. If any of these parameters reaches or exceed a predetermined threshold, the corresponding molecule is excluded from further consideration. Due to the predominantly additive nature of these properties, we can pre-filter CReM fragments in real time, selecting those that are unlikely to produce structures that exceed the predefined thresholds for physicochemical properties. This strategy effectively circumvents the need to enumerate and dock of structures with undesirable properties.
In all studies described below, specific constraints were imposed on the physicochemical properties of the generated structures. These constraints included a molecular mass not exceeding 450 Da, a maximum of five rotatable bonds, a lipophilicity value of no more than 4, and a topological polar surface area limited to 120 Å2. These parameters are consistent with Lipinski's rule and allow for potential further optimization of the generated structures. The deliberate restriction on the number of rotatable bonds is intended to reduce the flexibility of the designed structures, thereby improving the accuracy of docking simulations.50,51
000 compounds from ChEMBL with a molecular weight of 500 Da or less and computed pairwise similarities among them. The results indicated a mean similarity of 0.105, with the 95th percentile at 0.178 and the 99th percentile at 0.230.
000 random compounds with a molecular weight of 500 Da or less from ChEMBL33. Additionally, we compiled distinct sets of active compounds for each individual target. These active compounds were selected from those evaluated in a single protein assay format and exhibited pIC50, pKi, or pKd values of 6 or higher. For the purpose of visualizing the chemical space, we employed UMAP57 (utilizing the umap-learn Python package) alongside 2048-bit Morgan fingerprints with a radius of 3. All parameters were maintained at their default settings, except for the number of neighbours, which was set to 10, and the metric, which was specified as “jaccard”.
Subsequent solvation, energy minimization, and pre-equilibration steps were performed automatically within A3FE. Specifically, the bound complexes and free systems were solvated with TIP3P water molecules in a 0.15 M NaCl solution in rhombic dodecahedral boxes, with a minimum distance of 15 Å maintained between the solute(s) and the box edge. All simulations were conducted using GROMACS 2024.4-foss-2023b-CUDA-12.4.0 via BioSimSpace (version 2024.4.0).62 Energy minimization was performed for 1000 steps, followed by several stages of NVT equilibration: 5 ps with all nonsolvent atoms positionally restrained while heating from 0 to 298 K, 50 ps with position restraints applied to all backbone atoms for complexes only, and finally a further 50 ps with no restraints. Subsequently, NPT equilibration was carried out at 1 atm and 298 K for a duration of 50 ps with positional restraints on non-solvent heavy atoms, then followed by a 50 ps simulation without positional restraints. For each system, independent 100 ps NPT equilibration runs were performed for every replicate to generate diverse starting conformations. A total of 5 replicates per system were generated. All positional restraints employed a force constant of 10 kcal mol−1 Å−2. The V-rescale and C-rescale algorithms were used for temperature and pressure coupling, with coupling constants of 1 ps and 4 ps, respectively.63,64 All equilibration molecular dynamics simulations were performed with a 2 fs time step and an 10 Å cutoff for short-range interactions, with long-range electrostatics calculated using the PME algorithm.65,66 After equilibration, the trajectory of the first replicate was analysed with BioSimSpace to determine the optimal Boresch restraints. The BioSimSpace restraints generator algorithm evaluates fluctuations for selecting the stable anchor points (three atoms in receptor and three atoms in ligand) for restraining the configurational volume.67 This approach emulated the effect of native interactions by ensuring the ligand remaining localized within the binding site upon decoupling. To ensure setup consistency across replicates, the same restraints were applied across all replicate simulations. Different starting conformations for each leg of every repeat were obtained from the last frame of 100 ps equilibration.
Simulations were carried out with five replicates, each initiated with independently assigned starting velocities and starting coordinates from the equilibration stage. MBAR was used to estimate free energy changes.71,72 Free energies were reported as the mean of the five replicates. Statistical uncertainties were obtained from the 95% confidence interval, assuming Gaussian distribution and employing values to four degrees of freedom. Statistical significance was evaluated by using Student's t-test. The total sampling time allocated to each ligand was 260 ns.
The subsequent subsection presents a de novo investigation utilizing CDK2 as the target protein. This analysis examines the influence of various parameters on the docking and SA scores of the generated structures, as well as their novelty and the reproducibility of results across independent runs. From these investigations, we identified optimal settings that were subsequently employed in all further experiments. Additional experiments focused on: (i) the impact of different selection strategies on the diversity of the generated structures; (ii) methods that may skew generation towards structures enriched with sp3 carbon atoms. Ultimately, we compared the results of the CReM-dock pipeline with those of traditional virtual screening methods.
In the third subsection, we demonstrate the consistency in the quality of the generated structures across various proteins from different families, utilizing the previously established settings, and compare of these results with the outputs of state-of-the-art approaches. A separate section is devoted to the analysis of calculated absolute binding free energies for the top-scoring structures obtained using different approaches, in order to demonstrate the utility of the developed pipeline in comparison with state-of-the-art approaches. The final subsection presents a fragment expansion study aimed at retrospectively assessing the capability to generate structures that are identical or highly similar to those previously discovered, starting from a small fragment co-crystallized with a protein.
554
260 structures. Subsequently, we refined this subset by excluding molecules that matched to at least one structural alert from the BMS, Dundee, Glaxo, Inpharmatica, and PAINS filters, as implemented by Pat Walters.75 Our previous research has demonstrated that the exclusion of such molecules prior to the creation of the fragment database ensures the generation of molecules devoid of these patterns, provided that the size of the alert patterns does not exceed the designated context radius.34 This process reduced the dataset to 818
174 molecules. Furthermore, we identified subsets of molecules characterized by restricted synthetic accessibility (SA) values, as predicted by the methodology of Ertl and Schuffenhauer.25 We established thresholds of 2 and 2.5, which are below the average (3.0) and the median SA score (2.73) for all compounds in ChEMBL22. This resulted in subsets containing 67
970 and 338
422 molecules, respectively. These sets were then subjected to exhaustive fragmentation and subsequently converted into CReM fragment databases (Table 1).
| CReM DB | n (fragmented molecules) | n (distinct fragments) | Number of distinct fragment/context pairs for each radius | ||||
|---|---|---|---|---|---|---|---|
| Radius 1 | Radius 2 | Radius 3 | Radius 4 | Radius 5 | |||
| ChEMBL | 818 174 |
988 585 |
2 263 436 |
4 051 790 |
7 133 534 |
11 007 247 |
15 271 543 |
| ChEMBL SA2.5 (SA ≤ 2.5) | 338 422 |
272 988 |
671 140 |
1 263 268 |
2 319 377 |
3 752 375 |
5 419 544 |
| ChEMBL SA2 (SA ≤ 2) | 67 970 |
55 498 |
143 434 |
267 156 |
472 126 |
754 905 |
1 087 492 |
970 ChEMBL molecules that exhibited a SA score of less than 2. All attachment points were capped with hydrogen atoms, and the resulting molecules were subsequently converted into canonical SMILES format, with duplicates removed in the process. Following this, we applied a filtering process based on the following physicochemical properties of the resulting 200
000 fragments, which yielded a final set of 20
164 structures:• The number of heavy atoms was within the range 8–15.
• The number of distinct H-bond donors and acceptors was within the range 1–5. If an atom was labeled as an H-bond donor and an acceptor it was counted only once. This gives an estimate on the number of specific contacts.
• The number of rings was 1–3.
• The number of fused ring systems was 0–2.
• The number of rotatable bonds was 0–2.
• Lipophilicity was less than 2.
• Topological polar surface area (TPSA) was greater than 25 Å2
• The total number of halogen atoms (Cl, Br, and I) was 0–1.
• The maximum size of rings was 7.
For this subset, we enumerated all stereoisomers utilizing an RDKit script sourced from the repository,76 as well as tautomers using the cxcalc utility from Chemaxon.77 A check for duplicates was conducted, yielding a final count of 23
840 structures, which were designated as starting fragments for de novo generation.
The chemical environment for each non-equivalent hydrogen atom in the initial molecular fragment was identified utilizing CReM. We analyzed all possible combinations of four hydrogen atoms. However, by restricting the analysis to non-equivalent hydrogens, we underestimated the potential number of derivatives, as this approach excludes scenarios where substituents are attached to the same methyl group, for example. This simplification is a consequence of the constraints inherent in the current implementation of CReM.
For each combination of four hydrogens, the total number of potentially enumerated compounds was calculated as the product of the number of substituents available at each hydrogen position, adhering to the constraint of a maximum total number of heavy atoms in the generated structures. The overall number of molecules was determined by aggregating the number of enumerated structures across all combinations of four substituents. To improve computational efficiency, calculations were conducted on a subset of 1000 randomly selected molecules, and the final estimate was extrapolated by applying a scaling factor of 23.84 (with a total of 23
840 starting fragments). This estimate was predicated on the assumption that substituents would only be attached to the initial starting fragment.
However, it is important to note that substituents can also be attached to previously introduced substituents. To account for this, the molecular structure was considered as a tree, where the five fragments (the starting fragment and four substituents) represent nodes. The possible number of connection combinations between these nodes was estimated using Cayley's formula, nn−2, which indicates that five nodes can be interconnected in 125 distinct configurations. This methodology introduces an additional simplification, resulting in an overestimation of the number of derivatives, as not all linkage combinations are chemically feasible due to chemical context constraints. These approximations may offset one another, yielding a rough estimate of the order of magnitude for the size of the chemical space covered.
A radius of 3 was selected as it represents a sufficiently large default value to produce synthetically feasible structures. For the largest CReM fragment database, the estimated coverage of chemical space was 1017 (Table 2). This coverage diminished when utilizing more restricted fragment databases: 1016 structures for ChEMBL SA2.5 and 1015 for ChEMBL SA2. For the smallest fragment database (ChEMBL SA2), the chemical space size was also assessed for radii of 2 and 4, resulting in predictable variations. The coverage increased to 1016 structures for a radius of 2 and decreased to 1013 for a radius of 4 (Table 2). These results suggest that even under highly constrained conditions, the covered chemical space remains considerable.
| CReM DB | Radius | Estimated size of covered chemical space |
|---|---|---|
| ChEMBL | 3 | 2.8 × 1017 |
| ChEMBL SA2.5 | 3 | 4.2 × 1016 |
| ChEMBL SA2 | 3 | 1.8 × 1015 |
| ChEMBL SA2 | 2 | 8.4 × 1016 |
| ChEMBL SA2 | 4 | 2.7 × 1013 |
Furthermore, we assessed the diversity of the top 100 generated structures based on Murcko scaffolds. The average number of distinct Murcko scaffolds was relatively low (ranging from 16 to 24) for smaller context radii of 1 to 3, irrespective of the fragment database (Fig. S2b). This observation suggests that under these conditions, certain scaffolds may yield multiple successful derivatives that outperform others. Conversely, for larger context radii of 4 and 5, the average number of distinct scaffolds increased significantly, ranging from 35 to 57 out of a maximum of 100. The impact of larger radii was particularly pronounced when utilizing more restricted fragment databases. This phenomenon can be attributed to the limited number of expansions available for each structure during each iteration, resulting in a reduced number of successor molecules generated per scaffold, thereby decreasing the likelihood of any single scaffold being overrepresented and outperforming others. Despite the increased diversity of scaffolds observed for context radii of 4 and 5, these scaffolds were often reproducible across different fragment databases (Fig. S3).
Two near-optimal configurations were identified: the use of the SA2 fragment database with a radius of 2 and the SA2.5 fragment database with a radius of 3. Both settings yielded high docking scores, ranging from −12.5 to −12.1, together with favorable SA scores, ranging from 2.8 to 2.95. These configurations also demonstrated comparable diversity in terms of distinct structures and scaffolds (Fig. S2), as well as novelty of the generated compounds (Fig. 3). Given the minimal differences between these conditions, the former configuration – the SA2 fragment database with a radius of 2 – was arbitrarily selected for all subsequent experiments.
As anticipated, the application of the greedy selection strategy resulted in partially reproducible molecular structures across independent runs, characterized by low scaffold diversity (Fig. S4b). The clustering selection exhibited moderate scaffold diversity, while the Pareto selection demonstrated the highest scaffold diversity. Similarly to previous strategies, there was partial reproducibility of generated scaffolds within top 100 structures (Fig. S4b).
The percentage of generated structures that bind to the hinge region was highest for the Pareto strategy (19–25%), followed by the greedy approach (19–20%) and clustering (15–17%). The number of generated and docked structures varied significantly across the different strategies. The clustering strategy consistently produced the lowest number of structures (87
000–89 000). In contrast, the greedy strategy generated between 90
000 and 92
000 molecules, while the Pareto strategy yielded substantially more structures, ranging from 108
000 to 148
000 (Table S1).
While the Pareto strategy offers the highest diversity of scaffolds among the generated compounds, its runtime is less predictable compared to the other strategies due to the variable number of compounds selected in each iteration and the potential for a greater number of iterations. The greedy search, on the other hand, results in partially reproducible outputs with compounds exhibiting high docking scores, thus negating the necessity for multiple runs. However, the diversity of the generated structures remains relatively low. The clustering strategy presents a balanced approach, offering predictable runtime and reasonable diversity.
All simulations were conducted in a single replicate. The initial strategy involved augmentation of the docking score by the fraction of sp3 carbon atoms in Bemis–Murcko scaffolds (Csp3BM). This method resulted in only a marginal increase in the proportion of generated molecules that met the specified criteria (Fig. 4a). A more effective strategy involved using a custom sampling function that selected fragments from the CReM database in proportion to the squared fraction of sp3 carbon atoms. This increased the proportion of compounds with the Csp3BM fraction ≥ 0.3 on 15–20%.
The most significant improvement was achieved by pre-filtering the starting fragments to retain only those with Csp3BM values of 0.3 or higher. This filtering reduced the number of starting fragments from 23
840 (SA2 fragments) to 2851 (SA2 Csp3-rich fragments), while markedly increasing the proportion of generated molecules with Csp3BM ≥ 0.3, rising from ∼15% to ∼57% (Fig. 4a). Additionally, this filtering led to increased synthetic accessibility scores for the top-scoring compounds that satisfied PLIP and had Csp3BM ≥ 0.3, while maintaining docking scores comparable to or better than those obtained using the complete fragment set (Fig. 4b).
To further explore this phenomenon, an alternative set of starting fragments was prepared following the same protocol described in the Methods section, utilizing the CReM SA2.5 database. This approach yielded 27
802 starting fragments enriched in sp3 carbon atoms (SA2.5 Csp3-rich fragments). Utilizing this extended initial set, a similarly high proportion of compounds meeting the desired criteria (55–77%) was achieved. The top-scoring molecules from this dataset exhibited slightly improved docking scores and increased SA scores. Consequently, the proportion of structures containing a substantial fraction of sp3 carbon atoms appears to be largely independent of the initial fragment set size, but is instead influenced by its composition. Examples of these top-ranked molecules are presented in Table S2.
In summary, the commonly employed strategy of augmenting docking scores with additional parameters to guide molecular generation towards a specific region of chemical space proved to be ineffective in this context, likely due to the necessity for fine-tuning the augmented objective function. A more effective approach involved direct control over the composition of starting fragments and the fragments utilized during the molecular growth process.
000) was somewhat greater than the average number of compounds docked during the de novo generation process utilizing the ChEMBL SA2 fragment database with a radius of 2. This selection was undertaken to evaluate the efficacy of virtual screening relative to our de novo design pipeline, employing comparable computational resources for both methodologies. For comparison, we additionally conducted de novo generation using the previously established settings, but without PLIP guidance. This approach aimed to provide a clearer understanding of the relationship between virtual screening and de novo generation.Notably, only 617 ZINC compounds (0.51%) were able to establish the required interactions with the hinge region, a proportion lower than that observed among de novo generated compounds, for which 16.8% and 5.4% of structures were predicted to bind the hinge region in the PLIP-guided and non-guided runs, respectively. Furthermore, the docking scores of the highest-ranked compounds from the ZINC database were inferior to those of the de novo generated molecules, regardless of the compounds' ability to bind to the hinge region (Fig. 5). This indicates that de novo generation techniques can yield superior docking scores compared to traditional virtual screening methods when comparable computational resources are employed. Additionally, de novo generation, regardless of biasing towards favorable interactions, demonstrated a greater success rate in identifying structures that conform to a specified protein–ligand interaction pattern compared to the conventional screening approach.
To adjust REINVENT and ChemTS to the settings used by CReM-dock we applied several modifications to their pipelines. We fine-tuned the REINVENT model pre-trained on ChEMBL data to generate structures satisfying physicochemical criteria (MW ≤ 450, log
P ≤ 4, RTB ≤ 5, TPSA ≤ 120) and containing no structural alerts from the BMS, Dundee, Glaxo, Inpharmatica, and PAINS filter lists. In the second stage, structures were generated to satisfy the criteria above and were additionally biased by docking scores from AutoDock Vina38 and SA scores (the upper bound of a sigmoid transformation function was set to 3.5). We activated the Murcko scaffold diversity filter recommended by the authors of REINVENT, which is intended to explicitly manage diversity of generated structures and maintain it at a high level. To adjust ChemTS, we applied strict filters by physicochemical properties (MW ≤ 450, log
P ≤ 4, RTB ≤ 5, TPSA ≤ 120), structural alerts, and SA score (≤3.5), and generation was guided by AutoDock Vina docking score. The default docking protocols of REINVENT and ChemTS differed from that of CReM-dock. REINVENT required proprietary protonation modules, while ChemTS pipeline did not include a protonation step. Therefore, we incorporated protonation with the Uni-pKa model into REINVENT and updated Vina to version 1.2.7 used in CReM-dock. In ChemTS, we integrated the EasyDock pipeline including protonation with Uni-pKa and docking with Vina 1.2.7. Exhaustiveness was set to 8 for all programs.
The targets selected for this study belonged to different protein families: CDK2 (PDB 2BTR, kinase), BACE1 (PDB 6UWP, protease), DRD2 (PDB 6CM4, GPCR), ESR1 (PDB 8DV7, nuclear receptor), HDAC2 (PDB 7ZZT, epigenetic regulator), and PARP1 (PDB 7ONT, transferase) (Table S3). For CReM-dock, we used the same settings as in the previous experiments (SA2 fragment database and context radius of 2). Since both REINVENT and ChemTS do not incorporate biasing of structure generation by protein–ligand interactions, we ran CReM-dock without PLIP consideration. However, for the analysis we included results obtained previously, where PLIP was considered. For REINVENT and ChemTS, we ran a sufficient number of iterations to obtain approximately the same number of docking events as for CReM-dock.
The proportion of generated structures satisfying physicochemical criteria was high for both programs, REINVENT (∼95%) and ChemTS (∼85%). The number of structures for which important protein–ligand interactions were observed was low (Table 3). In most cases, fewer than 1% of structures exhibited such interactions. Only for HDAC2, where only a single interaction was identified as essential, the proportion of such structures was high. Conversely, CReM-dock runs, which were not guided by PLIP, demonstrated a much higher fraction of structures exhibiting important protein–ligand interactions, up to 9.5%, apart from HDAC2 target, for which 45% of ligands exhibited such interactions. We can only hypothesize that such a large number of structures with the expected PLIP were generated by CReM-dock owing to the iterative growing approach. While structures are smaller in earlier iterations, they have greater chances to establishing particular contacts within the binding site. This is supported by the observation that in early iterations, the fraction of generated structures satisfying given PLIPs is greater (Table S4). Therefore, even without direct guidance and explicit selection of structures having desired interactions, CReM-dock generated a higher number of structures satisfying the desired PLIPs than approaches generating complete structures in a single step, such as REINVENT or ChemTS. CReM-dock guided explicitly by PLIP, as expected, resulted in the highest number of compounds satisfying the desired interaction pattern among all tested approaches (Table 3).
| Protein | Program | n | n (physchem) | % | n (PLIP ≥ 0.6) | % | n (physchem & PLIP ≥ 0.6) | % |
|---|---|---|---|---|---|---|---|---|
| BACE1 | CReM-dock (with PLIP) | 80 077 |
80 077 |
100 | 7155 | 8.94 | 7155 | 8.94 |
| BACE1 | CReM-dock (w/o PLIP) | 94 134 |
94 134 |
100 | 418 | 0.44 | 418 | 0.44 |
| BACE1 | REINVENT | 95 022 |
89 612 |
94.31 | 17 | 0.02 | 16 | 0.02 |
| BACE1 | CHEMTS | 95 012 |
80 925 |
85.17 | 115 | 0.12 | 82 | 0.09 |
| CDK2 | CReM-dock (with PLIP) | 89 530 |
89 530 |
100 | 15 002 |
16.76 | 15 002 |
16.76 |
| CDK2 | CReM-dock (w/o PLIP) | 91 866 |
91 866 |
100 | 4911 | 5.35 | 4911 | 5.35 |
| CDK2 | REINVENT | 94 726 |
89 387 |
94.36 | 315 | 0.33 | 272 | 0.29 |
| CDK2 | CHEMTS | 91 779 |
85 259 |
92.9 | 967 | 1.05 | 928 | 1.01 |
| DRD2 | CReM-dock (with PLIP) | 99 433 |
99 433 |
100 | 24 743 |
24.88 | 24 743 |
24.88 |
| DRD2 | CReM-dock (w/o PLIP) | 97 342 |
97 342 |
100 | 4616 | 4.74 | 4616 | 4.74 |
| DRD2 | REINVENT | 98 429 |
92 939 |
94.42 | 0 | 0 | 0 | 0 |
| DRD2 | CHEMTS | 93 617 |
80 472 |
85.96 | 0 | 0 | 0 | 0 |
| ESR1 | CReM-dock (with PLIP) | 96 858 |
96 858 |
100 | 1824 | 1.88 | 1824 | 1.88 |
| ESR1 | CReM-dock (w/o PLIP) | 96 388 |
96 388 |
100 | 186 | 0.19 | 186 | 0.19 |
| ESR1 | REINVENT | 97 011 |
93 310 |
96.18 | 12 | 0.01 | 10 | 0.01 |
| ESR1 | CHEMTS | 94 607 |
81 962 |
86.63 | 110 | 0.12 | 104 | 0.11 |
| HDAC2 | CReM-dock (with PLIP) | 95 109 |
95 109 |
100 | 45 853 |
48.21 | 45 853 |
48.21 |
| HDAC2 | CReM-dock (w/o PLIP) | 95 458 |
95 458 |
100 | 43 478 |
45.55 | 43 478 |
45.55 |
| HDAC2 | REINVENT | 94 848 |
89 791 |
94.67 | 57 910 |
61.06 | 54 396 |
57.35 |
| HDAC2 | CHEMTS | 92 462 |
77 842 |
84.19 | 25 471 |
27.55 | 20 472 |
22.14 |
| PARP1 | CReM-dock (with PLIP) | 90 488 |
90 488 |
100 | 18 842 |
20.82 | 18 842 |
20.82 |
| PARP1 | CReM-dock (w/o PLIP) | 91 429 |
91 429 |
100 | 8652 | 9.46 | 8652 | 9.46 |
| PARP1 | REINVENT | 92 003 |
86 199 |
93.69 | 408 | 0.44 | 389 | 0.42 |
| PARP1 | CHEMTS | 94 681 |
82 262 |
86.88 | 848 | 0.9 | 768 | 0.81 |
For the analysis, we selected the top 100 structures by the highest docking score for each program. Additionally, we selected top 100 structures from corresponding subsets of compounds that satisfied physicochemical criteria only and both physicochemical and PLIP simultaneously (Fig. 6). Docking scores for ligands generated by CReM-dock were comparable to or better than those of REINVENT. Docking scores for ligands generated by ChemTS were systematically worse relative to both CReM-dock and REINVENT. When the PLIP criterion was applied for compound selection, docking scores of the top 100 compounds dropped for all programs, except CReM-dock generations explicitly guided by PLIP. In many cases, the generated structures outperformed known actives from ChEMBL in terms of docking scores.
SA scores of generated structures for all programs were at the acceptable level for different scenarios of compound selection, mainly below 3 (Fig. 6). Explicit biasing in REINVENT and ChemTS as well as implicit biasing in CReM-dock worked effectively. The only exception was DRD2 ligands generated by CReM-dock with the PLIP criterion, where the median SA score of the top 100 compounds was 3.6. However, these compounds had high docking scores, while REINVENT and ChemTS were unable to generate ligands capable of interacting with the aspartic acid residue, which is considered a canonical interaction for the dopamine D2 receptor and many other GPCRs.82,83
Structures generated by REINVENT and ChemTS exhibited greater similarity to compounds from ChEMBL, resulting in lower novelty compared to those generated by CReM-dock (Fig. S5). Although CReM-dock structures were composed of fragments from ChEMBL molecules, they demonstrated reduced overlap with the reference ChEMBL chemical space (Fig. S6). Interestingly, the top 100 compounds selected from REINVENT runs frequently clustered together on UMAPs (Fig. S6). Pairwise similarity calculated for the top 100 compounds showed that structures generated by ChemTS had the highest diversity, while REINVENT, explicitly biased by scaffold diversity, was comparable to or worse than CReM-dock (Fig. 7). The observed similarity of 1 in Fig. 7 is due to the stereoisomers. Examples of top scored structures are given in Table S5.
![]() | ||
| Fig. 7 Pairwise Tanimoto similarity computed using 2048-bit Morgan fingerprints with radius 3 for top 100 compounds selected by docking score. | ||
To achieve a reasonable trade-off between throughput and accuracy the MD sampling time allocated to each window of the ABFE protocol was 1 ns. ABFE calculations executed with such short sampling time are known to tend to systematically overestimate experimental binding free energies.58,70,84 To account for the bias introduced with this ABFE protocol, two reference sets of compounds were included to facilitate discrimination between active and inactive molecules. As inactive compounds, we selected 20 ChEMBL molecules with pIC50 < 5. These compounds were docked using Vina, and the top-ranked poses were used for ΔG calculations. None of these compounds formed interactions with the hinge region. As active compounds, we selected 12 ligands from PDB complexes (1FVV, 1OIR, 2B52, 2B55, 2UZL, 2VV9, 2XMY, 3EZV, 4BCP, 4BCQ, 5K4J, and 5NEV) with pIC50 ≥ 6. For these ligands, the native poses observed in the corresponding X-ray structures were used. In all cases except 2UZL, the ligands formed contacts with the hinge region in their crystallographic binding modes.
The calculated binding free energies clearly discriminated the active PDB ligands from the inactive ChEMBL compounds (Fig. 8). Compounds generated by CReM-dock without guidance and without selection for hinge-region interactions were not statistically distinguishable from the inactive reference set, whereas structures generated with PLIP guidance and selected for hinge-region interactions differed significantly from the true inactives. Csp3-enriched structures designed under PLIP guidance showed ΔG values comparable to those of structures generated without explicit Csp3 enrichment (Fig. 8). These findings suggest that, for CReM-dock, guidance by a known interaction pattern may be important for obtaining favorable binding free energies, whereas enrichment in Csp3 atoms does not substantially affect this metric.
Structures generated by ChemTS were likewise not statistically distinguishable from the inactive set, although one compound exhibited a particularly favorable calculated free energy and may therefore represent a promising candidate. Structures generated by REINVENT differed significantly from the ChEMBL inactive compounds and displayed the narrowest distribution of calculated ΔG values among all methods. Notably, REINVENT-generated structures generally did not interact with the hinge region, with only five compounds forming a single hydrogen bond out of the three possible hinge-region interactions. Thus, unlike CReM-dock, REINVENT did not appear to require explicit PLIP guidance to achieve favorable ΔG values. Overall, the results obtained for REINVENT and for PLIP-guided CReM-dock were statistically indistinguishable. Nevertheless, CReM-dock produced several compounds with particularly promising calculated binding free energies. Representative top-ranked structures are shown in Fig. 9.
![]() | ||
| Fig. 9 Structures designed by CReM-dock, REINVENT and ChemTS having the lowest calculated binding free energy to CDK2. | ||
In conducting the simulations, we modified certain parameters to avoid excessive searching and to conserve computational resources. Given that molecular growth results in an increase in molecular weight, it was deemed impractical to continue the generation process after achieving the desired molecular mass of the larger ligand. The threshold values for physicochemical parameters were established at the corresponding values of the larger ligand plus 15% (for molecular weight, log
P, and topological polar surface area), while the rotatable bond count (RTB) was set to the corresponding value of the larger ligand plus 1. To further constrain the generation process, we established an RMSD threshold of 1 Å and a minimum PLIP similarity of 0.6 for the 2ZWZ and 3S1G pairs, and 0.5 for the 2HB1 pair (Table S6). These parameters were intended to direct the generation of compounds that would maintain the binding pose of the initial fragment and its interactions with the corresponding protein. For the generation process, we utilized the ChEMBL SA2 fragment library with a radius of 2. Our selection strategy involved clustering into 100 groups and selecting the top-scoring structure from each cluster, thereby enhancing the comprehensiveness of the search and expanding the chemical space explored. This approach did not significantly increase computational costs, as we commenced with relatively large fragments and anticipated that the target molecules would be generated with a limited number of steps. Overall, between 8200 and 24
500 molecules were generated for each of the three targets.
From the pool of generated structures, we identified those most similar to the target compounds using Tanimoto similarity based on 2048-bit Morgan fingerprints with a radius of 2. In all instances, the most similar generated ligands exhibited the same binding mode and preserved key protein–ligand interactions (Table 4, Fig. 10). For the 2HB1–2QBS pair, the pipeline successfully identified a structure identical to the larger ligand, 2QBS. Additionally, another structure with a similarity score of 1 was found, which contained a cycloheptyl residue in place of cyclohexyl; both compounds demonstrated highly similar binding modes relative to the target ligand. In the case of the 3S1G–3GC4 pair, the most similar generated ligand lacked a positively charged secondary amine center and, consequently, could not establish contact with Asp280 as the target ligand did. However, the generated compound included an amino group that enhanced binding to Asp156 and featured a hydrophobic moiety that filled the pocket in a manner analogous to the target compound. For the 2ZWZ–2ZX9 pair, the most similar generated ligand effectively occupied the hydrophobic pocket in a manner closely resembling the target compound, although it contained an ethyl group instead of a cyclopentyl residue. The most similar structure was represented by two enantiomers; both could adapt to the binding site and maintain the necessary contacts.
![]() | ||
| Fig. 10 Binding poses of smaller (green) and larger (magenta) ligands as well as generated compounds (yellow/blue) the most similar to the target (larger) ligand. | ||
In all cases, the generated structures were able to approach the chemical space represented by the target molecules and fit within the binding site while preserving the pose and interactions (Fig. 10). This demonstrates the capability of the CReM-dock approach to explore not only novel chemical spaces but also relevant spaces that have been previously validated through experimental means.
A critical consideration, however, is the potential generation of numerous synthetically unfeasible structures, as it is conceivable that an approach may predominantly explore regions of chemical space that are not synthetically accessible. Nonetheless, this does not inherently imply that an approach will fail to identify promising hits; rather, it suggests that such an approach may be computationally inefficient, necessitating additional time for output analysis. Previous research demonstrated that AiZynthFinder can generate retrosynthetic pathways for structures with SA scores below 2.5–3 in 50% of cases or more.35 Consequently, we hypothesize that structures generated with SA scores below 3 or 2.5 are likely to be synthetically accessible. The CReM-dock tool primarily produces structures with low SA scores, as the complexity of the structures is incrementally increased across iterations (Fig. S7).
We conducted an analysis of the Retrosynthetic Accessibility (RA) scores86 for the top 100 generated structures and observed that the RA scores were predominantly high across nearly all protein targets, with a median exceeding 0.95. This suggests that at least half of these structures may have feasible retrosynthetic pathways (Fig. 11). It is important to acknowledge that the RA score has inherent limitations; for instance, previously synthesized compounds, such as inhibitors of BACE1, did not achieve high RA scores. To enhance the likelihood that the majority of the top-scoring generated structures attain high RA scores, we recommend utilizing a radius of 4 or greater, along with a fragment database derived from more synthetically accessible compounds (Fig. S8). However, this approach may result in a reduction of the docking scores for those structures (Fig. 2). Consequently, we do not anticipate significant challenges for CReM-dock concerning the synthetic accessibility of the designed structures, as it allows for flexible control over this aspect. Nonetheless, experimental validation remains crucial to confirm the practical applicability of the proposed pipeline.
Another aspect to consider in addressing the primary question is the lack of evidence that a single generative tool can efficiently explore the entirety of the synthetically accessible chemical space and outperform other tools. It is more plausible that each tool covers distinct regions of chemical space, which may overlap but are not entirely encompassed by other tools. Thus, employing multiple tools in conjunction may yield complementary insights and facilitate the selection of the most promising structures for further investigation.
While there is currently no experimental validation of the specific pipeline described herein, nor evidence that the designed structures are synthetically accessible, the developed tool has been integrated into a more comprehensive design and virtual screening pipeline, demonstrating its applicability. In the CACHE challenge #1, where our group achieved third place,48 participants were tasked with identifying binders for the WDR40 domain of the LRRK2 kinase, implicated in the pathogenesis of Parkinson's disease. CReM-dock was utilized as one of the tools to generate promising hits, which were subsequently employed as queries for similarity searches in ultra-large chemical databases. The suggested hits were confirmed through experimental validation. This underscores the utility of the developed tool.
Practical recommendations:
• It is advisable to utilize larger context radii and/or fragments derived from more synthetically accessible molecules to facilitate the generation of structures that are themselves synthetically accessible.
• In scenarios involving larger radii and smaller fragment databases, repeated runs are unnecessary, as the results exhibit high reproducibility.
• To achieve greater diversity of solutions, Pareto selection is recommended; conversely, for more stable runtime, selection based on clustering should be employed.
• Whenever feasible, incorporating multiple protein conformations can enhance the diversity of the generated structures.
• It is beneficial to impose restrictions on protein–ligand interactions, if possible.
• The augmentation of an objective function has been shown to be effective in enhancing drug-likeness.
• When generating structures enriched with sp3 carbon atoms, it is advisable to utilize fragments that are themselves enriched in sp3 carbon atoms, as the augmentation of the objective function tends to perform poorly in this context.
• The number of iterations may be constrained to five or six, as subsequent iterations often lead to increased molecular complexity while resulting in diminished docking scores (Fig. S7).
Upon analyzing the outputs of CReM-dock, it was observed that many of the top-scoring structures contained numerous methyl groups or halogen atoms. These substituents are typically introduced during the later iterations and are unlikely to be critically important for ligand-protein recognition and binding. However, their incorporation can increase the complexity of the structures and reduce their synthetic feasibility. To mitigate this issue, it is recommended to specify a minimum size for the attached fragment that exceeds the default value of one.
An alternative approach involves the application of a fragment expansion strategy. Specifically, an anchor fragment that demonstrates binding affinity to a protein can be extracted from a co-crystallized ligand. This fragment can then be expanded using CReM-dock while maintaining its binding pose and interactions. This method presents potential advantages over de novo design, as the binding pose of at least one fragment is already established, thereby increasing the probability of generating active molecules.
An important advantage of the approach is its ability to preserve predefined protein–ligand interactions during generation, particularly when such interactions are known to be essential. This substantially increases the number of structures exhibiting the desired interaction patterns. An additional level of control can be introduced by imposing an RMSD threshold relative to the pose of the initial fragment, thereby helping to preserve the binding conformation of the generated molecules. In three retrospective case studies, this strategy enabled the generation of compounds that were either identical or highly similar to known target molecules previously developed by medicinal chemists during fragment optimization.
The developed tool was found to be competitive with the state-of-the-art REINVENT4 and ChemTS approaches. Molecules generated by CReM-dock frequently displayed greater novelty while maintaining comparable docking scores and synthetic accessibility. Furthermore, in a separate study, structures generated by CReM-dock under guidance from protein–ligand interaction patterns exhibited favourable calculated binding free energies for CDK2 and achieved ΔG values comparable to those of known inhibitors, whereas structures generated without consideration of PLIP-derived interactions showed poorer calculated binding energies. Overall, the method provides substantial flexibility and can be adapted to a range of tasks, including de novo molecular generation, fragment expansion, and scaffold decoration.
| Csp3 | The fraction of sp3 carbon atoms in a molecule |
| Csp3BM | The fraction of sp3 carbon atoms in a Bemis–Murcko scaffold of molecule |
log P | Lipophilicity |
| MW | Molecular weight |
| PLIP | Protein–ligand interaction pattern (a set of protein–ligand contacts which should be at least partially observed in generated structures docked to a corresponding protein) |
| QED | Quantitative estimate of drug-likeness |
| RTB | The number of rotatable bonds |
| SA | Synthetic accessibility score |
| TPSA | Topological surface area |
Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d6dd00131a.
| This journal is © The Royal Society of Chemistry 2026 |