Open Access Article
Mohammed Azzouzi *a, Thanapat Worakul a and Clémence Corminboeuf ab
aLaboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland. E-mail: mohammed.azzouzi@epfl.ch
bNational Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
First published on 30th March 2026
The rapid progress in generative models for molecular design has led to extensive libraries of candidate molecules for biological and chemical applications. However, ensuring these molecules are diverse and representative of broader chemical space remains challenging, with researchers often over-exploring limited regions or missing promising candidates due to inadequate monitoring tools. This work presents NaviDiv (Navigating Diversity in Chemical Space), a comprehensive web-based framework for managing chemical diversity in string-based generative molecular design through three integrated capabilities: multi-metric diversity analysis capturing structural, syntactic, and molecular framework variations; interactive real-time visualization enabling immediate detection of model collapse; and adaptive constraint generation that dynamically guides optimization while preserving diversity. Through a singlet fission material discovery case study using REINVENT4, we demonstrate that different diversity metrics (i.e. structural similarity, fragment composition, and sequence patterns) respond differently during optimization, with constraint effectiveness depending critically on representational alignment with the generative model. n-Gram-based constraints outperform fingerprint-based approaches due to direct correspondence with SMILES generation, and combined constraints maintain diversity across all metrics while achieving optimization performance within 15% of unconstrained baselines. The framework is freely available at https://github.com/LCMD-epfl/NaviDiv, providing accessible tools for data-driven decisions about diversity–property trade-offs in automated molecular discovery.
Generative molecular models are often employed for proposing molecules with targeted properties. To achieve this, a guidance strategy must be implemented to steer the generation, effectively shifting the distribution of generated molecules toward higher-performing candidates. The guidance approach differs from one model architecture to another; for example, gradient-based methods have been used on the latent space to optimise the property of interest in variational autoencoders (VAEs) and diffusion models.19–21 In the case of autoregressive models22–25 such as REINVENT,26,27 this is accomplished by updating model weights and biases through a policy-driven approach28,29 that optimizes the generation of molecules with properties aligned with specific criteria. This optimization is carried out via reinforcement learning, where actions involve adjusting model parameters, and the objective typically focuses on increasing the mean performance of the generated molecules. However, as the model is optimized, it inevitably focuses the generation on specific types of molecules in a confined region of chemical space, leading to a reduction in molecular diversity. To mitigate excessive deviation from the initially trained model, a regularization term can be introduced within the reinforcement learning framework.29 Nevertheless, this regularization alone is often insufficient to prevent model collapse (loss of chemical diversity over successive training generations).30
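As an illustration of what such a regularized objective can look like, the sketch below follows the augmented-likelihood form introduced with the original REINVENT method, in which the agent is pulled toward the prior log-likelihood shifted by a scaled score. The function name, the value of `sigma`, and the toy numbers are assumptions for illustration only; the exact loss used in ref. 29 or in REINVENT4 may differ.

```python
def augmented_likelihood_loss(prior_loglik, agent_loglik, score, sigma=128.0):
    """Squared gap between the agent log-likelihood and a score-shifted prior.

    prior_loglik: log-likelihood of the sequence under the frozen prior
    agent_loglik: log-likelihood of the same sequence under the agent being trained
    score:        evaluation-function value, assumed to lie in [0, 1]
    sigma:        assumed scaling factor weighting the score against the prior
    """
    augmented = prior_loglik + sigma * score  # target log-likelihood for the agent
    return (augmented - agent_loglik) ** 2


# Toy numbers (not from the study): the augmented target is -40 + 20 * 0.5 = -30,
# so an agent log-likelihood of -35 is still 5 units away from it.
loss = augmented_likelihood_loss(prior_loglik=-40.0, agent_loglik=-35.0, score=0.5, sigma=20.0)
print(loss)
```

Because the target is anchored to the prior, the agent is discouraged from drifting arbitrarily far from the initially trained model, but nothing in this term rewards diversity within a batch, which is consistent with the observation that regularization alone does not prevent collapse.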
Existing approaches to diversity preservation operate at different levels of the generative pipeline. Policy-based methods modify reinforcement learning strategies or employ multi-agent frameworks to enhance exploration.26,27,31 Others introduce diversity penalties at the evaluation function level, discouraging structurally similar molecules or recurring fragments.32,33 However, these approaches typically employ fixed parameters and single diversity metrics, lacking systematic monitoring of how constraints impact chemical space exploration during optimization. Post hoc diversity assessment methods31,34–36 enable comparison across generative models but cannot guide real-time intervention. This creates a critical gap: researchers can measure final diversity outcomes but cannot observe or respond to diversity loss as it occurs.
While several tools address aspects of this challenge, each has fundamental limitations. REINVENT and similar platforms26,27 include built-in diversity filters that penalize structural similarity or recurring scaffolds, but these typically rely on single diversity metrics (e.g., Tanimoto similarity thresholds) with fixed parameters throughout optimization, making them insensitive to the multi-faceted nature of diversity collapse and unable to adapt as chemical space exploration evolves. Post hoc benchmarking frameworks such as GuacaMol32 and MOSES37 provide comprehensive diversity assessment across multiple metrics, enabling systematic comparison of generative models, but operate exclusively after generation is complete and therefore cannot inform real-time intervention or adaptive constraint adjustment. Recent advances in guided generation, including reinforcement learning strategies with shaped rewards38 and multi-objective optimization approaches, have improved exploration–exploitation balance, yet these methods lack systematic monitoring capabilities to detect when and how diversity loss occurs during optimization. What remains absent is an integrated framework that combines multi-metric diversity analysis with real-time visualization and adaptive constraint generation, enabling researchers to observe diversity collapse patterns as they emerge and dynamically adjust guidance strategies throughout the molecular discovery process.
The practical consequences of inadequate diversity management became evident in our previous work on singlet fission material discovery with a reinforcement-learning-driven generative design workflow.39 Despite successful initial optimization using REINVENT4,27 the model converged to structurally similar molecules with recurring fragments after 140–150 iterations. Addressing this required 10 manual intervention cycles: analyzing generated molecules, identifying overrepresented fragments, implementing penalties, and restarting optimization. This iterative process, while ultimately successful, highlighted the need for systematic tools that enable real-time diversity monitoring and adaptive constraint implementation throughout the generative process.
This work presents NaviDiv (Navigating Diversity in Chemical Space), a comprehensive framework designed around three core capabilities for chemical diversity management in generative molecular design: comprehensive diversity analysis, interactive visualisation, and adaptive guidance. The framework is built as an interactive web application with a Python backend, designed to monitor and steer the diversity of molecules generated by string-based recurrent neural network (RNN) models. While our implementation is currently tailored for REINVENT4, the approach is generalisable to any string-based model optimized via reinforcement learning for property-directed generation. We demonstrate how these three capabilities work synergistically to enable informed decision-making about diversity–property trade-offs in automated molecular discovery campaigns.
NaviDiv extends this workflow by introducing systematic diversity management capabilities that operate alongside property optimization. Importantly, NaviDiv is designed to be accessible to both computational and experimental chemists, with an intuitive web-based interface that requires no specialized programming knowledge. The framework is built around three core capabilities:
Comprehensive diversity analysis provides multiple complementary metrics to assess chemical diversity from different perspectives, capturing structural, syntactic, and architectural aspects of molecular variation.
Interactive visualisation and monitoring enables real-time observation of diversity evolution through chemical space projections, temporal analysis, and fragment frequency tracking.
Adaptive guidance and constraint generation actively steers the generative process through dynamic penalty functions that maintain desired diversity levels while optimising for target properties.
The framework imposes minimal computational overhead, with performance metrics for each algorithm detailed in the SI. Overall, NaviDiv adds less than 5 seconds per iteration to the optimization workflow, enabling practical integration into routine molecular discovery campaigns without significant performance degradation.
Chemical diversity in this work is defined in broad and context-dependent terms, acknowledging that its meaning varies significantly across different fields.40,41 In organic electronics, chemical diversity refers to a wide array of π-conjugated building blocks, which differ in the arrangement and nature of donor-rich and acceptor-rich moieties. Beyond molecular composition, diversity also encompasses molecular symmetry and the potential for ordered spatial arrangement in the solid state, both of which critically influence charge transport, crystallinity, and device performance.42 In catalysis, the concept is more constrained: the core catalytic unit often remains unchanged, while diversity is introduced through systematic modifications of the surrounding ligands to optimize activity, selectivity, and stability.43,44 In drug discovery, chemical diversity encompasses both structural and functional variation among small molecules, including differences in scaffolds, stereochemistry, and physicochemical properties. This diversity is essential for exploring chemical space and increasing the likelihood of identifying bioactive compounds with novel mechanisms of action.45,46
This application-specific nature of chemical diversity motivates our multi-metric approach, where different representations capture complementary aspects of molecular variation. Multiple approaches exist for assessing the chemical diversity of a compound set, each focusing on different molecular features.47,48 These methods can be broadly categorized based on the type of representation or structural abstraction they employ. One common strategy is the representation distance-based approach, which uses specific molecular representations—such as structural fingerprints—combined with distance metrics to quantify similarity or dissimilarity between compounds based on their overall structure.49 For string-based generative models, we distinguish string-based representations such as simplified molecular input line entry system (SMILES),50 where diversity can be evaluated through semantic or syntactic analysis of the molecular encodings, capturing differences in sequence patterns rather than just structural features. Another approach is the scaffold-based method, where molecules are reduced to their core frameworks by algorithmically removing additional functional groups and side chains. These scaffolds are then compared to evaluate diversity at the level of molecular backbones.51 A fourth method is the fragment-based approach, in which molecules are systematically broken down into smaller substructures. The diversity is then assessed based on the presence and frequency of these fragments across the dataset.52
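To make the representation distance-based approach concrete, the following minimal sketch computes the Tanimoto similarity between fingerprints and averages it over all pairs in a set. Fingerprints are represented here as plain Python sets of on-bit indices, which is an assumption made to keep the example dependency-free; in practice they would come from a cheminformatics toolkit. The function names and the toy bit sets are hypothetical.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(bits_a & bits_b) / union


def mean_pairwise_similarity(fingerprints):
    """Average Tanimoto similarity over all unordered pairs in a set of molecules."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical on-bit sets standing in for hashed structural fingerprints.
toy_fps = [{1, 2, 3, 4}, {3, 4, 5, 6}, {7, 8}]
mean_sim = mean_pairwise_similarity(toy_fps)
print(round(mean_sim, 3))
```

A low mean pairwise similarity indicates a structurally varied set, while a value that rises over successive batches signals convergence toward similar structures.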
Building on these established approaches, our tool implements a comprehensive set of metrics specifically designed to assess chemical diversity across multiple representational spaces. These metrics reflect the different approaches discussed above and provide complementary perspectives on molecular variation. The specific methods and their implementation details are described in the SI. The tool also allows users to establish custom diversity metrics based on their specific needs. The implemented metrics include:
The multi-metric approach ensures comprehensive assessment, as different representations exhibit varying sensitivity to optimization pressure and model collapse phenomena, as demonstrated by the differential degradation patterns observed in the singlet fission campaign.
All visualisations are interactive, allowing users to explore specific molecular clusters, investigate outliers, and understand the relationship between diversity patterns and property optimisation objectives. The web-based interface ensures accessibility for both computational and experimental chemists.
The guidance system operates in closed-loop with the generative model, continuously updating constraint functions based on real-time analysis results. This ensures responsive adaptation to changing diversity patterns throughout the molecular discovery process, addressing the limitation of the manual iterative approach used in the original singlet fission study, where fragment identification and penalty implementation required 10 separate intervention cycles.
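A closed loop of this kind can be sketched as two steps repeated at each iteration: identify patterns that have become overrepresented in the latest batch, then down-weight the score of molecules that contain them. The example below is only an illustration under stated assumptions; the function names, the character 3-gram extractor, the 50% frequency limit, and the multiplicative penalty are hypothetical and are not taken from the NaviDiv implementation, whose constraint algorithms are described in the SI.

```python
from collections import Counter


def frequent_patterns(batch, extract, max_fraction):
    """Return patterns present in more than `max_fraction` of the molecules in a batch."""
    counts = Counter()
    for mol in batch:
        counts.update(set(extract(mol)))  # count each pattern once per molecule
    limit = max_fraction * len(batch)
    return {pattern for pattern, count in counts.items() if count > limit}


def penalized_score(raw_score, mol, blocked, extract, penalty=0.0):
    """Scale the raw score by `penalty` when the molecule contains a blocked pattern."""
    if blocked & set(extract(mol)):
        return raw_score * penalty
    return raw_score


def char_ngrams(smiles, n=3):
    """Character-level n-grams of a SMILES string (illustrative pattern extractor)."""
    return [smiles[i:i + n] for i in range(len(smiles) - n + 1)]


# One pass of the loop on a toy batch: update the blocklist, then rescore.
blocked = set()
batch = ["c1ccccc1O", "c1ccccc1N", "c1ccccc1C", "CCOCC"]
blocked |= frequent_patterns(batch, char_ngrams, max_fraction=0.5)
scores = [penalized_score(1.0, s, blocked, char_ngrams) for s in batch]
print(sorted(blocked), scores)
```

Keeping the pattern extractor as a parameter means the same loop could be driven by fragments or scaffolds instead of n-grams, and the blocklist accumulates across iterations so that earlier overrepresented patterns stay penalized.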
Future developments will focus on extending the framework to additional generative architectures to broaden its applicability beyond string-based RNNs. Incorporating 3D structure-aware diversity metrics that account for conformational flexibility and stereochemistry would further enhance relevance to materials and molecular design applications. Property-aware diversity assessment could integrate functional similarity alongside structural diversity, while multi-objective constraint optimization based on Pareto frontier analysis would enable systematic exploration of diversity–property trade-offs.
This case study showcases all three NaviDiv capabilities: (1) comprehensive diversity analysis across multiple metrics, (2) real-time monitoring of diversity evolution during optimization, and (3) adaptive constraint implementation to preserve diversity while maintaining optimization performance.
Following the same generative design workflow as in our previous work,39 we employed REINVENT3.2 to train the generative model on an extended dataset combining FORMED62 and GEOM3D,63 optimized for organic electronic molecules. Subsequently, REINVENT4 was used for goal-directed generation via reinforcement learning, using an evaluation function previously developed to explore the chemical space of molecules with singlet-fission character.39,62,64 We specifically use the same evaluation function from our previous work, where we assess the difference in energy between the lowest first singly excited state and the energy of the triplet excited state.39 The evaluation function employs ChemProp models65 trained on excited state energies from the FORMED dataset to predict these electronic properties. More details about the generative model and the evaluation function are provided in the SI Sections S3 and S5.
Here, we will first show the use of the diversity analyser on a run of the generative model with 1000 iterations of the reinforcement learning step with 100 molecules generated per generation step. Then, we introduce different adaptive constraint functions and assess their impact on the evolution of the different metrics of chemical diversity of the generated molecules.
Fig. 3b–d track the evolution of molecular diversity across three distinct representations. The detailed implementation of the different metrics is presented in the SI Section S1.
• Fig. 3b: structural diversity is evaluated using Morgan fingerprints with Tanimoto similarity. We examine (i) the mean pairwise similarity among 100 molecules generated per RL step, and (ii) the number of unique molecular clusters, where a cluster is defined as a set of molecules with pairwise similarity greater than 0.3. For reference, a histogram of pairwise similarities in the FORMED database (see Fig. S3) shows that values above 0.2 are rare; hence, a similarity threshold of 0.3 effectively defines structural clusters. Initially, the average similarity is low but it increases substantially during training. The number of unique clusters decreases from nearly 100% (every molecule is in a unique cluster) at the start to approximately 10% after 200 RL steps.
• Fig. 3c: diversity in SMILES string space is analyzed via 10-character substrings (10-grams). We report (i) the proportion of unique 10-grams, and (ii) the number of 10-grams occurring in more than 10% of generated SMILES. Initially, nearly all 10-grams are unique, but this fraction decreases to ∼50% by step 200 and ∼40% by step 1000. Concurrently, the number of frequently occurring 10-grams increases from zero to over 100, indicating reduced sequence-level diversity and convergence toward similar SMILES patterns.
• Fig. 3d: fragment-level diversity is assessed based on chemically meaningful substructures derived from a fragmentation algorithm (see SI, Section 1.2). The proportion of distinct fragments drops modestly from 100% to approximately 80% over training. Only a small number (fewer than 10) appear in more than 10% of the generated molecules, suggesting that chemical fragment diversity remains largely preserved, despite convergence in other molecular representations.
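The two sequence-level quantities reported for Fig. 3c can be sketched in a few lines. The example below treats n-grams as character substrings, takes the proportion of unique n-grams to be the number of distinct n-grams divided by the total number extracted from the batch, and counts an n-gram as frequent when it appears in more than a given fraction of the strings. These definitions and the function name are assumptions; the exact implementation is given in the SI.

```python
from collections import Counter


def ngram_diversity(smiles_batch, n=10, frequent_fraction=0.10):
    """Return (fraction of distinct n-grams, number of n-grams found in more than
    `frequent_fraction` of the strings) for a batch of SMILES strings."""
    all_ngrams = []
    presence = Counter()  # number of strings in which each n-gram occurs
    for smiles in smiles_batch:
        grams = [smiles[i:i + n] for i in range(len(smiles) - n + 1)]
        all_ngrams.extend(grams)
        presence.update(set(grams))
    unique_fraction = len(set(all_ngrams)) / len(all_ngrams) if all_ngrams else 0.0
    limit = frequent_fraction * len(smiles_batch)
    n_frequent = sum(1 for count in presence.values() if count > limit)
    return unique_fraction, n_frequent


# Toy batch with 3-grams so the numbers are easy to verify by hand.
toy_batch = ["CCOCC", "CCOCN", "NCCN"]
result = ngram_diversity(toy_batch, n=3, frequent_fraction=0.5)
print(result)
```

In the toy batch, 6 of the 8 extracted 3-grams are distinct and two of them (`CCO` and `COC`) occur in more than half of the strings, mirroring on a small scale the two curves tracked during optimization.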
Overall, Fig. 3 illustrates the characteristic trade-off between molecular optimization and diversity during reinforcement learning. While the average molecular score increases consistently, this improvement comes at the cost of reduced diversity among the generated molecules. Importantly, the extent of diversity loss varies depending on the representation used: structural similarity based on Morgan fingerprints and sequence-level redundancy measured via 10-grams are more substantially impacted than fragment-based metrics. These findings highlight the necessity of employing multiple complementary diversity metrics to fully capture the molecular evolution induced by reinforcement learning. Moreover, this multifaceted perspective becomes essential when designing strategies to constrain model behaviour and preserve chemical diversity. For example, since fragment-based diversity shows only modest degradation (from 100% to 80%) compared to the substantial losses observed in structural similarity and sequence-level metrics, maintaining complete fragment diversity may require stricter constraint thresholds than those needed to preserve sequence-level or fingerprint-based variation, which degrade more rapidly and thus trigger intervention at higher threshold levels.
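The cluster count used as a structural diversity indicator depends on how clusters are formed from pairwise similarities. The sketch below uses a greedy leader scheme with the 0.3 Tanimoto threshold mentioned above: each molecule joins the first existing cluster whose representative it resembles, and otherwise starts a new cluster. The choice of leader clustering, the function names, and the toy bit sets are assumptions for illustration; the clustering procedure actually used is described in the SI.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def count_clusters(fingerprints, threshold=0.3):
    """Greedy leader clustering: a molecule joins the first cluster whose leader
    has similarity above `threshold`; otherwise it becomes a new leader."""
    leaders = []
    for fp in fingerprints:
        if not any(tanimoto(fp, leader) > threshold for leader in leaders):
            leaders.append(fp)  # no sufficiently similar leader: start a new cluster
    return len(leaders)


# Hypothetical on-bit sets: two pairs of near-duplicates.
toy_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11}, {10, 11, 12}]
n_clusters = count_clusters(toy_fps)
cluster_fraction = n_clusters / len(toy_fps)
print(n_clusters, cluster_fraction)
```

A cluster fraction near 1 means almost every molecule is its own cluster, whereas a small fraction indicates that the batch has collapsed onto a few structural families.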
We implement three core constraint types, each targeting different aspects of molecular diversity. Details about the implementation of the diversity constraints are presented in the SI Section S2. Constraint thresholds were established based on the baseline diversity evolution analysis (Fig. 3) and designed to intervene before significant diversity loss occurs. We note that the specific threshold values presented here were selected to demonstrate the framework's capabilities across a range of constraint strictness levels, rather than to represent universally optimal parameters. The optimal thresholds are inherently problem-dependent, varying with the chemical space, generative model, and optimization objective. NaviDiv's real-time monitoring dashboard enables users to observe diversity dynamics under different settings and adapt thresholds to their specific application. Specifically:
To these three individual constraint types, we add two combined regimes that integrate all three constraints with varying strictness and the baseline case without any constraints:
The results of the different constraint regimes are shown in Fig. 4. Fig. 4a shows the evolution of the average molecular score, while Fig. 4b shows the evolution of the negative log-likelihood under the prior model. Regardless of the constraint regime, the average molecular score increases steadily, indicating successful optimization toward the design objective. However, the score increases at different rates depending on the constraint regime, with the baseline (no constraints) showing the fastest increase. The case with similarity-based constraints shows a similar increase in the average score, while the n-gram and combined constraints show a slower increase that does not reach the same level as the baseline even after 1000 steps. The negative log-likelihood under the prior model increases for all constraint regimes, indicating that the model increasingly explores regions of chemical space that deviate from the prior distribution. The rate of increase is broadly similar across all constraint regimes for the first 500 steps, but the combined high constraints regime (all strong criteria) shows a steeper increase in the negative log-likelihood after 500 steps, indicating that the model is generating molecules that are increasingly different from the prior distribution. This monotonically increasing divergence from the prior across all constraint regimes demonstrates that the model genuinely explores novel regions of chemical space rather than merely exploiting biases of the reward model.
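For an autoregressive string model, the negative log-likelihood of a generated sequence under the prior can be written as a sum over tokens. In the expression below, $x = (x_1, \dots, x_T)$ denotes the tokenized SMILES string and $P_{\mathrm{prior}}$ the frozen prior model; this notation is introduced here for illustration and is not taken from the original work.

```latex
\mathrm{NLL}_{\mathrm{prior}}(x)
  = -\log P_{\mathrm{prior}}(x)
  = -\sum_{t=1}^{T} \log P_{\mathrm{prior}}\left(x_t \mid x_1, \dots, x_{t-1}\right)
```

A larger value means that the prior assigns a lower probability to the sequence, which is why an increasing batch average is read as growing deviation from the prior distribution.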
The other panels in Fig. 4 show the evolution of the different diversity metrics with reinforcement learning steps. Compared to the baseline, the fragment-based constraints case does not show any fragment that occurs in more than 10% of the generated molecules, and the uniqueness of the generated fragments does not change as the reinforcement learning steps proceed (Fig. 4c and d). This means that the fragment-based constraint successfully helps the model keep generating molecules with diverse fragments. Similarly, the introduction of the n-gram constraint function reduces the number of 10-grams that occur in more than 10% of the molecules and maintains the ratio of distinct 10-grams at the same level as at the beginning (Fig. 4g and h). On the other hand, the similarity-based constraints case does not show any impact on the evolution of the mean similarity with reinforcement learning steps, and it shows a reduction in the number of clusters of generated molecules similar to that of the baseline (Fig. 4e and f). Even though the approach identifies a large number of molecules to avoid, it does not impact the evolution of the similarity metrics, indicating that the constraint is not effective in this case.
The three diversity-aware constraint regimes (i.e., fragment-based, n-gram-based, and similarity-based) show three different impacts on the evolution of the chemical diversity metrics. (1) The similarity-based constraints do not preserve the structural diversity of the generated molecules, showing little impact on the evolution of the mean similarity or the number of clusters. (2) The fragment-based constraints effectively preserve the diversity of the molecular fragments in the generated molecules but do not affect the other diversity metrics, i.e., the mean similarity of the molecules generated or the diversity of the 10-grams. (3) The n-gram-based constraints not only preserve the sequence-level diversity but also maintain the diversity across the other metrics, i.e., the mean similarity and the fragment-based diversity. These three cases show that the introduction of diversity-aware constraints can have different impacts on the evolution of the chemical diversity metrics and that having a tool to monitor the chemical diversity of the generated molecules is essential to understand how the constraints impact the chemical space exploration. The n-gram-based constraints show the best performance in our case in terms of preserving chemical diversity across all metrics, which is expected because they are applied directly to the SMILES strings, the same representation used by the generative model.
Combining the different constraints seems to improve the overall preservation of chemical diversity, at the expense of slowing down the rate of increase of the average molecular score. The combined high constraints regime shows the best performance in terms of preserving chemical diversity across all metrics, with an expected reduction in the rate of increase of the average molecular score. The combined low constraints regime shows slightly worse preservation of the chemical diversity metrics as the training progresses, compared to the combined high constraints regime, but with a faster rate of increase in the average molecular score. This indicates that the choice of thresholds for the diversity constraints is crucial in balancing the trade-off between optimization performance and chemical diversity preservation. The choice of the thresholds can be adapted to the specific application and the desired level of diversity preservation. Access to a tool that allows monitoring the chemical diversity of the generated molecules is essential to understand how the constraints impact the chemical space exploration and to adapt the thresholds accordingly.
Conversely, similarity-based constraints operate on post-generation molecular fingerprints, creating a representational mismatch with the SMILES-based generation process. While these constraints correctly identify structurally redundant molecules, they cannot effectively guide the character-level sequence generation that determines molecular output. This mismatch explains the limited effectiveness despite accurate structural redundancy detection.
Fragment-based constraints occupy an intermediate position, operating on chemically meaningful substructures that partially align with both SMILES syntax and chemical interpretation. This partial alignment enables targeted effectiveness in preserving fragment diversity while maintaining limited influence on other metrics.
Our multi-metric assessment reveals that different diversity measures exhibit varying sensitivity to reinforcement learning optimization, with structural similarity and string-based metrics showing greater vulnerability than fragment-based measures. This differential sensitivity necessitates comprehensive monitoring using complementary representations rather than relying on single metrics.
The representational alignment principle, where constraint effectiveness depends on operating within the same space as the generative model, represents a fundamental design consideration for diversity-aware generative systems. n-Gram-based constraints demonstrated superior performance due to direct correspondence with SMILES-based generation, while similarity-based approaches showed limited effectiveness despite correctly identifying structural redundancy. This finding suggests that researchers should prioritize constraint mechanisms compatible with their chosen generative model architectures.
The synergistic benefits of combined constraint strategies, achieving improved diversity preservation compared to individual approaches while maintaining optimization performance within 15% of unconstrained baselines, support multi-objective approaches to generative chemistry. Real-time monitoring capabilities enable dynamic parameter adjustment, addressing limitations of static constraint approaches that become suboptimal as optimization progresses.
The framework's general-purpose design extends beyond the singlet fission case study to any molecular inverse design application. The integration of analysis, visualization, and constraint generation into an accessible web interface democratizes advanced diversity management tools for both computational and experimental researchers.
By bridging computational power with chemical intuition through systematic diversity management, this work contributes to more effective and interpretable molecular discovery workflows, ultimately supporting accelerated identification of novel compounds with desired properties across diverse application domains from drug discovery to materials science.
Supplementary information (SI): implementation details of all diversity metrics (representation distance-based, fragment-based, scaffold-based, and string-based); pseudocode for the diversity constraint algorithms (similarity-based, fragment-based, and n-gram-based); singlet fission evaluation function and machine learning model details; FORMED database statistics including pairwise similarity distributions; detailed experimental setup and REINVENT4 configuration parameters; statistical analysis of constraint regimes with variance across independent runs; a generalizability study applying NaviDiv to QED optimization; and full software implementation details including package architecture. See DOI: https://doi.org/10.1039/d5dd00487j.
This journal is © The Royal Society of Chemistry 2026