Open Access Article
Yuki Deguchi
* and
Masato Taki
Graduate School of Artificial Intelligence and Science, Rikkyo University, Tokyo, Japan. E-mail: hijoo.guchi@gmail.com; taki_m@rikkyo.ac.jp
First published on 30th May 2026
Technologies for designing molecules with desired properties have the potential to drive innovation across a wide range of fields. Molecular inverse design typically involves three key tasks: chemical latent space representation, property prediction, and molecule generation. While deep learning models trained on large molecule-property datasets can address all three tasks within a unified framework, they often require substantial property-specific retraining when targeting new molecular properties, limiting scalability. In this work, we propose an algorithmic framework that integrates machine learning and quantum annealing by explicitly decoupling chemical latent space representation, property prediction, and molecule generation. By freezing deep molecular representations learned from large datasets and separating molecule generation from deep learning, the proposed method enables inverse design for new target properties at a low training cost, requiring only lightweight model adaptation. Using quantum annealing, 98% of the generated molecules were novel and exhibited properties close to the desired targets, indicating efficient exploration beyond the training distribution. Moreover, the molecular generation rate was approximately six times faster than that of classical optimization algorithms. These results demonstrate that modularizing molecular inverse design into complementary learning and optimization components provides a scalable and effective alternative to monolithic deep learning-based approaches.
Molecular inverse design based on desired properties is commonly referred to as inverse Quantitative Structure–Activity Relationship (inverse QSAR). Inverse QSAR typically involves three essential tasks: chemical latent space representation, property prediction, and molecule generation. In recent years, deep learning-based generative models have attracted considerable attention, as they can address these tasks within a unified framework while achieving high performance.7 Representative approaches include recurrent neural network (RNN)-based,8–11 variational autoencoder (VAE)-based,12,13 generative adversarial network (GAN)-based,14,15 diffusion-based,16,17 and Transformer-based models.4,18,19
In all cases, candidate molecules are generated by sampling from learned chemical representations conditioned on target property objectives. Despite methodological differences among these approaches, extending inverse molecular design to new target properties remains challenging. End-to-end generative models typically require substantial retraining when the target property changes,9,20 while partially decoupled or semi-supervised approaches still depend on property-specific predictors and sufficient labeled data for each new objective.19,21 In practice, many molecular properties cannot be reliably estimated in silico, and experimental acquisition of property labels remains costly and time-consuming, making it difficult to obtain sufficient data for scalable adaptation.22 As a result, the applicability of existing deep learning-based inverse QSAR methods across diverse and previously unseen property objectives remains fundamentally limited.
Recently, molecular inverse design methods that integrate deep learning with additional optimization or simulation techniques have been actively explored. Representative examples include genetic algorithms for the design of organic emitters with inverted singlet–triplet gaps,23,24 quantum annealing for inverse design of molecules with targeted drug-likeness,21,25 and docking simulations for structure-based generation of molecules with high binding affinity to specific protein targets.26 These hybrid approaches relax the reliance on purely end-to-end deep learning models and provide greater flexibility in the optimization process. However, in most existing hybrid frameworks, the optimization objective remains tightly coupled to task-dependent evaluation models or simulation pipelines, such as property-specific surrogate models used to evaluate fitness in genetic algorithms24,27 or computationally expensive physics-based docking calculations required for each target protein.26 As a result, adapting these methods to new molecular properties often incurs substantial computational and data costs, such as retraining surrogate models or reconfiguring simulation pipelines for each new objective, limiting their scalability. These observations highlight the need for a molecular inverse design framework that can adapt to new target properties without reconfiguring the entire optimization pipeline.
In this study, we propose a property–agnostic molecular inverse design framework that integrates machine learning and quantum annealing (Fig. 1). To address computational limitations in existing approaches, the framework explicitly decouples chemical latent space representation, property prediction, and molecule generation. A pretrained deep neural network is used to provide a general molecular representation that is shared across tasks and kept fixed. When designing molecules for a new target property, only a lightweight task-specific prediction head is adapted, without reconfiguring the overall optimization pipeline or retraining deep generative models. Because the prediction head has a simple functional form and operates on fixed molecular descriptors, the inverse design problem can be formulated as a quadratic unconstrained binary optimization (QUBO) problem. This QUBO formulation is central to the proposed framework, as it transforms the molecular inverse design problem into a well-defined combinatorial optimization that can be solved by a variety of solvers. Among them, quantum annealing is particularly well suited, as its solution-searching dynamics, driven by quantum tunneling effects, can efficiently navigate the rugged optimization landscapes that arise when multiple property constraints must be satisfied simultaneously.28 Solving this QUBO via quantum annealing enables the efficient identification of molecular descriptors satisfying the desired property objectives, which are subsequently decoded into molecular structures. Unlike prior approaches that focus on high-accuracy inverse estimation for specific properties, the proposed framework enables scalable adaptation to new objectives while maintaining high levels of molecular novelty, synthesizability, and diversity, comparable to or exceeding those reported in previous studies.19,21
The major contributions of this study are summarized as follows:
(1) We propose a property–agnostic framework for molecular inverse design that integrates machine learning and quantum annealing through feature-based transfer learning. By fixing a learned molecular representation and adapting only a lightweight task-specific prediction head for each target property, the proposed framework enables efficient inverse design without retraining deep generative models.
(2) We formulate molecular inverse design as a QUBO problem defined on pretrained molecular descriptors, allowing direct optimization by quantum annealing. This formulation decouples molecule generation from property-specific deep generative model training while preserving alignment with target property objectives.
(3) We demonstrate that the proposed framework generates molecules that closely match desired property values while achieving high novelty, synthesizability, and diversity. In addition, quantum annealing enables a molecular generation rate approximately six times faster than that of classical optimization algorithms.
Table 1 reports the predictive performance of the proposed approach alongside two representative deep learning models, Structure–Property Multi-Modal foundation model (SPMM)19 and Ajagekar's model.21 The evaluation datasets differ across the three methods. The proposed method was evaluated on 100
000 molecules from ZINC-22,33 PubChem34 and ChEMBL35 using a scaffold-based split, which ensures that no molecular scaffolds are shared between the training and test sets. In contrast, Ajagekar's model was evaluated on a 1000-molecule test set drawn from a 12
000-molecule ZINC 12 (ref. 36) subset using a random split,21,37 and SPMM was evaluated on 1000 unseen molecules from ZINC 15.19,38 Although the specific test molecules used in the prior studies are not publicly available, all three methods draw from ZINC-derived databases and predict the same RDKit-computed molecular properties, and structural overlap among the datasets is expected. Notably, the proposed method employs a larger and more diverse evaluation set with a stricter splitting strategy than either baseline. Despite these differences in evaluation conditions, the proposed method achieves competitive or superior performance in terms of the coefficient of determination (R2), root mean squared error (RMSE) and mean absolute error (MAE), while using substantially fewer trainable parameters. These results suggest that feature-based transfer learning on fixed deep molecular representations provides an effective and scalable approach to molecular property prediction.
P: Wildman–Crippen partition coefficient,45 TPSA: Topological Polar Surface Area46)
| Target property | Our models | Ajagekar's | SPMM | ECFP | |||||
|---|---|---|---|---|---|---|---|---|---|
| MAE | RMSE | R2 | MAE | RMSE | R2 | MAE | RMSE | R2 | |
| QED | 0.040 | 0.053 | 0.828 | 0.10 | 0.047 | 0.896 | 0.055 | 0.070 | 0.694 |
| SAS | 0.167 | 0.214 | 0.882 | 0.66 | — | — | 0.180 | 0.235 | 0.857 |
Log P |
0.092 | 0.120 | 0.991 | 1.27 | 0.410 | 0.901 | 0.429 | 0.547 | 0.821 |
| Molecular weight | 4.682 | 6.344 | 0.967 | — | 12.666 | 0.953 | 13.575 | 18.098 | 0.941 |
| TPSA | 1.200 | 1.565 | 0.996 | — | 3.104 | 0.985 | 6.938 | 8.996 | 0.861 |
| Number of hydrogen bond acceptors | 0.093 | 0.118 | 0.995 | — | 0.557 | 0.871 | 0.480 | 0.618 | 0.856 |
| Number of hydrogen bond donors | 0.073 | 0.103 | 0.991 | — | 0.094 | 0.990 | 0.254 | 0.345 | 0.897 |
| Number of rotatable bonds | 0.324 | 0.418 | 0.966 | — | 0.758 | 0.868 | 0.535 | 0.700 | 0.904 |
| Number of aromatic rings | 0.075 | 0.114 | 0.982 | — | 0.336 | 0.848 | 0.172 | 0.226 | 0.931 |
To verify the effectiveness of the CDDD representation itself, a comparison against extended-connectivity fingerprints (ECFP; radius 2 and 2048 bits),39 one of the most widely used molecular representations in chemoinformatics, was conducted under otherwise identical conditions. The CDDD-based models outperformed the ECFP baseline across all properties, with a mean R2 of 0.955 ± 0.060 compared to 0.862 ± 0.074, confirming that the pretrained CDDD representation captures richer molecular information than standard fixed-length fingerprints.
P).45
| Experiment | Annealing method | Target conditions | Number of generations |
|---|---|---|---|
| (a) | QA | QED > 0.7 & SAS < 3 | 1000 |
| (a)′ | SA | QED > 0.7 & SAS < 3 | 1000 |
| (b) | QA | QED > 0.7 & log P < 1 |
1000 |
| (b)′ | SA | QED > 0.7 & log P < 1 |
1000 |
| (c) | QA | QED > 0.7 | 100 |
| (d) | QA | QED < 0.7 | 100 |
(a) Generation of molecules with high drug-likeness and high synthetic accessibility (QED > 0.7 and SAS < 3).
(b) Generation of molecules with high drug-likeness and reduced lipophilicity (QED > 0.7 and log
P < 1).
In both experiments, molecules were generated by solving the QUBO-based inverse optimization problem via QA under the specified property constraints. The generation was performed within the CDDD latent space learned from organic compounds with molecular weights in the range of 12–600. Such combinations of property targets are of practical importance in drug discovery,40,44 yet are typically sparsely represented in available chemical datasets, making them challenging targets for data-driven generation. Under each condition, 1000 molecules were generated using both QA and classical simulated annealing (SA),47 with SA serving as a widely used classical baseline optimization method for comparison.
Table 3 reports the annealing error relative to the target property values, the deviation between predicted and target properties, as well as the validity, uniqueness, and novelty of the generated molecules. The extremely low annealing errors (MAE = 0.000) indicate that the underlying optimization problems were solved accurately. This result is expected by design. Because the prediction heads are linear functions of the CDDD descriptors, the inverse optimization reduces to a quadratic problem that QA can solve to near-exact optimality. The annealing MAE therefore reflects the precision of the QUBO solver, not the accuracy of the generated molecules' actual properties. The latter is captured by the MAE (actual value) column, which quantifies the deviation between the RDKit-computed48 properties of the decoded molecules and the target values. The nonzero actual MAE arises primarily from the approximation introduced during CDDD decoding, where the optimized descriptor vector is mapped back to a discrete molecular structure.
| Annealing | Target | Target | MAE | MAE | Validity | Uniqueness | Noveltya | |
|---|---|---|---|---|---|---|---|---|
| Method | Property | Value | Annealing | Actual value | ||||
| a Novelty was only calculated for valid molecules. | ||||||||
| (a) | QA | QED | 0.85 | 0.000 | 0.107 | 54.1% | 99.1% | 98.0% |
| SAS | 2.5 | 0.000 | 0.883 | |||||
| (a)′ | SA | QED | 0.85 | 0.000 | 0.184 | 56.0% | 99.2% | 95.2% |
| SAS | 2.5 | 0.000 | 1.057 | |||||
| (b) | QA | QED | 0.85 | 0.000 | 0.112 | 51.9% | 98.6% | 97.8% |
Log P |
0.0 | 0.000 | 1.171 | |||||
| (b)′ | SA | QED | 0.85 | 0.000 | 0.213 | 35.9% | 97.0% | 93.0% |
Log P |
0.0 | 0.000 | 1.976 | |||||
Fig. 2 shows the QED-SAS and QED-log
P distributions of the generated molecules, estimated using kernel density estimation (KDE).49,50 Table 4 summarizes the proportions of molecules satisfying the target property criteria in both the training dataset and the generated samples. In both experiments, molecules generated by QA and SA exhibit property distributions that are substantially more concentrated within the target regions than those of the training dataset, demonstrating effective and selective inverse design.
![]() | ||
| Fig. 2 Distributions of the target properties of the training dataset and the generated molecules, and examples of generated molecular structures. | ||
P
| Data source | QED > 0.7 | QED > 0.8 | QED > 0.9 | SAS < 3.0 | SAS < 2.5 | SAS < 2.0 | QED > 0.7 & SAS < 3.0 |
|---|---|---|---|---|---|---|---|
| (a) Generated by QA | 70.2% | 34.4% | 3.3% | 32.0% | 11.8% | 2.4% | 24.4% |
| (a)′ Generated by SA | 52.1% | 24.5% | 2.0% | 25.2% | 9.5% | 1.8% | 15.4% |
| Training dataset | 46.3% | 14.1% | 0.8% | 20.2% | 4.1% | 0.1% | 7.9% |
| Data source | QED > 0.7 | QED > 0.8 | QED > 0.9 | Log P < 1.0 |
Log P < 0.0 |
Log P < −1.0 |
QED > 0.7 & log P < 1.0 |
|---|---|---|---|---|---|---|---|
| (b) Generated by QA | 66.3% | 25.0% | 0.2% | 35.8% | 7.3% | 0.8% | 18.5% |
| (b)′ Generated by SA | 40.4% | 14.5% | 0.8% | 26.2% | 8.6% | 2.2% | 5.0% |
| Training dataset | 46.3% | 14.1% | 0.8% | 16.9% | 5.1% | 1.2% | 4.7% |
Notably, while QA and SA exhibit comparable performance in terms of molecular validity and novelty, QA consistently achieves a substantially higher success rate in generating molecules that simultaneously satisfy multiple target property constraints. As shown in Table 4, the proportion of molecules meeting both QED and SAS or QED and log
P criteria is significantly higher for QA than for SA or the training dataset. This indicates that QA provides enhanced selectivity in multi-objective molecular inverse design, rather than merely increasing diversity or random exploration. Such behavior is particularly relevant in constrained optimization settings, where feasible solutions occupy narrow and sparsely distributed regions of the search space, and the ability of QA to navigate complex and rugged optimization landscapes enables more effective identification of solutions satisfying multiple competing objectives. Further analysis of the distribution of generated molecules in the chemical latent space, discussed in the following section, suggests that this enhanced selectivity is accompanied by controlled exploration beyond the training distribution.
In terms of computational efficiency, QA achieved a mean end-to-end computation time of 9.55 ± 0.63 s per molecule in experiment a), whereas SA required 54.65 ± 0.45 s, corresponding to an approximately 5.7-fold speedup. Notably, the actual computation time of the quantum annealer itself was only 82 ± 11 ms. The remaining overhead arises primarily from classical components of the hybrid workflow, including application programming interface (API) communication, solver orchestration, and neural network-based decoding of molecular representations. This indicates that the observed latency reflects implementation-level constraints rather than fundamental limitations of QA.
Overall, these results demonstrate that the proposed framework enables selective and efficient generation of molecules with specified property profiles. Importantly, within this architecture, adapting inverse molecular design to a new target property requires only the lightweight adaptation of a task-specific prediction head on a fixed molecular representation, without retraining deep generative models. As a result, the proposed method achieves high selectivity, computational efficiency, and scalability across a broad range of molecular design objectives.
In this study, validity is defined as the proportion of generated SMILES that can be successfully parsed into chemically valid molecular structures by RDKit.48 Uniqueness refers to the proportion of valid molecules with distinct canonical SMILES. Novelty is defined as the proportion of valid molecules whose canonical SMILES do not appear in the reference dataset of approximately 22 billion molecules. Table 3 reports the novelty and uniqueness of molecules generated under each experimental condition. 98.0% of the molecules generated by QA in the high QED and low SAS setting were novel, and the uniqueness reached 99.1%. These results indicate that the proposed framework does not rely on brute-force search within the training dataset, but instead samples molecular candidates from a well-generalized chemical latent space.
To examine whether the novelty arises from entirely new core structures or from modifications of known molecular frameworks, we decomposed each generated molecule into its core ring system and linkers (Murcko scaffolds51) and compared them against the reference dataset. Of the unique scaffolds identified, 64.0% were absent from the reference set, indicating that the framework generates molecules with entirely new core structures. The remaining 36.0% shared scaffolds with the reference set, yet these molecules were still novel at the full-structure level. Analysis of the modification patterns on these known scaffolds revealed that the modifications extend well beyond simple single-site substitutions: on average, 8.8 heavy atoms were added beyond the scaffold, with 93.8% of molecules containing more than three additional heavy atoms. Furthermore, 85.1% of these molecules introduced atom types not present in their scaffold, such as halogens and heteroatoms. These findings suggest that the novelty of generated molecules arises from two complementary mechanisms: the generation of entirely new core structures through interpolation and extrapolation in the continuous CDDD latent space, and non-trivial multi-site functional group diversification on known frameworks.
A trade-off between molecular novelty and validity was observed, with the validity of generated molecules being approximately 50%. This behaviour reflects a deliberate design choice. While many likelihood-based generative models commonly adopted in molecular design are optimized to generate syntactically valid molecules close to the training distribution, the proposed optimization-driven framework actively explores out-of-distribution regions of the learned chemical latent space. As a result, highly novel and diverse molecular structures can be identified, albeit at the cost of reduced validity. Importantly, this limitation is not fundamental, and validity is expected to improve through the incorporation of more expressive generative backbones, such as transformer-based architectures, or more robust molecular string representations such as Self-referencing Embedded Strings (SELFIES).52
Fig. 3 illustrates the QED-SAS and log
P-SAS distributions estimated using KDE for molecules generated in experiment (b) and by Ajagekar's model. Table 5 summarizes the mean SAS values for each method. In experiment (a), SAS was explicitly constrained to low values, resulting in a large fraction of molecules with high synthetic accessibility. In experiment (b), only QED and log
P were controlled, while SAS was not directly optimized. Nevertheless, the mean SAS value of the generated molecules (3.555 ± 0.787) closely approached that of the training dataset (3.522 ± 0.622). In contrast, when controlling log
P, Ajagekar's model produced molecules with a substantially higher mean SAS value of 7.023 ± 0.832, indicating increased synthetic difficulty in this setting. Notably, the proposed framework generates molecules with lower SAS (i.e., improved synthetic accessibility) than Ajagekar's model, even without explicitly optimizing SAS. This advantage arises from performing optimization within a fixed chemical latent space learned from chemically valid molecules, which naturally constrains the search to structurally plausible regions. As a result, the method satisfies target property objectives while avoiding excessive structural complexity, leading to improved synthetic accessibility relative to previous studies.
![]() | ||
| Fig. 3 Distributions of SAS and the target properties of the molecules generated by our method and Ajagekar's model.21 The red line represents the mean SAS. | ||
| Data source | Target property | Mean SAS |
|---|---|---|
| (b) Generated by our method | QED & log P |
3.555 ± 0.787 |
| (1) Generated by Ajagekar's model | QED | 6.516 ± 0.931 |
| (2) Generated by Ajagekar's model | Log P |
7.023 ± 0.832 |
| Training dataset | — | 3.522 ± 0.622 |
Fig. 4 visualizes the distribution of molecular representations using Uniform Manifold Approximation and Projection (UMAP).53,54 Most molecules generated by QA and SA were sampled from regions overlapping with the training distribution, while a subset of molecules extended into previously unexplored regions of the chemical latent space. Notably, molecules with unconventional structural motifs, such as phosphite and tetrahydro-1,2,5-triazepine, were observed. This diversity may be influenced by thermodynamic noise55 and solution-searching behaviour associated with quantum tunnelling effects in QA.28 Together, these observations indicate that the proposed method enables broad and diverse exploration of chemical space beyond the training distribution.
Overall, these results demonstrate that the proposed framework generates molecules that are highly novel, synthetically accessible, and structurally diverse. Given that molecular design under specific objectives often remains constrained to known chemical scaffolds and their analogues,56 the ability to systematically explore out-of-distribution chemical space represents a key advantage of the proposed approach over conventional human-centered molecular design strategies.
Fig. 5(a) shows the QED-SAS distribution of molecules generated in experiment (a), while Fig. 5(c) shows the distribution obtained in experiment (c). In both experiments, the generated molecules exhibit higher QED values than those in the training dataset. Quantitatively, experiment a) achieved a mean QED of 0.753 ± 0.089 with a corresponding mean SAS of 3.307 ± 0.646. In contrast, when only QED was controlled in experiment (c), the generated molecules showed a lower mean QED of 0.709 ± 0.087 and a higher mean SAS of 3.558 ± 0.970. These results indicate that simultaneous control of QED and SAS in experiment (a) systematically shifts the generated molecules toward lower synthetic accessibility scores while maintaining higher drug-likeness, compared with single-objective optimization. This demonstrates that the proposed framework enables explicit and simultaneous control of multiple molecular properties, rather than implicitly favoring a single dominant objective.
![]() | ||
| Fig. 5 Distributions of QED and SAS of the training dataset and generated molecules: (a) high QED and low SAS targeted, (c) high QED targeted and (d) low QED targeted. | ||
Fig. 5(d) illustrates the QED-SAS distribution of molecules generated in experiment (d). While molecules generated in experiment (c) exhibit higher QED values than those in the training dataset, molecules generated in experiment (d) are clearly shifted toward lower QED values relative to experiment (c). These results demonstrate that the proposed framework enables selective and bidirectional control of molecular properties, rather than passively reproducing trends present in the training data.
A key characteristic of the proposed framework is the deliberate separation of deep representation learning from downstream property optimization. By fixing a pretrained molecular representation and adapting only a lightweight task-specific prediction head, the framework significantly reduces the computational and data requirements associated with introducing new molecular objectives. This design choice allows inverse molecular design to be formulated as a constrained optimization problem over as learned chemical latent space, which can be efficiently solved using quantum annealing. As demonstrated in the Results section, this approach enables the simultaneous control of multiple molecular properties while maintaining strong selectivity toward target property regions.
The observed trade-off between molecular novelty and validity warrants careful discussion. The validity of generated molecules is lower than that reported for likelihood-based generative models optimized for syntactic correctness and sampling efficiency, such as transformer-based or reinforcement learning–driven SMILES generators. However, this difference reflects a fundamental distinction in design objectives rather than a limitation of the proposed method. Likelihood-based generative models are typically trained to maximize the probability of generating molecules close to the training distribution, resulting in high validity but limited exploration beyond known chemical space. In contrast, the proposed optimization-driven framework explicitly targets constrained inverse problems and actively explores out-of-distribution regions of the learned chemical latent space. This design choice leads to substantially higher molecular novelty and diversity, albeit at the cost of reduced validity. However, this trade-off is not fundamental and can be mitigated through architectural and representational improvements.
Computational efficiency represents another important consideration. Although the end-to-end generation time per molecule is on the order of several seconds, the actual quantum annealing time required to solve the underlying QUBO problem is on the order of milliseconds. The dominant sources of latency arise from implementation-level factors, including hybrid solver orchestration, API communication overhead, and neural network-based decoding of molecular representations. These observations indicate that the current generation speed is not a fundamental limitation of the framework itself, but rather reflects the present software and hardware integration. Future improvements in quantum hardware accessibility, solver parallelization, and decoder optimization are expected to substantially reduce end-to-end generation times. Moreover, classical alternatives such as simulated or momentum annealing accelerated on graphics processing unit (GPU) may serve as practical complements when quantum hardware availability is constrained.57,58
An important advantage of quantum annealing observed in this study is its enhanced selectivity in constrained multi-objective optimization. While quantum annealing and simulated annealing achieve comparable levels of molecular validity and novelty, quantum annealing consistently yields a higher proportion of molecules that simultaneously satisfy multiple target property constraints. This behaviour is particularly valuable in molecular inverse design problems where feasible solutions occupy narrow and sparsely distributed regions of the search space. The solution-searching dynamics of quantum annealing, influenced by tunneling effects and thermodynamic noise, appear well suited to navigating such rugged optimization landscapes, thereby improving the efficiency of constraint satisfaction rather than merely increasing random exploration.
Overall, the proposed framework provides a complementary alternative to monolithic deep learning-based molecular generators. Rather than competing with likelihood-based models in terms of raw generation speed or syntactic validity, the present approach is designed to support scalable, property–agnostic inverse design under complex and evolving constraints. This capability is particularly relevant in early-stage molecular discovery, where exploration beyond known chemical scaffolds and rapid adaptation to new design objectives are essential. By modularizing molecular inverse design into representation learning, lightweight transfer learning, and optimization-based generation, the proposed framework offers a promising foundation for future hybrid approaches that combine machine learning and advanced optimization technologies.
A key advantage of CDDD is its reversibility, as molecular descriptors can be decoded back into valid SMILES strings using the corresponding pretrained decoder. This property is essential for the proposed framework, which relies on inverse optimization of molecular descriptors followed by molecule reconstruction. In contrast to task-specific or end-to-end trained generative models, the CDDD encoder–decoder is kept fixed throughout this study, and no retraining is performed. As a result, the learned chemical representation can be reused across different molecular property objectives, forming a stable foundation for feature-based transfer learning. All CDDD representations in this study were computed using the pretrained models released by Winter et al.32
Training data were collected from ZINC-22,33 PubChem,34 and ChEMBL.35 ZINC-22 (37
223
934
261 molecules), PubChem (115
347
688 molecules), and ChEMBL (2
399
743 molecules) datasets available as of 16 October 2023 were combined and preprocessed to meet the requirements for CDDD calculation.32
(1) Ionic bonds were cleaved and the largest molecular fragments were retained.
(2) Stereochemical information was removed, molecules were converted to canonical SMILES using RDKit,48 and duplicates were removed.
(3) Molecules containing atoms other than H, B, C, N, O, F, P, S, Cl, Br, and I were removed.
(4) Molecules with log
P outside the range of [−7, 5], molecular weight outside [12
600], or fewer than three heavy atoms were removed.
Of the initial 37
341
681
692 molecules, approximately 15.3 billion (40.9%) were removed during preprocessing, yielding a final dataset of 22
080
110
355 molecules, which includes all 72 million molecules used to train the CDDD autoencoder.32 The vast majority of the removal (approximately 14.6 billion, 39.0%) occurred in step 2, while steps 3 and 4 each removed comparatively small fractions of approximately 0.3% and 1.6%, respectively. It should be noted that the source databases used in this study are subject to known coverage biases. ZINC-22 primarily catalogues commercially available and make-on-demand compounds, covering only a small fraction of drug-like chemical space even at the scale of tens of billions of molecules.33 More generally, widely used molecular datasets have been shown to lack uniform coverage of biomolecular structures, and models trained on such datasets may have limited predictive power for underrepresented regions of chemical space.59 While no explicit bias mitigation techniques were applied in this study, the optimization-driven generation via quantum annealing enables exploration of out-of-distribution regions beyond the training data, partially alleviating the impact of these biases. Nevertheless, the CDDD latent space inevitably reflects the biases of its training data, and incorporating more diverse molecular sources remains a direction for future work.
From this dataset, 1 million molecules were used for training and 0.1 million molecules for testing the task-specific prediction heads using a scaffold-based split. A separate validation set was not employed, as the prediction heads were intentionally designed to have a closed-form solution with no hyperparameters to tune. This simplicity is not incidental but a deliberate design choice that enables the inverse design problem to be directly formulated as a QUBO problem, which is central to the proposed framework. The full dataset was used for evaluating the novelty of generated molecules.
All transfer learning procedures for adapting the task-specific prediction heads were performed using an NVIDIA A100 GPU (40 GB memory).
(1) Molecular descriptors in the CDDD space that satisfy the desired property constraints are obtained by solving the inverse optimization problem defined by the linear prediction heads using QA.
(2) The obtained molecular descriptors are decoded into SMILES strings using the pretrained CDDD decoder.
QA is a stochastic optimization paradigm inspired by simulated annealing (SA), which explores the ground state of an Ising or QUBO model by introducing quantum fluctuations.63 Previous studies have shown that QA can outperform SA on optimization landscapes characterized by sharp and deep local minima, owing to quantum tunneling effects.28 These properties make QA particularly suitable for inverse molecular design problems, where feasible solutions satisfying multiple constraints often occupy narrow and fragmented regions of the search space.
In this study, QA was accessed via QA hardware provided by D-Wave Systems, Inc. through an API interface.64 Current quantum annealers remain limited in terms of effective problem size due to hardware connectivity constraints. For example, although the Advantage_system6.4 supports 5612 physical qubits, problems involving dense second-order interactions can only be mapped to approximately 180 logical variables.65 This limitation arises because multiple physical qubits are required to represent a single logical variable and its interactions.66
To overcome this restriction, a hybrid binary quadratic model solver was employed (hybrid_binary_quadratic_model_version2p), which combines classical optimization with QA and can handle up to one million variables. This solver is conceptually related to qbsolv,67 where large optimization problems are decomposed into smaller subproblems that are iteratively optimized using QA and classical heuristics such as SA and Tabu search.68–70 While the hybrid solver introduces additional classical overhead compared to pure QA execution, it enables the practical application of quantum-assisted optimization to high-dimensional molecular inverse design problems.
Within this framework, molecule generation is reduced to an optimization problem over the CDDD space, where the objective function is defined by a set of linear prediction heads corresponding to the target molecular properties. Because QA-based solvers operate on binary variables, the inverse optimization problem is formulated as a QUBO problem. Optimization algorithms based on the Ising model, including QA and SA, require the objective function to be expressed in terms of binary variables, as the system energy (Hamiltonian) is defined using spin variables σi ∈ (−1, +1). An equivalent formulation is given by the QUBO model, in which the spin variables σi are replaced by binary variables qi ∈ (0, +1). The equivalence between the Ising and QUBO formulations has been well established for a wide range of NP-hard optimization problems.71,72 Accordingly, the inverse molecular design problem is represented using the QUBO formulation, which provides a natural and intuitive representation compatible with QA-based optimization.
Let d be the value of l desired molecular properties, W = (w1, w2, …, wl)T be the weight of l linear prediction heads corresponding to the desired molecular properties, b be the intercept of l linear prediction heads, and x ∈ [−1, +1] be the value of the m = 512 dimensional CDDD to be obtained and find CDDD that minimizes the following evaluation function.
![]() | (1) |
| c = b − d |
Although each element of CDDD is a continuous value within the range of −1 to +1, the QUBO model requires the use of binary variables q ∈ {0, +1}. Therefore, each element x ∈ [−1, +1] of the CDDD is approximated using n binary variables q as follows. In this study, n = 8 was employed to balance computational complexity and accuracy.
![]() | (2) |
From eqn (1) and (2), the evaluation function is expressed as follows:
![]() | (3) |
| f(y, z) = n(y − 1) + z |
As the additive constant in eqn (3) can be disregarded, the Hamiltonian for the inverse estimation problem of a molecule with the desired properties can be expressed using the upper triangular matrix Q as follows:
![]() | (4) |
| f(y, z) = Wye(z)2(1−z)modn |
Due to thermodynamic noise and environmental fluctuations, QA does not guarantee reproducibility of individual solutions, and the probability of reaching the ground state varies across runs.55 In this study, this stochasticity is intentionally exploited. Multiple annealing runs are performed under identical conditions, and each run yields a distinct molecular descriptor. Consequently, a single annealing run corresponds to the generation of one candidate molecule. In contrast, although SA can be made reproducible by fixing random seeds, no such constraints were imposed here in order to promote molecular diversity.
Code availability: The source code for this study is available at https://github.com/hijooguchi/qa_molecular_inverse_design with DOI: https://doi.org/10.5281/zenodo.20039825.
| This journal is © The Royal Society of Chemistry 2026 |