Open Access Article
Shunya Minami,*a Kouhei Nakaji,bc Yohichi Suzuki,a Alán Aspuru-Guzikcd and Tadashi Kadowakiae
aGlobal R&D Center for Business by Quantum-AI Technology, National Institute of Advanced Industrial Science and Technology, Ibaraki, Japan. E-mail: s-minami@aist.go.jp
bNVIDIA Corporation, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA
cChemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
dVector Institute for Artificial Intelligence, Toronto, Ontario, Canada
eDENSO Corporation, Tokyo, Japan
First published on 21st July 2025
Quantum computing is entering a transformative phase with the emergence of logical quantum processors, which hold the potential to tackle complex problems beyond classical capabilities. While significant progress has been made, applying quantum algorithms to real-world problems remains challenging. Hybrid quantum-classical techniques have been explored to bridge this gap, but they often face limitations in expressiveness, trainability, or scalability. In this work, we introduce the conditional Generative Quantum Eigensolver (conditional-GQE), a context-aware quantum circuit generator powered by an encoder–decoder transformer. Focusing on combinatorial optimization, we train our generator to solve problems with up to 10 qubits, exhibiting nearly perfect performance on new problems. By leveraging the high expressiveness and flexibility of classical generative models, along with an efficient preference-based training scheme, conditional-GQE provides a generalizable and scalable framework for quantum circuit generation. Our approach advances hybrid quantum-classical computing and contributes to accelerating the transition toward fault-tolerant quantum computing.
One widely studied methodology over the past decade is the variational quantum algorithm (VQA).11–14 Applications of VQAs, such as quantum machine learning,15–18 often require uploading classical data into the circuit. The most common strategy is to embed the data into the rotation angles of gates in a parameterized quantum circuit (PQC). However, this approach faces limitations in expressibility, as the Fourier components of the expectation function are constrained to specific wave numbers.19–21 Moreover, embedding classical knowledge or inductive bias into the PQC structure remains challenging,22 despite the critical role of inductive bias in successful optimization.23 Addressing these limitations requires innovative strategies that lead to next-generation hybrid quantum-classical computation.
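To make the expressibility constraint concrete, the following worked equation gives the standard form of this result from the quantum machine learning literature (ref. 19–21); the symbols f, W1, W2, Ĥenc, and Ω follow that literature rather than the present paper. For a single-parameter encoding, the expectation value is a Fourier series whose frequencies are fixed by the encoding generator's spectrum:

```latex
% Expectation value of a PQC with single-parameter data encoding:
% only the Fourier coefficients c_omega are trainable; the frequency
% set Omega is fixed by the spectrum of the encoding generator.
f(x) = \langle 0 | U(x)^{\dagger} \hat{O} U(x) | 0 \rangle
     = \sum_{\omega \in \Omega} c_{\omega} e^{i\omega x},
\qquad
U(x) = W_2 \, e^{-i x \hat{H}_{\mathrm{enc}}} \, W_1,
\qquad
\Omega = \{\lambda_j - \lambda_k\}
```

Here λj are the eigenvalues of Ĥenc and W1, W2 are trainable unitaries; training can adjust only the coefficients cω, never the available wave numbers Ω.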
This paper explores an alternative approach based on the recently proposed generative quantum eigensolver (GQE).24 GQE is a hybrid quantum-classical algorithm that uses a classical generative model to construct quantum circuits, where circuit components are sequentially generated from a predefined gate pool, similar to sentence generation in natural language processing. Unlike in VQAs, no parameters are embedded in the quantum circuit; all parameters are contained within a classical generative model (see Fig. 1). These parameters are iteratively updated to minimize a particular objective. In a proof-of-concept experiment,24 the generative model was implemented using the GPT-2 (ref. 25) architecture, referred to as the generative pre-trained transformer quantum eigensolver (GPT-QE), and its effectiveness was demonstrated in the ground-state search of molecular Hamiltonians. A key feature of GQE is its ability to incorporate classical variables directly into the neural network, allowing for a non-trivial influence on the generated quantum circuits. Additionally, inductive biases can be seamlessly integrated, much like classical convolutional neural networks in computer vision26–28 and graph neural networks in materials informatics.29–31 While the potential of incorporating classical variables into the generative model has been previously discussed in the context of quantum chemistry,24 specific methods for its implementation have not yet been detailed.
Based on the concept of GQE, this paper introduces conditional-GQE (Fig. 1c), an input-dependent quantum circuit generation framework. To generate circuits from given inputs, we adopt an encoder–decoder transformer architecture,32 making the model applicable across different contexts. We apply this conditional-GQE approach to combinatorial optimization and develop a new hybrid quantum-classical method called Generative Quantum Combinatorial Optimization (GQCO). By incorporating a graph neural network33 into the encoder to capture the underlying graph structure of combinatorial optimization problems, our model is trained to generate quantum circuits that solve combinatorial optimization problems with up to 10 qubits, achieving about 99% accuracy on new test problems. Notably, for 10-qubit problems, the trained model finds the correct solution faster than both brute-force search and the quantum approximate optimization algorithm (QAOA).34
Many existing approaches to quantum circuit design35,36 rely on labeled datasets, which limits their scalability as classical simulation becomes infeasible for large circuits. Although some recent approaches explore reinforcement learning for circuit optimization,37,38 they typically require computing intermediate quantum states to guide gate selection. Consequently, both these supervised and reinforcement learning methods become impractical for large-scale quantum systems where classical simulation of the quantum algorithm is not feasible. To address these challenges, we introduce a dataset-free, preference-based algorithm. Specifically, this work uses direct preference optimization (DPO)39 to update the model parameters by comparing the expectation values of generated circuits. Unlike many supervised or reinforcement learning-based methods, our DPO-based strategy does not rely on prior labeling; it only requires the final measurement results of the generated circuits, thereby substantially reducing computational overhead.
As an illustrative demonstration of conditional-GQE, we focus on combinatorial optimization problems. However, the goal of this study is not to outperform existing state-of-the-art methods in combinatorial optimization. Indeed, a wide range of solvers have been developed, including traditional algorithms like simulated annealing (SA),40 machine learning-based approaches,41,42 quantum annealing,43 and techniques based on classical and quantum generative models.44 Rather than aiming to surpass these existing methods, our broader goal is to present a novel, scalable, and generalizable workflow for quantum circuit generation across diverse domains, accelerated with the help of high-performance computing systems.10 This work is expected to support practical quantum computation in the early fault-tolerant era and advance the real-world application of quantum technology.
Given a fixed initial state |ϕini〉, GQE uses classical machine learning to generate a quantum circuit U that minimizes the expectation value 〈ϕini|U†ĤU|ϕini〉 of an observable Ĥ. In many quantum computing applications, observables can be expressed as a function Ĥ(x) of certain variables x, such as the coefficients of the Ising Hamiltonian representing a combinatorial optimization problem. However, similar to many VQAs, GPT-QE—the original demonstration of GQE—does not incorporate x into the generative model but instead uses a separate model for each context, as illustrated in Fig. 1a and b. We believe that incorporating contextual inputs into the generative model can yield significantly different results compared to previous algorithms. This study presents the context-aware GQE, which aims to train a generative model with contextual inputs, generating a circuit that minimizes the energy 〈ϕini|U†Ĥ(x)U|ϕini〉 in response to a given input x. In contrast to GPT-QE, which utilizes a decoder-only transformer, we employ a transformer architecture that includes both an encoder and a decoder. The details of GQE and our approach are provided in the Methods section.
In previous work by some of us,45 we suggested a way of training a parameterized quantum circuit U(θ, x) that depends on the variables x. In this algorithm, the variables x are embedded into the circuit, and the parameters θ are optimized so that 〈ϕini|U(θ, x)†Ĥ(x)U(θ, x)|ϕini〉 is minimized for each x. However, embedding classical data into a parameterized quantum circuit faces the challenge of expressibility,19–21 meaning that the functional form of U(θ, x) with respect to x is severely restricted. In contrast, GQE is not subject to these expressibility issues: the variables x are incorporated into the classical neural network alongside the trainable parameters, and they affect the generated quantum circuit non-trivially.
Combinatorial optimization problems can always be mapped to a corresponding Ising Hamiltonian,46 which serves as an observable. We use the Hamiltonian coefficients as input to a generator, embedding them into graph structures with feature vectors that capture domain-specific information. A graph encoder with transformer convolution47 then produces an encoded representation. Using this, a transformer decoder generates a sequence of token indices that defines a quantum circuit. The solution is identified by selecting the bit sequence corresponding to the computational basis state with the highest observation probability from the generated quantum circuit. Fig. 2 presents the schematic of this process, with further details provided in the Methods section.
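As a small illustration of the final step of this pipeline, the following sketch extracts the solution bit string from a simulated statevector; the helper name and the bit-ordering convention are illustrative, not the paper's implementation.

```python
import numpy as np

# Hypothetical helper: given the statevector produced by a generated circuit,
# read off the candidate solution as the most probable computational basis state.
def extract_solution(statevector: np.ndarray, n_qubits: int) -> np.ndarray:
    probs = np.abs(statevector) ** 2          # Born-rule observation probabilities
    best = int(np.argmax(probs))              # basis state with the highest probability
    bits = [(best >> (n_qubits - 1 - q)) & 1 for q in range(n_qubits)]
    return np.array(bits)                     # bit b maps to spin s = 1 - 2*b

# Example: a 2-qubit state concentrated on |10>
state = np.array([0.1, 0.0, 0.99, 0.0])
state = state / np.linalg.norm(state)
print(extract_solution(state, n_qubits=2))    # -> [1 0]
```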
Circuit component pools must be predefined, allowing for the incorporation of domain knowledge and inductive bias. For example, since GPT-QE24 aims to search for the ground state of a given molecule, its operator pool is composed of the unitary coupled-cluster singles and doubles (UCCSD) ansatz48 derived from the target molecule. In this study, we use basic 1- and 2-qubit gates (the Hadamard gate, rotation gates, and the CNOT gate) and the QAOA-inspired RZZ rotation gate, i.e., an Ising-ZZ coupling gate acting on two target qubits. Every choice of target qubit(s) and, where applicable, control qubit is available for each quantum gate, and six possible rotation angles are available for the rotation gates. By using basic gates rather than components tailored to many-body physics such as the UCCSD ansatz, this work aims to study whether the model can be trained successfully without prior knowledge of an optimal or intuitively useful operator pool.
While a detailed description of our training strategy is provided in the Methods section, we summarize it here to highlight our scalable, broadly applicable framework. Scaling circuit size is critical for fault-tolerant quantum computing; however, most prior works35,36 rely on supervised learning methods that struggle to produce high-quality training data at large scales. In contrast, GPT-QE employs an alternative training approach called logit matching. This method does not require any pre-existing dataset; instead, it trains the generative model to approximate a Boltzmann distribution derived from the expectation value of a given Hamiltonian. In this work, to further increase the probability around the preferred circuits beyond what the Boltzmann distribution assigns, we use a preference-based strategy called direct preference optimization (DPO).39 DPO compares candidate circuits based on their computed costs and updates the model parameters to increase the likelihood of the most favourable circuit. Crucially, it relies solely on expectation values from the generated circuits, eliminating the need for labelled datasets and thereby facilitating the treatment of large-scale quantum systems. In other words, the model is trained by exploring the space of solutions rather than relying on previously gathered ground truth. To manage the diversity arising from different problem sizes, we introduce a qubit-based mixture-of-experts (MoE) architecture,49–51 sketched below. This module comprises specialized model sublayers called experts, and the model switches between them depending on the number of qubits required. We further accelerate model training through curriculum learning,52 starting with smaller circuits and increasing task complexity step by step; we then fine-tune each expert for its respective problem size. Our preference-based curriculum training with MoE modules enhances the model's expressive power and scalability, facilitating the efficient generation of larger quantum circuits.
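The qubit-based routing can be sketched as follows; unlike a learned MoE gate, the expert is selected deterministically by the problem size. The dimensions and the expert sublayer architecture are illustrative assumptions, not the trained model's configuration.

```python
import torch
import torch.nn as nn

class QubitMoE(nn.Module):
    """Minimal sketch of a qubit-count-gated mixture-of-experts layer:
    one expert sublayer per problem size, hard-switched by n_qubits."""
    def __init__(self, d_model: int, max_qubits: int, min_qubits: int = 3):
        super().__init__()
        self.min_qubits = min_qubits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(max_qubits - min_qubits + 1)
        )

    def forward(self, h: torch.Tensor, n_qubits: int) -> torch.Tensor:
        # Routing key is the (known) problem size, not a learned gate.
        return self.experts[n_qubits - self.min_qubits](h)

# Usage: route a batch of hidden states for a 5-qubit problem
layer = QubitMoE(d_model=64, max_qubits=10)
out = layer(torch.randn(2, 7, 64), n_qubits=5)
```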
Fig. 3a compares the accuracy of GQCO with two other solvers—simulated annealing (SA)40 and the quantum approximate optimization algorithm (QAOA)34—on 1000 randomly generated combinatorial optimization problems for each problem size. Both training and test datasets were generated from the same distribution. For each test problem, the GQCO model generated 100 circuits (the same number used during training), and the circuit yielding the lowest expectation value was selected. SA and QAOA were initialized and run independently for each problem; in particular, for QAOA, the circuit parameters were trained from scratch for each problem, and the solution was determined from the optimized circuit. SA was executed with 1000 sweeps and 100 reads per problem instance, while QAOA utilized circuits with four layers. Results for other hyperparameter settings are provided in Fig. 3c, and detailed descriptions of the configurations are provided in the Methods section.
As shown in Fig. 3a, the GQCO model consistently achieved a high accuracy of approximately 99% across all problem sizes. In contrast, QAOA failed to exceed 90% accuracy even on 3-qubit tasks, and its accuracy declined to about 30% on 10-qubit tasks. This performance drop reflects the limited expressive power and trainability of the canonical QAOA approach. Achieving over 90% accuracy with QAOA would require a much deeper parameterized circuit, making stable training infeasible at present. In contrast, GQCO addresses these limitations by leveraging the high expressive power of classical neural networks and by employing a large number of parameters on the classical side of the computation. The performance gap observed here indicates the advantages of the generative quantum algorithm approach over variational algorithms.
Fig. 3b shows the time required for each method to reach 90% accuracy. To adjust runtime, we varied the hyperparameters—namely, the number of sampled circuits for GQCO, the number of sweeps for SA, and the number of iteration layers for QAOA. The total runtime includes all steps, from submitting a test problem to identifying the answer. For GQCO, this runtime encompasses both model inference and quantum simulation; for QAOA, it includes parameter optimization and quantum simulation. SA and brute-force calculations were performed on CPUs, while the other computations, including quantum simulation, were conducted on GPUs. The gray dashed line indicates the runtime of brute-force search, which grows exponentially with problem size. In contrast, the increase in GQCO's runtime was restrained. Although a certain amount of computation was needed even for small problem sizes due to the transformer inference, GQCO surpassed the brute-force method once the problem size exceeded 10 qubits. In terms of computational complexity, the brute-force method for problem size n requires a runtime on the order of O(2^n). In contrast, GQCO's complexity depends on both transformer inference and the quantum computation of the generated circuit. The former depends on the sequence length53 (i.e., circuit depth) and scales on the order of O(n^2), while the latter can potentially benefit from exponential speedup on quantum devices. In other words, GQCO can be expected to provide polynomial acceleration over brute force, although it does not guarantee 100% accuracy. It is important to note that, in this performance evaluation, the quantum computations were performed using a GPU-based simulator, so any speedup that could be gained from a quantum approach is not reflected in these results. Nevertheless, a clear reduction in the growth rate of the runtime was observed even for these classical simulations.
Fig. 3c illustrates the detailed relationship between runtime and accuracy when varying the hyperparameters of each method: the number of generated circuits for GQCO, the number of sweeps for SA, and the number of layers for QAOA. Generally, performance improved as execution time increased. However, for QAOA, increasing the number of layers did not consistently enhance the performance of the algorithm, especially as the number of qubits grew. This behaviour is attributed to the training difficulties inherent in VQAs. In contrast, GQCO outperformed the other solvers, showing greater performance gains as the execution time grew. This advantage arises from the processing power of GPUs, which enables additional sampling at little additional wall-clock cost, thereby boosting performance.
Because the runtime baseline depends on a device's computational power, the problem size at which the advantage emerges may differ across devices. However, the difference in computational complexity is independent of the device used.
Ideally, randomly generated Hamiltonians should cover the entire solution space for combinatorial optimization problems involving a given number of qubits. In practice, however, limited training iterations prevent complete coverage of the problem space, restricting the model's performance on structured problems. This limitation can be addressed by fine-tuning the pretrained GQCO model specifically for targeted structured problems. As illustrated by the green bar in Fig. 4, fine-tuning the pretrained GQCO model on 3-regular graph Max-Cut problems successfully improves accuracy to about 98%, nearly matching performance levels observed for dense graphs. This result highlights the importance of fine-tuning for practical applications with structured optimization tasks.
In this work, we did not impose an explicit restriction on the number of CNOT gates, although the maximum circuit depth for GQCO was set to twice the number of qubits. Certainly, the cost function and model structure are flexible enough to incorporate additional constraints on circuit depth or CNOT gate count. Further constraints that lead to shallower circuits could help generate circuits more robust to noise. Furthermore, the model can address device-related constraints. Because many quantum devices have restricted physical connectivity,58,59 compilation is often needed to map circuits onto the hardware. However, GQE-based quantum circuit generators can bypass this process by excluding gates that do not satisfy the device's physical constraints. This flexibility in generating hardware-efficient circuits is a key advantage of the GQE-based approach.
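As a sketch of this idea, a gate pool can be pruned against a device coupling map before training, so the generator never emits gates the hardware cannot execute natively. The pool format and the coupling_map argument here are illustrative assumptions, not an actual device specification.

```python
def filter_pool_by_connectivity(pool: list, coupling_map: set) -> list:
    """Drop two-qubit gates whose qubit pairs are not physically connected."""
    allowed = {tuple(sorted(pair)) for pair in coupling_map}
    return [
        (name, qubits, theta)
        for name, qubits, theta in pool
        if len(qubits) == 1 or tuple(sorted(qubits)) in allowed
    ]

# Example: a linear-chain device keeps CNOT(0,1) but drops CNOT(0,2)
pool = [("H", (0,), None), ("CNOT", (0, 1), None), ("CNOT", (0, 2), None)]
print(filter_pool_by_connectivity(pool, {(0, 1), (1, 2)}))
```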
Fig. 6 shows a typical 3-qubit quantum circuit generated by our GQCO model. In this circuit, six gates are used to transform the initial state |000〉 into the final solution state. Notably, the three successive RY(π/3) gates placed in the middle of the circuit are primarily responsible for obtaining the final state. The composition of the three RY(π/3) gates equals RY(π), whose matrix representation is [[0, −1], [1, 0]], which corresponds to a bit flip from |0〉 to |1〉 (or from |1〉 to −|0〉). The remaining three gates (one RZ gate and two RZZ gates) only change the global phase for the computational basis states and have no direct effect on the final solution. These observations suggest that GQCO differs substantially from quantum-oriented methods such as QAOA in that the GQCO model does not acquire a quantum mechanics-based logical reasoning capability. Instead, much like many classical machine learning models, GQCO appears to generate circuits by interpolating memorized instances. GQCO's circuit-generation ability relies on a data-driven approach rather than any logical understanding of quantum algorithms.
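This composition is easy to verify numerically. The short check below, using the standard RY matrix convention, confirms that three RY(π/3) gates compose to RY(π), the bit flip described above:

```python
import numpy as np

def ry(theta: float) -> np.ndarray:
    # Standard single-qubit Y-rotation matrix
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

composed = ry(np.pi / 3) @ ry(np.pi / 3) @ ry(np.pi / 3)  # = RY(pi)
print(np.round(composed, 6))         # [[ 0. -1.], [ 1.  0.]]
print(composed @ np.array([1, 0]))   # |0> -> |1>
print(composed @ np.array([0, 1]))   # |1> -> -|0>
```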
All of the circuits generated during the performance comparison (Fig. 3) are non-Clifford circuits and are generally expected to be difficult to simulate classically. However, as noted above, many of these circuits primarily perform bit flips. If we remove gates that affect only the global phase (e.g., the first RZZ gate in Fig. 6c), most GQCO-generated circuits become Clifford circuits and are therefore classically simulable. Consequently, our findings do not demonstrate a quantum advantage58 or quantum utility59 of GQE-based circuit generation. Nevertheless, even if the trained model produces circuits that can be classically simulated, non-Clifford circuits are still generated during training. In other words, the entire circuit space—including circuits that are computationally hard to simulate classically—must be explored to obtain the trained model, highlighting the benefits of incorporating quantum computation into the overall workflow. Moreover, for applications beyond combinatorial optimization, solutions often involve more complex quantum states, and the circuits generated by the trained model are expected to be classically unsimulable. Since our model can be trained without explicitly determining whether the generated circuits are classically simulable, the GQCO workflow applies equally well to problems that rely on superposition or entanglement.
These results do not imply that GQCO is inherently a classical heuristic method. Indeed, the original GQE24 has demonstrated strong performance on quantum tasks, such as the ground-state searches for molecules. In this study, the absence of quantumness in the generated circuits is likely because the combinatorial optimization problem itself is not intrinsically quantum. Thus, the model probably determined during training that quantumness was unnecessary for this type of task. The conditional-GQE workflow remains applicable to quantum problems, and it is still feasible to obtain quantum circuit generators exhibiting quantumness. However, training generators for such problems entails a more intricate cost landscape, making it challenging to train using simple gate pools or vanilla DPO loss, as done in this study. Future research focusing on more carefully designed workflows would therefore be promising, including the incorporation of quantum-specific metrics, such as entanglement entropy, into the loss function.
Fig. 7 Results on the real quantum processor. (a) Target Max-Cut problem with 10 variables. The edge weights are represented by line styles: dashed lines indicate a weight of 1, and solid lines indicate a weight of 2. (b) Quantum circuits generated by GQCO and a two-layer QAOA circuit for the target problem. (c) Sampling results on the real quantum device (IonQ Aria) for each of the circuits. Results for 1, 10, 100, and 1000 shots, as well as the state vectors computed by the simulator, are included. The histograms and plots are interpreted in the same manner as in Fig. 5c. Each plot is marked to indicate whether it leads to the correct solution. The enlarged figures are available in the ESI.†
A key characteristic of GQCO-generated circuits is that the resulting quantum state exhibits a distinct observation-probability peak at a single computational basis state. In contrast, because QAOA discretely approximates time evolution from a uniform superposition, its resulting quantum state is more complex and less likely to yield a clear peak, particularly when the circuit depth is limited. Consequently, QAOA required more than 100 shots to identify the correct answer in this study, whereas GQCO found it with just a single shot. This disparity in the number of required shots is expected to grow as the number of qubits increases.
Another notable aspect of GQCO becomes evident in cases where the ground state is degenerate. In principle, our GQCO model cannot account for degeneracy because the training process relies solely on circuit sampling, focusing on identifying a solution without considering the underlying quantum mechanics of the input Ising Hamiltonian. A Max-Cut problem is inherently degenerate; in particular, the target problem in this section is doubly degenerate. As illustrated in Fig. 7c, while GQCO identified only one of the two solution candidates, QAOA exhibited observation-probability peaks for both degenerate ground states. Originating in adiabatic quantum computation, QAOA is theoretically expected to yield a non-trivial probability distribution over degenerate ground states. Overcoming GQCO's inability to capture degeneracy will require future work on model architectures and training approaches. Possible directions include directly incorporating degeneracy-aware constraints into the loss function, embedding problem symmetries into circuit architectures, or initializing circuits with uniform superpositions.
The conceptual workflow described in this study can be extended beyond combinatorial optimization to any problem formulated as observable expectation value minimization. For example, in molecular ground-state searches, representing molecular structures as graphs allows direct adoption of our graph-based encoding. By replacing the encoder, the GQE-based approach also generalizes to quantum machine learning and partial differential equation solvers. We thus view GQE-based quantum circuit generation as a next step following VQAs.
However, several limitations remain as open problems. A major obstacle is the significant classical computational resources required to achieve scalability. While our findings indicate a computational advantage over brute-force solvers and QAOA, fully realizing this advantage demands extensive classical training beforehand. Fig. 8 illustrates the average computational time required for quantum circuit simulations during one epoch of GQCO model training, as well as the proportion of this simulation time relative to the average total computational time per epoch. The comparison spans various numbers of qubits and contrasts CPU and GPU simulation performance. A significant portion of the GQCO model's training time was consumed by quantum circuit simulations, and this computational cost increases exponentially with the number of qubits. Utilizing high-performance computing resources, such as GPUs, can help mitigate this rapid growth in computational cost.10 However, classical simulation becomes impractical for quantum circuits exceeding approximately 50 qubits, rendering conditional-GQE model training infeasible in such scenarios. Direct integration of quantum computation into the training process thus becomes necessary in these regimes, although this inevitably introduces new challenges, including training instability arising from sampling randomness and a substantial increase in the number of shots required.
Furthermore, enhancing the efficiency of generator training and reducing the required number of training epochs are essential objectives. This can be achieved by refining training strategies, such as designing encoder architectures informed by domain knowledge and developing effective pre-training methods. Additionally, careful gate pool design will play a crucial role. Machine learning-based approaches for identifying suitable gates or gate representation learning may offer a promising direction.
This research provides a novel pathway for quantum computation by leveraging large-scale machine learning models. It underscores the growing role of AI in the advancement of next-generation quantum computing research activities. We believe that our work will serve as a catalyst for accelerating the development of quantum applications across diverse domains and facilitating the democratization of quantum technology.
Each component of a circuit is drawn from a predefined pool of unitary operators, indexed by tokens t1, …, tN. These components collectively form a quantum circuit U = UtN⋯Ut1. Let pθ(U) denote the generative model of quantum circuits, where pθ(U) is a probability distribution over unitary operators U, and θ is the set of optimizable parameters. In GQE, the parameters θ are iteratively optimized so that circuits sampled from pθ(U) are more likely to minimize the expectation value of an observable:

E(U) = 〈ϕini|U†ĤU|ϕini〉,

where Ĥ is the observable and |ϕini〉 is a fixed input state. In particular, for an n-qubit system, we use |ϕini〉 = |0〉⊗n. Quantum computation is involved only in the estimation of E(U). Notably, unlike in VQAs, all optimizable parameters are embedded in the classical generative model pθ(U) rather than in the quantum circuit itself (see Fig. 1b).
As discussed in the Results section, the observable can be expressed as a function of certain variables x. Let us denote such an observable as Ĥ(x). The quantum circuit U that minimizes E(U; x) = 〈ϕini|U†Ĥ(x)U|ϕini〉 then also depends on the variables x. In the original GQE approach (including GPT-QE), parameters are set and optimized for each specific target problem, much like in VQAs. More precisely, GPT-QE aims to obtain a decoder-only transformer pθ*(x)(U) for each x, where θ*(x) is the solution of the following minimization problem:
θ*(x) = argminθ 𝔼U∼pθ(U)[E(U; x)],   (1)

where 𝔼X∼p(X)[f(X)] denotes the expectation value of f(X) with respect to the random variable X over the sample space Ω, with X drawn from the distribution p(X); i.e., 𝔼X∼p(X)[f(X)] = ΣX∈Ω p(X)f(X).
By utilizing x as context (i.e., input), conditional-GQE aims to train a generative model pθ(U|x) that generates circuits minimizing E(U; x), i.e., to solve θ* = argminθ 𝔼x𝔼U∼pθ(U|x)[E(U; x)]. The function pθ(U|x) provides the conditional probability of generating the unitary operator U when the input x is given. In transformer-based generative models, the probability pθ(U|x) is expressed as follows:

pθ(U|x) = ∏j=1N pθ(tj|t1, …, tj−1, x), where U = UtN⋯Ut1.   (2)
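For concreteness, here is a minimal sketch of how the factorization in eqn (2) is evaluated in practice: the circuit log-probability is the sum of per-token conditional log-probabilities produced by the decoder. The function and tensor names are illustrative, not the paper's implementation.

```python
import torch

def circuit_log_prob(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """log p_theta(U|x) as a sum of per-token log-probabilities (eqn (2)).
    `logits` has shape (seq_len, vocab_size) and would come from the decoder
    conditioned on the encoded input x; `tokens` are the sampled gate indices."""
    log_probs = torch.log_softmax(logits, dim=-1)
    return log_probs.gather(1, tokens.unsqueeze(1)).sum()

# Example with random decoder outputs for a 5-token circuit over a 30-token pool
lp = circuit_log_prob(torch.randn(5, 30), torch.tensor([3, 17, 4, 22, 9]))
```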
Solving the optimization problem in eqn (1), or its conditional counterpart over the model in eqn (2), is challenging and thus requires surrogate objective functions. GPT-QE employs a logit-matching approach, whereas this study utilizes the DPO39 loss. Further details of DPO are provided in the subsequent section.
The feature vector is then constructed using the following three elements: (1) the weights themselves, (2) the sign of the magnitude relationships between the weights of adjacent nodes or edges, and (3) the sign of the product of the weights of adjacent nodes or edges (see Fig. 2). More formally, the node feature vi and edge feature eij are computed from these three elements, where we let 𝒩(i) denote the index set of nodes connected to node i. These handcrafted features serve to incorporate domain knowledge of the Ising model into our model; specifically, the facts that spin–spin interactions with large coefficients or strong external magnetic fields have a significant impact on the spin configuration of the system, and that frustration—the absence of a spin configuration that simultaneously minimizes all interaction energies—generates a complex energy landscape.
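As an illustration of these three ingredients (not the paper's exact feature formulas), the following sketch computes candidate node features from Ising coefficients h and J; the data layout and helper name are assumptions for exposition.

```python
import numpy as np

def node_features(h: np.ndarray, J: dict) -> dict:
    """Illustrative node features for the Ising graph, following the three
    listed ingredients. h[i]: field on node i; J[(i, j)]: coupling of i and j."""
    neighbors = {i: [] for i in range(len(h))}
    for (i, j), w in J.items():
        neighbors[i].append(j)
        neighbors[j].append(i)
    feats = {}
    for i in range(len(h)):
        nbr_w = [h[j] for j in neighbors[i]]
        feats[i] = [
            h[i],                                          # (1) the weight itself
            [np.sign(abs(h[i]) - abs(w)) for w in nbr_w],  # (2) magnitude relations
            [np.sign(h[i] * w) for w in nbr_w],            # (3) sign of products (frustration cue)
        ]
    return feats

feats = node_features(np.array([1.0, -2.0, 0.5]), {(0, 1): 1.5, (1, 2): -0.5})
```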
For these reasons, we employ a preference-based approach using direct preference optimization (DPO).39 DPO is a training strategy that originated in reinforcement learning from human feedback (RLHF),70,71 where it is used to fine-tune LLMs to generate preferred outputs. In this approach, multiple outputs are sampled, and the parameters are updated to increase the likelihood of preferred outputs while decreasing that of less preferred ones. Typically, in LLMs, human evaluators determine the preference of outputs. In our study, we assess the preferability of circuits using the computed expectation values of the Hamiltonian.
Fig. 9a shows the schematics of our DPO-based training process. The expected DPO loss function used in this work is defined by

L(θ) = −𝔼x𝔼(U(w),U(ℓ))[log σ(β log(pθ(U(w)|x)/πref(U(w)|x)) − β log(pθ(U(ℓ)|x)/πref(U(ℓ)|x)))],

where {U(m)}Mm=1 is the set of sampled circuits, σ is the sigmoid function, and U(w) ≻ U(ℓ) indicates that E(U(w); x) ≤ E(U(ℓ); x), i.e., the circuit U(w) is preferred over the circuit U(ℓ). πref(·|x) is a reference probability and serves as the baseline for the optimization during the DPO, and β is a hyperparameter controlling the influence of πref. In this work, following the CPO formulation introduced below, we use a uniform reference distribution. Since the negative log-sigmoid function log(1 + exp(−x)) is monotonically decreasing, the function L is minimized when pθ(U(w)|x) is maximized and pθ(U(ℓ)|x) is minimized. In other words, this loss function is designed to increase the generation probability of preferred circuits while decreasing the probability of non-preferred circuits.
Ideally, the function L should be computed for all possible pairs of the M sampled circuits, totaling M(M − 1)/2 combinations. However, to reduce computational overhead, we employ the best-vs-others empirical loss, defined as follows:

L̂(θ) = −(1/|D|) Σx∈D (1/(M − 1)) Σm≠w* log σ(β log(pθ(U(w*)|x)/πref(U(w*)|x)) − β log(pθ(U(m)|x)/πref(U(m)|x))),

where w* is the index of the most preferred circuit among the M samples and D = {x(k)}Kk=1 is an input dataset with size K.
However, if all M sampled circuits are identical, the gradient of the loss will be zero regardless of the magnitude of the expected values, preventing the model from being trained effectively. To mitigate this issue, we employ contrastive preference optimization (CPO),72 an improved version of DPO. In addition to the DPO loss, CPO introduces a negative log-likelihood term on the probability of generating the most preferred output. In summary, the loss function used in this work is given by:
L(θ) = L̂(θ) − (1/|D|) Σx∈D log pθ(U(w*)|x),   (3)

where the second term is the negative log-likelihood of the most preferred circuit U(w*). The model is trained by repeatedly sampling a set of M circuits (U(1), …, U(M)), computing the gradient based on the loss function eqn (3), and iteratively updating the parameters. See the ESI† for detailed training settings, including the values of M.
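To make the training signal concrete, here is a minimal sketch of the best-vs-others CPO objective in eqn (3), assuming a uniform reference distribution so that the πref terms cancel in the log-ratios; the tensor names (log_p, energies) and shapes are illustrative, not the paper's implementation.

```python
import torch

def cpo_loss(log_p: torch.Tensor, energies: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Best-vs-others preference loss plus CPO's negative log-likelihood term.
    log_p[m]: log p_theta(U^(m)|x); energies[m]: expectation value of U^(m)."""
    w = int(torch.argmin(energies))               # index of the most preferred circuit
    others = torch.cat([log_p[:w], log_p[w + 1:]])
    margin = beta * (log_p[w] - others)           # uniform pi_ref cancels in the ratio
    prefer = torch.nn.functional.softplus(-margin).mean()  # -log sigmoid(margin)
    nll = -log_p[w]                               # CPO's added NLL term
    return prefer + nll

# Example: 4 sampled circuits with model log-probs and computed energies
loss = cpo_loss(torch.randn(4, requires_grad=True), torch.tensor([0.3, -1.2, 0.5, 0.0]))
loss.backward()
```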
Training begins with randomly generated combinatorial optimization problems involving 3 qubits. Performance is monitored regularly, and training continues until the model achieves an accuracy exceeding 90% on randomly generated test problems. Once this threshold is met, size-4 optimization problems are introduced as training candidates, along with the integration of a new expert module within the MoE layers. Then, performance is continuously monitored, and the maximum problem size in the training dataset is gradually increased.
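A minimal sketch of this curriculum schedule follows; the evaluate callback, the threshold, and the expert bookkeeping are illustrative stand-ins for the actual training loop.

```python
def curriculum_schedule(n_start=3, n_end=10, threshold=0.90, evaluate=None):
    """Raise the maximum problem size (adding a new MoE expert each time)
    once test accuracy on random problems exceeds the threshold."""
    n_max = n_start
    experts = [n_start]
    while n_max < n_end:
        acc = evaluate(n_max)        # periodic accuracy check at the current size
        if acc > threshold:
            n_max += 1
            experts.append(n_max)    # integrate a new expert for the new size
    return experts

# Toy usage with a mock evaluator that always "passes"
print(curriculum_schedule(evaluate=lambda n: 0.95))  # -> [3, 4, ..., 10]
```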
Even when the maximum problem size is nmax, problems involving fewer qubits (<nmax) are still generated as part of the training data. The probability of generating a problem of size n when the current maximum size is nmax is defined as:
![]() | (4) |
For performance evaluation, model inference and quantum circuit simulation were both performed on an NVIDIA RTX A6000 GPU. All other classical computations were performed on an Intel Core i9-14900K CPU. For the real quantum device, we used IonQ Aria via Amazon Braket.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5dd00138b
This journal is © The Royal Society of Chemistry 2025