Chemical design with GPU-based Ising machines †

Ising machines are hardware-assisted discrete optimizers that often outperform purely software-based optimization. They are implemented with, e.g., superconducting qubits, ASICs or GPUs. In this paper, we show how Ising machines can be leveraged to improve the efficiency of automatic molecule design. To this end, we construct a graph-based binary variational autoencoder to obtain discrete latent vectors, train a factorization machine as a surrogate model, and optimize it with an Ising machine. Compared to Bayesian optimization in a continuous latent space, our method performed better in three benchmarking problems. Two types of Ising machines, a qubit-based D-Wave quantum annealer and GPU-based Fixstars Amplify, are compared, and the GPU-based one is observed to scale better and to be more suitable for molecule generation. Our results show that GPU-based Ising machines have the potential to empower deep-learning-based materials design.


Introduction
Molecule design with deep learning has become an essential tool to navigate the vast chemical space. [1][2][3][4] So far, automatic molecule design methods have been successfully applied to develop, e.g., DDR1 kinase inhibitors, 5 fluorescent molecules, 6 electrets 7 and ligands for supramolecular gate modulation. 8 Among the large variety of deep learning models, latent space-based models such as variational autoencoders 9 and transformers 10 constitute a major category. In their pioneering study, Gómez-Bombarelli et al. 1 proposed the use of a variational autoencoder to create a latent space and generate molecules by applying Bayesian optimization 11 to that latent space. This approach has been extended in a number of ways, including grammar VAE, 2 syntax-directed VAE 3 and junction tree VAE (JTVAE). 4 These methods employ a continuous-valued vector as the latent representation to enable smooth navigation with gradient information. However, the fitness functions in the latent space are rich in local minima and may not be optimized completely. 12 Although the optimization performance is hard to measure, we suspect that the performance of latent space-based models is adversely affected by local minima. A dilemma here is that increasing the latent space dimensionality leads to better expressivity but makes optimization harder. If a powerful optimizer is available, it may be exploited to break out of this dilemma.
Ising machines are specialized hardware that can solve quadratic unconstrained binary optimization (QUBO) problems. 13 QUBO with M bits x_i ∈ {0, 1} is described as

E(x) = ∑_i h_i x_i + ∑_{i<j} J_ij x_i x_j, (1)

where h_i and J_ij are real-valued parameters. A D-Wave quantum annealer is a widely used Ising machine based on modulation of quantum fluctuations of superconducting qubits. 14 Recently, Wilson et al. 15 tried to leverage the exceptional optimization ability of a D-Wave annealer by applying it to a discrete latent space created by a binary variational autoencoder. 16 Although they reported successful optimization of thermal emitter topologies and diffractive meta-gratings, the current scale of D-Wave annealers may not be sufficient to deal with more complex problems. Since a D-Wave annealer has sparsely connected qubits, it requires multiple physical qubits per variable to solve QUBO with full connectivity. 14 The current generation D-Wave Advantage has more than 5000 qubits, but the number of variables for fully connected problems is limited to 124 bits.
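As a concrete illustration, the QUBO objective of eqn (1) can be evaluated directly and, for toy sizes, minimized by exhaustive enumeration. The following sketch is purely illustrative (the function names and the toy instance are our own, and this is of course not how an Ising machine operates internally):

```python
from itertools import product

def qubo_energy(x, h, J):
    """Energy of eqn (1): sum_i h_i x_i + sum_{i<j} J_ij x_i x_j."""
    M = len(x)
    e = sum(h[i] * x[i] for i in range(M))
    e += sum(J[i][j] * x[i] * x[j] for i in range(M) for j in range(i + 1, M))
    return e

def brute_force_qubo(h, J):
    """Exact minimum by enumerating all 2^M bit vectors (tiny M only)."""
    M = len(h)
    return min(product((0, 1), repeat=M), key=lambda x: qubo_energy(x, h, J))

# toy instance: h favours x1 and x2 being on; J penalises x0 and x1 together
h = [1.0, -0.5, -2.0]
J = [[0.0, 3.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
best = brute_force_qubo(h, J)  # (0, 1, 1) with energy -2.5
```

Enumeration costs O(2^M) and is hopeless beyond a few dozen bits, which is precisely why specialized hardware for this problem class is attractive.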
Highly efficient global optimization of QUBO can be achieved by various physical platforms other than superconducting qubits. Examples include ASIC-based digital annealers, 17 coherent Ising machines 18 and the quantum approximate optimization algorithm (QAOA) implemented on gate-based quantum computers. 19 Among them, we focus on a GPU-based Ising machine called the Fixstars Amplify Annealing Engine 20 (referred to as Amplify later on). It is based on simulated annealing and uses multi-level parallel processing on multiple GPUs to find optimal solutions. Amplify relies on conventional semiconductor technologies, but can handle large-scale problems up to 130 000 bits with full connectivity.
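Setting the GPU parallelism aside, the simulated-annealing idea underlying Amplify can be sketched in a few lines. This single-chain toy version is our own simplification (real implementations run many parallel chains with tuned schedules and GPU-accelerated updates):

```python
import math
import random

def sa_qubo(h, J, steps=20000, t0=2.0, t1=0.01, seed=0):
    """Minimal single-chain simulated annealing for QUBO eqn (1).
    J is stored upper-triangular: the coupling of bits i and j (i != j)
    is J[min(i, j)][max(i, j)]."""
    rng = random.Random(seed)
    M = len(h)
    x = [rng.randint(0, 1) for _ in range(M)]

    def delta(i):
        # energy change of flipping bit i
        d = h[i] + sum(J[min(i, j)][max(i, j)] * x[j] for j in range(M) if j != i)
        return d if x[i] == 0 else -d

    for step in range(steps):
        t = t0 * (t1 / t0) ** (step / steps)  # geometric cooling schedule
        i = rng.randrange(M)
        d = delta(i)
        # accept downhill moves always, uphill moves with Metropolis probability
        if d <= 0 or rng.random() < math.exp(-d / t):
            x[i] ^= 1
    return x
```

The single-flip energy difference can be computed in O(M), so one sweep is cheap; the GPU advantage comes from evaluating huge numbers of such updates and chains concurrently.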
In this paper, we construct a discrete latent space-based molecule generation framework by combining an adaptation of JTVAE and Amplify. We evaluate the expressivity of our discrete latent space in comparison to the continuous counterpart and elucidate how many bits are necessary to accurately represent the fitness functions and optimize them. It turns out that the number of required bits is around 300, which is beyond the limit of the D-Wave Advantage. Our approach was tested in three different benchmarking problems, and it outperformed the combination of a continuous VAE and Bayesian optimization in all of them. Our results show that the combination of a discrete latent space and a powerful discrete optimizer is a viable approach to solve various molecular design problems.

Method
Our algorithm, called bVAE-IM, assumes an oracle that returns the property of a given molecule. In real-world materials design tasks, the oracle can correspond to synthesis experiments or numerical simulations such as density functional theory calculations. In addition, two kinds of training examples are given: n_u unlabeled examples and n_l labeled examples. An unlabeled example is a molecule without any properties, while a labeled example is a pair of a molecule and its property. In most cases, these examples are collected from public databases like ZINC 21 and PubChem. 22 Our algorithm repeatedly generates a molecule and calls the oracle to obtain its property. The efficiency of bVAE-IM is measured by the best property values among the molecules generated with a predetermined number of oracle calls.
The workow of bVAE-IM is detailed in Fig. 1. First, a binary junction tree VAE (bJTVAE) is trained with the unlabeled examples to construct a latent space. We obtain bJTVAE by modifying the junction tree VAE 4 with the Gumbel somax reparametrization 16 such that the latent representation is a bit vector. We selected junction tree VAE as our base model, because it can deal with graph representations of molecules without relying on string representations like SMILES. 23 As a result, we obtain a decoder that translates a bit vector to a molecular graph. Aer the latent space is obtained, our task of generating a molecule boils down to nding the best bit vector in the latent space. In a continuous latent space, a commonly used method is to employ a Gaussian process surrogate model and choose the best latent representation by optimizing an acquisition function. 1 In our case, we employ a factorization machine 24 as the surrogate model and the best bit vector is selected by optimizing the trained FM with an Ising machine. The functional form of FM is described as where b, h i and w ki are real-valued parameters. It is similar to QUBO eqn (1), but the weights matrix of quadratic terms is a low-rank matrix parameterized by w ki . In the following experiments, the rank K is set to eight as in the literature. 25 Using the pretrained encoder, our labeled examples are mapped to the pairs of bit vectors x i and corresponding properties y i , i.e., D = {x i ,y i }. A factorization machine (FM) is trained with D by least squares tting. The trained FM corresponds to QUBO that can be optimized with an Ising machine. The solution of the Ising machine is decoded into a molecule and its property is acquired with an oracle call. It is then added to the dataset D and the FM is retrained with the expanded dataset. This procedure is repeated until the predetermined number of molecules is generated. The algorithmic details are described in Algorithm S1 in the ESI. †

Expressivity of latent spaces
The reconstruction accuracy of VAEs is measured by how accurately input examples are reconstructed as output examples. To achieve high accuracy for a wide variety of molecules, the dimensionality of the latent space must be sufficiently high to ensure that the latent representation is expressive enough. We show that the binary latent space of bJTVAE has expressivity comparable to its continuous counterpart, JTVAE. To this end, 250 000 molecules were sampled from ZINC and divided into 220 000 training and 30 000 validation examples. Six bJTVAE models with different latent dimensionalities, d = 50, 100, 200, 300, 450, and 600, were trained, while the dimensionality of JTVAE was fixed to 56 as in the literature. 4 The reconstruction accuracy is defined as the fraction of exactly reconstructed molecules in the validation examples. The validity of molecules generated by bJTVAE is always 100%, because the decoder of JTVAE is utilized. In Fig. 2, the average reconstruction accuracies over 10 runs are shown. The accuracy of bJTVAE grows monotonically up to d = 300 and starts to saturate after d > 300. At d = 300, the accuracy of bJTVAE is roughly on par with that of JTVAE, indicating that the binary latent space is capable of encoding molecules without much information loss. It also implies that molecular generators need a high-dimensional latent space beyond the limit of D-Wave annealers.
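The metric above amounts to a simple exact-match fraction. A minimal sketch (our own helper; in practice both sides would be canonical SMILES strings produced by the same cheminformatics toolkit, so that string equality implies molecular identity):

```python
def reconstruction_accuracy(originals, reconstructions):
    """Fraction of validation molecules reconstructed exactly.
    Inputs are assumed to be canonicalized string representations,
    so plain equality is a faithful molecule-level comparison."""
    assert len(originals) == len(reconstructions)
    hits = sum(a == b for a, b in zip(originals, reconstructions))
    return hits / len(originals)
```
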

Optimization performance
We tested bVAE-IM with the following three computational oracles. (1) Penalized log P 2 : octanol-water partition coefficient (log P) penalized by the synthetic accessibility (SA) score and the number of rings that have more than 6 atoms. (2) Topological polar surface area (TPSA): total surface area of all polar atoms, including their attached hydrogen atoms. (3) GSK3β + JNK3 + QED + SA: 26 a score unifying the predicted inhibition levels against c-Jun N-terminal kinase 3 (JNK3) and glycogen synthase kinase-3 beta (GSK-3β), QED (drug-likeness) and SA. The first and second properties are computed with RDKit, 27 while the code provided at MolEvol 28 was used to compute the third.
As the unlabeled data, we employed n_u = 250 000 randomly selected molecules from ZINC. The labeled data were created by adding the properties to n_l = 10 000 randomly selected molecules from ZINC using the computational oracles. To investigate extrapolative performance, we intentionally limited our labeled data to molecules with poor properties: penalized log P ∈ [−3, 3], TPSA ∈ [0, 100] and GSK3β + JNK3 + QED + SA ∈ [0, 0.5]. As a baseline method, we employ a standard approach, called VAE-BO, consisting of JTVAE and Gaussian process-based Bayesian optimization as implemented in Kusner et al. 2 Note that the Gaussian process is initially trained with the same set of labeled data and retrained whenever a molecule is generated. Additionally, we compared our method with random sampling in discrete and continuous latent spaces, denoted as bVAE-Random and VAE-Random, respectively. See Section 2 in the ESI † for details about training.
VAE-BO and bVAE-IM are applied up to the point that 300 molecules are generated. This is repeated five times with different random seeds. bVAE-IM employed Amplify and the dimensionality of the latent space is set to 300. The property distributions of the generated molecules are shown in Fig. 3a. In all three cases, the top molecules of bVAE-IM were better than those of VAE-BO and the initial labeled data. See Table 1 for detailed statistics and Fig. S2 in the ESI † for examples of generated molecules. Note that the experimental results for three additional properties are shown in Section 4 in the ESI. † This result implies that, using a high-performance discrete optimizer, it is possible to construct a competitive molecular generator. The performance improvement of bVAE-IM over bVAE-Random is larger than that of VAE-BO over VAE-Random. This may serve as evidence that the superiority of bVAE-IM results from better optimization, not from better expressivity of the latent space. However, the adversarial effect of local minima in the continuous latent space of VAE-BO is hard to measure precisely, because guaranteed global optimization in a high-dimensional space is extremely hard.
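The summary statistics used to compare methods (best value, mean of the top 5% and top 10%) can be computed as follows. This is a minimal sketch assuming larger property values are better; the function name is ours:

```python
def top_statistics(properties, fractions=(0.05, 0.10)):
    """Best value and means of the top fractions of generated properties,
    in the style of the per-run statistics reported for each method."""
    ranked = sorted(properties, reverse=True)
    stats = {"best": ranked[0]}
    for f in fractions:
        k = max(1, int(round(f * len(ranked))))  # at least one molecule
        stats[f"top{int(f * 100)}%_mean"] = sum(ranked[:k]) / k
    return stats
```

In the paper these statistics are further averaged over the five random seeds, with standard deviations reported alongside the means.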
The runtime for generating a molecule involves an oracle call, retraining of the surrogate model, optimization of the surrogate model, and decoding. Ising machines are expected to accelerate the optimization part, but the impact on the total runtime is limited, because the other parts take substantially more time. Table 1 shows the total runtime for bVAE-IM and VAE-BO. bVAE-IM was more efficient, but the difference is less pronounced when the oracle takes more time (i.e., GSK3β + JNK3 + QED + SA). Fig. 3b shows the property distributions under different latent space dimensions. At 50 and 100 dimensions, the D-Wave quantum annealer (Advantage 4.1) was applied as well, but Amplify outperformed D-Wave in these cases. This is probably due to environmental noise, which often prevents quantum annealers from reaching the global minimum. 29 Although the GPU-based Ising machine is clearly a better choice at present, the situation might change as qubit technologies improve.

Table 1: Statistics about top molecules generated by bVAE-IM, bVAE-Random, VAE-BO and VAE-Random. The performance of each method is characterized by the best property, the mean of the top 5% properties, and the mean of the top 10% properties. For each statistic, the mean and standard deviation over 5 runs are shown. As a reference, the corresponding statistics of the initial labeled data are shown. The total runtime is shown for bVAE-IM and VAE-BO.

Discussion
The binary junction tree variational autoencoders developed in this paper are of independent interest to the chemoinformatics community, because the latent representation serves as an alternative fingerprint of molecules. See Section 1 of the ESI † for related discussions. Our method employs a factorization machine as the surrogate model, but alternative choices are possible. See Section 3 of the ESI † for details. Latent spaces provided by deep learning models have revolutionized how molecules are generated. Complex valence rules can now be learned from data and need not be implemented explicitly. Nevertheless, modeling molecular properties in the latent space and optimizing them are not straightforward. We have shown how molecule generators can be improved by powerful optimizers such as Ising machines. The development of deep learning methods is rapid, and the variational autoencoders employed in this paper may no longer be among the best methods. However, our approach can basically be applied to newer models with latent spaces, such as transformers. 10 Current quantum-based optimizers have a scalability problem, as pointed out in this paper. In addition, environmental noise often prevents quantum-based optimizers from reaching the global minimum. Nevertheless, the quickly developing technologies of Ising machines may solve these problems to the point that quantum-based ones are preferred over GPU-based methods.

Data availability
The code is available at https://github.com/tsudalab/bVAE-IM. The data to reproduce the results of this paper is available at https://zenodo.org/badge/latestdoi/608057945. As of March 2023, Fixstars Amplify is available via Python API free of charge.