A. N. M. Nafiz Abeer,a Sanket Jantre,b Nathan M. Urbanb and Byung-Jun Yoon*ab
aDepartment of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA. E-mail: bjyoon@tamu.edu
bApplied Mathematics Department, Computing and Data Sciences, Brookhaven National Laboratory, Upton, NY, USA
First published on 17th October 2025
In recent years, deep generative models have been successfully applied to various molecular design tasks, particularly in the life and materials sciences. One critical challenge for pre-trained generative molecular design (GMD) models is to fine-tune them to be better suited for downstream design tasks that aim at optimizing specific molecular properties. However, redesigning and training an existing effective generative model from scratch for each new design task are impractical. Furthermore, the black-box nature of typical downstream tasks that involve property prediction makes it nontrivial to optimize the generative model in a task-specific manner. In this work, we propose an uncertainty-guided fine-tuning strategy that can effectively enhance a pre-trained variational autoencoder (VAE) for GMD through performance feedback in an active learning setting. The strategy begins by quantifying the model uncertainty of the generative model using an efficient active subspace-based UQ (uncertainty quantification) scheme. Next, the decoder diversity within the characterized model uncertainty class is explored to expand the viable space of molecular generation. The low-dimensionality of the active subspace makes this exploration tractable using a black-box optimization scheme, which in turn enables us to identify and leverage a diverse set of high-performing models to generate enhanced molecules. Empirical results across six target molecular properties using multiple VAE-based generative models demonstrate that our uncertainty-guided fine-tuning strategy consistently leads to improved models that outperform the original pre-trained models.
Design, System, Application
Variational autoencoders (VAEs), a class of generative models widely applied in diverse molecular design tasks, learn a continuous latent representation of their input (molecules in this case) that is leveraged in the search for molecules with optimized properties. This work proposes a black-box optimization strategy for finding VAE model parameters that outperform the pre-trained VAE parameters in constructing molecules with better properties from a given set of latent points. Specifically, our strategy takes the latent points found by any latent space optimization approach and explores the uncertainty class of the VAE through its low-dimensional active subspace to find diverse models that improve the properties of the molecules corresponding to those latent points. Owing to its model-agnostic nature, our approach can be applied in combination with any latent space optimization algorithm for VAEs to go beyond the pre-trained model's performance. This can help computational designers retain most of a pre-trained model's capability before adopting a new generative model for the application of interest.
Fine-tuning a generative model for a quantity of interest in a molecular design task can be challenging, especially with limited data.18 We address this challenge by efficiently quantifying the model uncertainty by employing active subspace reduction of a generative model,19 which constructs a low-dimensional subspace of the generative model parameters capturing most of the variability in the model's output. Incorporating model uncertainty leads to diversity in VAE model parameters (specifically the decoder in our problem setting), which expands the space of viable molecules compared to the pre-trained model. First, we assume that optimization over the latent space of a pre-trained variational autoencoder (VAE) model yields a list of candidate designs. These candidates, which can result from multiple runs of some optimization procedure with different hyperparameters, are decoded to generate molecules that determine the model's downstream performance. For these candidates within latent space, we adapt the generative model in its low-dimensional active subspace to enhance its performance beyond that of the pre-trained model, i.e. to learn a better decoding strategy than the pre-trained model to obtain decoded molecules with better properties from the candidates. We achieve this through black-box optimization, guided by performance feedback from downstream tasks. This optimization tunes the distribution of active subspace parameters to generate diverse models that outperform the pre-trained model for those candidate latent points. The black-box nature of our optimization for improving model performance in downstream molecular design tasks simplifies its integration with existing optimization methods in latent space of VAE-based generative models.
To this end, our contributions are as follows:
• We explore the model uncertainty class of VAE-based generative models, effectively represented by their low-dimensional active subspace parameters, using black-box optimization algorithms: Bayesian optimization (BO) and REINFORCE. The proposed fine-tuning approach yields diverse high-performing models which improve the generative model's performance in downstream design tasks of interest.
• We demonstrate the effectiveness of our uncertainty-guided fine-tuning approach in leveraging model uncertainty to enhance downstream performance across six target molecular properties. Our method consistently improves design performance of multiple pre-trained VAE-based models through the proposed closed-loop optimization scheme.
• We empirically analyze the impact of the active subspace parameter-based optimization of the acquisition function in latent space Bayesian optimization for three high-dimensional optimization problems.
A typical molecular design task seeks a sample x from the input space X that maximizes a black-box property of interest f:

| x* = arg max_{x ∈ X} f(x) | (1) |

With a pre-trained VAE, this problem is commonly reformulated as an optimization over the latent space Z:

| z* = arg max_{z ∈ Z} f(Decode(z, θ0,decoder)) | (2) |

Here, X is the input space for which the latent space of the VAE is learned, and f(·) represents a black-box function that quantifies the quantity of interest for sample x. Instead of the high-dimensional discrete input space, this reformulation searches the continuous latent space Z learned by the VAE model. Decode(z, θ0,decoder) uses the learned parameter θ0,decoder to reconstruct a sample in X from its corresponding latent point z. Different optimization strategies can be used to find the optimum latent point z. Under the Bayesian optimization framework, the process is as follows. Starting from a collection of observations, i.e. {x(i), f(x(i))}ni=1, a surrogate model (e.g. a Gaussian process) fsurrogate: Z → R is learned to predict the objective value from the latent space representation. In this step, each sample x(i) is embedded to the corresponding latent point z(i) through the encoder with θ0,encoder. At each BO iteration, a new candidate z(n+1) ∈ Z is selected to query its objective value given by f(·). This selection is done by optimizing the acquisition function (e.g. expected improvement,21 upper confidence bound,22 etc.), which is a function of z through the surrogate model fsurrogate:

| z(n+1) = arg max_{z ∈ Z} acq(z; fsurrogate) | (3) |
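The following is a minimal sketch of such a latent space BO loop under simplifying assumptions: a generic `vae` object exposing `encode`/`decode`, a black-box oracle `f`, a scikit-learn Gaussian process surrogate, and acquisition optimization by scoring random latent candidates. It is meant to illustrate eqns (2) and (3), not to reproduce any specific implementation from the cited works.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(z_cand, gp, best_f, xi=0.01):
    """EI acquisition (for maximization) evaluated on candidate latent points."""
    mu, sigma = gp.predict(z_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    imp = mu - best_f - xi
    u = imp / sigma
    return imp * norm.cdf(u) + sigma * norm.pdf(u)

def latent_space_bo(vae, f, X_init, latent_dim, n_iter=50, n_cand=2048):
    # Embed the initial designs with the pre-trained encoder (theta_0,encoder).
    Z = np.stack([vae.encode(x) for x in X_init])
    y = np.array([f(x) for x in X_init])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(Z, y)                                         # surrogate f_surrogate: Z -> R
        # Optimize the acquisition by scoring random candidates drawn in the latent space.
        z_cand = np.random.randn(n_cand, latent_dim)
        z_next = z_cand[np.argmax(expected_improvement(z_cand, gp, y.max()))]
        # Decode with the *fixed* pre-trained decoder (theta_0,decoder) and query f.
        x_next = vae.decode(z_next)
        Z = np.vstack([Z, z_next])
        y = np.append(y, f(x_next))
    return Z, y
```

Note that the decoder parameters θ0,decoder stay fixed throughout this loop; the rest of this paper asks whether tuning them can improve the outcome.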
The objective in eqn (2) consists of two functions, i.e. the black-box function f and the decoding process parameterized by θdecoder. Existing studies on latent space optimization optimize only the latent point while using a fixed decoder, i.e. the pre-trained VAE model's decoder. In this work, we ask whether this optimization can benefit from also optimizing the decoder parameters. Specifically, we consider two directions:
• (RQ1) can we improve the decoding process of the pre-trained VAE to obtain better samples from the set of latent points found by any latent space optimization algorithm? Note that the latent points are obtained by solving eqn (2) with pre-trained model parameters. Hence, the goal here is to see whether we can do better than the pre-trained model by tuning its decoding process, i.e. optimizing the decoder parameters.
• (RQ2) can we find a better solution for eqn (1) by optimizing both the latent point z and decoder parameter θdecoder in f(Decode(z, θdecoder))? This direction focuses on incorporating the optimization of decoder parameters inside the latent space optimization iteration.
For the first direction, we focus on VAE-based generative models for molecular design paired with an arbitrary latent space optimization algorithm. Here, the decoding process is optimized after the completion of latent space optimization. For joint optimization of the latent point and decoder (RQ2), we consider optimizing the decoder within the latent space Bayesian optimization that is adopted to suggest new samples in each weighted retraining iteration (PG-LBO,24 a recently proposed variant of Tripp et al.23). Sections 4.1 and 4.2 provide a formal description of these two directions.
In some cases, it is not necessary to design a VAE architecture specifically for the problem at hand. For example, Lopez et al.30 utilized the attention VAE model from Dollar et al.31 to generate SMILES strings of corrosion inhibitor candidates. Nevertheless, given the substantial effort and expertise required to design and train such models, we believe it is essential to maximize the utility of existing pre-trained architectures. Our work is complementary to other approaches, e.g., Paddy,32 which focus on improving optimization strategies in the latent space of these pre-trained models, enabling more effective downstream performance without the need to design new architectures from scratch.
MacKay33 introduced the concept of data-dependent effective dimensionality of neural network parameter space in the Bayesian framework. The experiments of Maddox et al.34 demonstrated the existence of many directions within the neighborhood of trained neural network weights where predictions remain unchanged. Several studies35–40 utilized this concept to compress over-parameterized neural networks by pruning. Furthermore, this low dimensionality in parameter space enables scalable uncertainty quantification through various subspace inference techniques.41,42 In our work, we chose the active subspace approach42 over other methods since it allows learning the subspace without retraining or modifying the architecture of the pre-trained model.
Previous efforts to fine-tune GMD models have been limited to using small, select molecule sets that fit specific design criteria. For example, Blaschke and Bajorath18 fine-tuned the REINVENT model43 to improve its ability to recognize molecules with multi-target attributes via transfer learning, i.e. retraining the pre-trained model with a pool of multi-target molecules. In contrast, our approach adapts the generative model based on downstream task performance. This problem is conceptually similar to the work by Krupnik et al.,44 who updated the pre-trained generative model parameters to generate samples matching the observed data from a robotics task simulator.
We consider a pre-trained VAE-based generative model Mθ0, where θ0 is the pre-trained model parameter. For the downstream task of interest T, an optimization algorithm A (e.g. Bayesian optimization in the work of Gómez-Bombarelli et al.4 and Jin et al.6) is applied in conjunction with the pre-trained model Mθ0 to look for candidate points within the latent space of Mθ0 such that the properties of the molecules corresponding to those candidates are optimized. Specifically, for a given pre-trained model (PTM), the algorithm A finds a set of candidate design points, Q = {z(i)}, which Mθ0 decodes to generate corresponding molecules {x(i)}. The properties of these molecules define the quantity of interest (QoI) of the pre-trained model, QoIPTM, e.g. the average property value of the top 10% molecules out of {x(i)}. For design goals involving a specific target value, the QoI can instead be defined over the top 10% smallest differences between the target property value and the properties of the molecules {x(i)}.

Our contention is that while the set Q may achieve the best QoI for the pre-trained model, the algorithm A can perform better if the VAE model is fine-tuned for task T. However, fine-tuned models are not always available for the task at hand. In this work, we investigate whether we can tune a given pre-trained model so that the molecules generated from the set Q (found by A using Mθ0) achieve a QoI better than QoIPTM. Here, the set Q contains the candidate latent points found by some optimization procedure in the latent space of the pre-trained VAE, and QoIPTM is some target property statistic over the associated molecules. Our goal is to bias the pre-trained model to produce molecules with better QoI for the same design points in Q.

To summarize our objective, as illustrated in Fig. 1, we assume a set of design points Q has been found using a latent space optimization algorithm A applied to the pre-trained VAE model. These design points can be decoded by the model to reconstruct molecules. We aim to further optimize the model parameters to generate better molecules from Q than the pre-trained model does. Denoting the QoI of the pre-trained model as QoIPTM = ϕ(Mθ0, Q), our goal is to find fine-tuned model parameters θ such that

| ϕ(Mθ, Q) > ϕ(Mθ0, Q) = QoIPTM | (4) |
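As a concrete (and simplified) illustration of this quantity, the snippet below computes a top-10% QoI for molecules decoded from Q, including the target-value variant mentioned above. The helpers `decode_with_ptm` and `predict_property`, the None convention for invalid decodings, and the ranking by absolute deviation from the target are assumptions made for this sketch.

```python
import numpy as np

def qoi_ptm(Q, decode_with_ptm, predict_property, top_frac=0.1, target=None):
    """QoI of the pre-trained model: average property of the top 10% unique molecules
    decoded from Q. If `target` is given, rank molecules by closeness to that value."""
    mols = {m for m in (decode_with_ptm(z) for z in Q) if m is not None}  # unique, valid molecules
    props = np.array([predict_property(m) for m in mols])
    # For target-value design goals, score by (negative) absolute deviation from the target.
    scores = -np.abs(props - target) if target is not None else props
    n_top = max(1, int(top_frac * len(scores)))
    return np.sort(scores)[-n_top:].mean()
```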
For RQ2, at each iteration of latent space Bayesian optimization, the acquisition function is first optimized over the latent space as in eqn (3). We denote this solution as

| zBO = arg max_{z ∈ Z} acq(z; fsurrogate) | (5) |

for a simpler presentation of the next stage. Next, we attempt to find whether a better solution is available by optimizing the decoder parameters. The intuition is that an optimized decoder may decode zBO to a sample whose encoded latent point zAS shows a better acquisition score. Finally, the candidate latent point (eqn (9)) for query is selected from zBO and zAS based on their acquisition scores. Like in section 4.1, we also use the concept of active subspace to facilitate a black-box optimization for eqn (7).

| g(θdecoder) = acq(Encode(Decode(zBO, θdecoder), θ0,encoder); fsurrogate) | (6) |
| θ*decoder = arg max_{θdecoder} g(θdecoder) | (7) |
| zAS = Encode(Decode(zBO, θ*decoder), θ0,encoder) | (8) |
| z(n+1) = arg max_{z ∈ {zBO, zAS}} acq(z; fsurrogate) | (9) |
For a model output fθ(x) with parameters θ ∈ R^D following a probability distribution p(θ), we can construct an uncentered covariance matrix of the gradients: C = Eθ[(∇θfθ(x))(∇θfθ(x))T]. Suppose C admits the eigendecomposition C = VΛVT, where V contains the eigenvectors and Λ = diag(λ1, …, λD) holds the eigenvalues with λ1 ≥ … ≥ λD ≥ 0. We can then extract a k-dimensional active subspace by partitioning V into [V1, V2], where V1 ∈ R^(D×k) and V2 ∈ R^(D×(D−k)), with k ≤ n ≪ D and n the number of gradient samples used to estimate the covariance matrix C. Accordingly, the active subspace is spanned by the columns of V1 corresponding to the largest k eigenvalues.
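The sketch below illustrates this construction in PyTorch for a generic scalar loss fθ(x): gradients are collected at parameter samples randomly perturbed around the pre-trained weights, and the top-k right singular vectors of the stacked gradient matrix (equivalently, the top-k eigenvectors of the uncentered covariance) span the active subspace. The `loss_fn` interface, the perturbation scale `sigma0`, and the variable names are illustrative assumptions, not the authors' Algorithm 1.

```python
from itertools import cycle
import torch

def build_active_subspace(decoder, loss_fn, data_loader, k=20, n_samples=100, sigma0=0.1):
    """Estimate a k-dimensional active subspace of `decoder`'s parameters.

    loss_fn(decoder, batch) should return the scalar f_theta(x) (e.g. the VAE
    reconstruction + KL loss) so that its gradient w.r.t. the parameters can be taken.
    """
    theta0 = torch.nn.utils.parameters_to_vector(decoder.parameters()).detach()
    grads, batches = [], cycle(data_loader)
    for _ in range(n_samples):
        # Perturb the pre-trained weights and evaluate the loss gradient there.
        theta = theta0 + sigma0 * torch.randn_like(theta0)
        torch.nn.utils.vector_to_parameters(theta, decoder.parameters())
        loss = loss_fn(decoder, next(batches))
        g = torch.autograd.grad(loss, list(decoder.parameters()))
        grads.append(torch.cat([gi.reshape(-1) for gi in g]).detach())
    torch.nn.utils.vector_to_parameters(theta0, decoder.parameters())  # restore weights
    G = torch.stack(grads)                        # (n_samples, D) gradient matrix
    # Right singular vectors of G are the eigenvectors of C = (1/n) G^T G.
    _, S, Vh = torch.linalg.svd(G, full_matrices=False)
    P = Vh[:k].T                                  # projection matrix, shape (D, k)
    return P, (S[:k] ** 2) / n_samples            # subspace basis and its eigenvalues
```

Working with the n × D gradient matrix avoids ever forming the D × D covariance explicitly, which is what makes the construction tractable for decoders with millions of parameters.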
| θS = θS0 + Pω | (10) |

where ω denotes the k-dimensional active subspace parameters, P is the learned projection matrix, and θS0 denotes the corresponding subspace-relevant parameters of the pre-trained model Mθ0. In this work, we considered the combination of the reconstruction loss and the KL divergence loss of the VAE models as the fθ(x) mentioned in section 5.1, while freezing the remaining parameters in ΘD to θD0. Next, we apply the variational inference method45 to approximate the posterior distribution of ω, i.e. p(ω|D), using the training dataset D of the pre-trained model. Specifically, we learn the active subspace posterior distribution parameters by minimizing the sum of the training loss of the VAE model and the KL divergence loss between the approximated posterior distribution and the prior distribution over the active subspace parameters. Details of constructing the active subspace and posterior approximation are given in section 6.2.
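A minimal mean-field sketch of this posterior approximation is given below. It assumes a `vae` module whose decoder parameters are registered under a `decoder.` prefix, a `vae_loss(output, batch)` helper that returns the scalar training loss (reconstruction + KL), and the projection matrix P from the previous step; the prior scale and optimizer settings mirror those reported later in the paper, but the parameterization itself is illustrative.

```python
import torch
from torch.func import functional_call

def fit_as_posterior(vae, vae_loss, data_loader, P, decoder_prefix="decoder.",
                     prior_std=5.0, n_epochs=10, lr=1e-3):
    """Mean-field Gaussian approximation of p(omega | D) over the active subspace
    of the decoder parameters; all other VAE parameters stay frozen."""
    named = dict(vae.named_parameters())
    dec_names = [n for n in named if n.startswith(decoder_prefix)]
    theta_s0 = torch.cat([named[n].detach().reshape(-1) for n in dec_names])
    k = P.shape[1]
    mu = torch.zeros(k, requires_grad=True)
    log_sigma = torch.full((k,), -2.0, requires_grad=True)
    opt = torch.optim.Adam([mu, log_sigma], lr=lr)
    for _ in range(n_epochs):
        for batch in data_loader:
            sigma = log_sigma.exp()
            omega = mu + sigma * torch.randn(k)      # reparameterized sample of omega
            theta_s = theta_s0 + P @ omega           # eqn (10): shift along the subspace
            # Scatter the flat vector back into the named decoder parameters;
            # frozen parameters keep their pre-trained values.
            params, i = {n: p.detach() for n, p in named.items()}, 0
            for n in dec_names:
                numel = named[n].numel()
                params[n] = theta_s[i:i + numel].view_as(named[n])
                i += numel
            # Functional forward pass so gradients flow back to (mu, log_sigma).
            out = functional_call(vae, params, (batch,))
            elbo_loss = vae_loss(out, batch)         # reconstruction + KL loss of the VAE
            # KL( N(mu, diag sigma^2) || N(0, prior_std^2 I) ): the prior term.
            kl = (torch.log(prior_std / sigma) +
                  (sigma ** 2 + mu ** 2) / (2 * prior_std ** 2) - 0.5).sum()
            opt.zero_grad()
            (elbo_loss + kl).backward()
            opt.step()
    return mu.detach(), log_sigma.exp().detach()
```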
During the inference stage, we draw M samples {ωi}Mi=1 independently from the approximated posterior p(ω|D), and using eqn (10) we obtain M model instances in the parameter space Θ. Hence, for the downstream task, we now have a diverse pool of models instead of a single pre-trained model. To quantify the uncertainty of the model's output, we can perform Bayesian model averaging with this collection of models as in Abeer et al.19 Next, we use the distribution over the active subspace parameters ω as the design space for finding a collection of models suitable for producing molecules with better QoI over the set Q (RQ1) and for finding a latent point with a better acquisition value (RQ2).
For a given set of distribution parameters (μf, σf) over the active subspace, the RQ1 fine-tuning objective is the QoI obtained by decoding the design points in Q with models sampled from that distribution:

| ϕ(μf, σf, Q) = QoI({Decode(z, θi) : z ∈ Qi}Mi=1), ωi ∼ p(ω; μf, σf) | (11) |

where the M models {Mθi}Mi=1 are constructed from the sampled active subspace parameters {ωi}Mi=1 using eqn (10). The design points in Q are uniformly distributed among these M models for decoding (Q = Q1 ∪ … ∪ QM). The property of interest is predicted for the reconstructed molecules and the predicted values are summarized, e.g. as the average property value, to give the QoI for the given distribution parameters.
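A sketch of how this objective can be evaluated is given below. It assumes a hypothetical `make_decoder(omega)` helper that instantiates a model via eqn (10), a `predict_property` surrogate, hashable molecule representations (e.g. SMILES strings), and a None return value for invalid decodings.

```python
import numpy as np

def phi(mu_f, sigma_f, Q, make_decoder, predict_property, M=10, top_frac=0.1, rng=None):
    """RQ1 objective (eqn (11)): top-10% average property of the molecules decoded
    from the design points Q by M models sampled from p(omega; mu_f, sigma_f)."""
    rng = rng or np.random.default_rng()
    omegas = rng.normal(mu_f, sigma_f, size=(M, len(mu_f)))    # omega_i ~ p(omega; mu_f, sigma_f)
    chunks = np.array_split(np.asarray(Q), M)                   # spread Q uniformly over the M models
    mols = []
    for omega, chunk in zip(omegas, chunks):
        decoder = make_decoder(omega)                           # model M_theta_i via eqn (10)
        mols += [m for m in (decoder.decode(z) for z in chunk) if m is not None]
    props = np.array([predict_property(m) for m in set(mols)])  # unique, valid molecules only
    n_top = max(1, int(top_frac * len(props)))
    return np.sort(props)[-n_top:].mean()
```

With ϕ in hand, any black-box optimizer can tune (μf, σf) against it.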
For RQ2, each of the M sampled decoders decodes the latent point zBO (found by optimizing the acquisition function over the latent space) to a corresponding molecule. These decoded M molecules are further encoded using the pre-trained encoder network of the VAE, and their acquisition scores are computed. We select the maximum acquisition score out of the M encoded latent points, and this score serves as the optimization objective, i.e. acq(Encode(Decode(zBO, θdecoder), θ0,encoder)):

| max_{θdecoder} acq(Encode(Decode(zBO, θdecoder), θ0,encoder); fsurrogate) | (12) |

Expressing the decoder parameters through the active subspace parameters ω and their distribution p(ω; μf, σf), we have

| ϕacq(μf, σf, zBO) = max_{1≤i≤M} acq(Encode(Decode(zBO, θi), θ0,encoder); fsurrogate), ωi ∼ p(ω; μf, σf) | (13) |
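A companion sketch for this RQ2 objective is shown below, reusing the hypothetical `make_decoder` helper from the previous example; `acq` stands for the acquisition function of the current surrogate and `encode` for the pre-trained encoder.

```python
import numpy as np

def phi_acq(mu_f, sigma_f, z_bo, make_decoder, encode, acq, M=10, rng=None):
    """RQ2 objective (eqn (13)): best acquisition score among the latent points obtained
    by decoding z_bo with M sampled decoders and re-encoding with the pre-trained encoder."""
    rng = rng or np.random.default_rng()
    omegas = rng.normal(mu_f, sigma_f, size=(M, len(mu_f)))
    scores, candidates = [], []
    for omega in omegas:
        x = make_decoder(omega).decode(z_bo)     # decode z_BO with a sampled decoder theta_i
        if x is None:                            # skip invalid decodings
            continue
        z_as = encode(x)                         # re-embed with theta_0,encoder
        candidates.append(z_as)
        scores.append(acq(z_as))
    if not scores:
        return -np.inf, None
    best = int(np.argmax(scores))
    return scores[best], candidates[best]
```

Returning the best re-encoded latent point alongside its score also yields the candidate zAS used in the selection of eqn (9).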
In our work (RQ1), we consider VAE-based generative models for molecular design, including the JT-VAE,6 SELFIES-VAE,20 and SMILES-VAE.4 Given the design points Q in the latent space of the pre-trained VAE model suggested by some generic optimization algorithm A, the decoders of the VAE models transform them into molecules. To obtain molecules with better properties for the same design points Q than the pre-trained model, we learn the active subspace of the decoders of the SELFIES-VAE and SMILES-VAE. For the JT-VAE, reconstruction of molecules involves two types of decoders. First, the tree decoder predicts a junction tree from a latent point. Conditioned on this predicted junction tree, the graph decoder constructs the molecular graph by selecting the best arrangement in each node of the junction tree. Of these two components, the tree decoder plays the pivotal role in deciding the molecular structure, as the junction tree contains all the coarse information, i.e. which molecular units will be present in the constructed molecule. The graph decoder tracks the fine details of the interconnection between the nodes of the junction tree. Consequently, the tree decoder has broad control over the decision rules for constructing molecules from latent space. Therefore, we construct the active subspace over the JT-VAE's tree decoder, effectively allowing us to control the decision rules for constructing the junction tree from a latent point. Our optimization process attempts to change those rules, i.e. by changing decoder weights so that a latent point is decoded to a different junction tree yielding a better molecular graph than the pre-trained JT-VAE model.
In experiments for RQ2, we consider three high-dimensional optimization tasks where latent space Bayesian optimization is used with a weighted retraining framework. We take the active subspace of the decoders of the corresponding VAEs for all tasks except for Molecule where we consider the JT-VAE tree decoder.
We define the design space for the fine-tuned distribution parameters (μf, σf) elementwise around the inferred posterior (μpost, σpost):

| μpost − 3σpost ≤ μf ≤ μpost + 3σpost | (14) |
| 0.75σpost ≤ σf ≤ 1.25σpost | (15) |
We chose the 3σpost half-width around the posterior mean, μpost, to enable the fine-tuned distribution to navigate within the subspace region aligned with the posterior. The bounds for σf are set to avoid significant variance changes, as this could introduce excess noise in objective query evaluation, potentially hindering the optimization algorithm's performance.
With the above uncertainty-guided design space, we impose the constraint in eqn (16) based on the KL divergence between the fine-tuned and posterior distributions. By setting the threshold δKL, we control how far from the inferred posterior we search for a better pool of models. If the design space is already very narrow, this constraint can be dropped from the optimization.
| KL(p(ω; μf, σf)‖p(ω; μpost, σpost)) ≤ δKL | (16) |
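For diagonal Gaussians, eqns (14)–(16) amount to a simple feasibility check with a closed-form KL divergence, as sketched below; the threshold value used here is a placeholder, not the δKL from the experiments.

```python
import numpy as np

def kl_diag_gauss(mu_a, sigma_a, mu_b, sigma_b):
    """KL( N(mu_a, diag sigma_a^2) || N(mu_b, diag sigma_b^2) ), summed over dimensions."""
    return np.sum(np.log(sigma_b / sigma_a)
                  + (sigma_a ** 2 + (mu_a - mu_b) ** 2) / (2.0 * sigma_b ** 2) - 0.5)

def is_feasible(mu_f, sigma_f, mu_post, sigma_post, delta_kl=10.0):
    """Check the uncertainty-guided design-space bounds (eqns (14) and (15))
    and the KL constraint (eqn (16)) for a candidate (mu_f, sigma_f)."""
    in_mean_box = np.all(np.abs(mu_f - mu_post) <= 3.0 * sigma_post)                      # eqn (14)
    in_std_box = np.all((sigma_f >= 0.75 * sigma_post) & (sigma_f <= 1.25 * sigma_post))  # eqn (15)
    in_kl_ball = kl_diag_gauss(mu_f, sigma_f, mu_post, sigma_post) <= delta_kl            # eqn (16)
    return bool(in_mean_box and in_std_box and in_kl_ball)
```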
For RQ2, the candidate latent points (zBO vs. zAS, cf. eqn (13)) are compared by the acquisition score, and the optimization goal becomes ϕacq (eqn (13)) instead of ϕ (eqn (11)).
For the REINFORCE-based strategy, the policy distribution parameters μf and σf are first initialized to the active subspace posterior distribution parameters, i.e. μpost and σpost, respectively. At each iteration of REINFORCE, we draw M samples from the current policy p(ω; μf, σf) and compute the corresponding QoI, i.e. ϕ(μf, σf, Q), following the steps mentioned in section 6.1. Then we use the Adam optimizer49 with a learning rate α = 0.005 to update the policy parameters according to the following update rule:

| (μf, σf) ← (μf, σf) + α∇_{(μf,σf)}[ϕ(μf, σf, Q)·(1/M)∑_{i=1}^{M} log p(ωi; μf, σf)] | (17) |
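The loop below is a minimal REINFORCE sketch consistent with this description: a diagonal Gaussian policy over ω is updated with Adam using a score-function gradient, with the reward supplied by a hypothetical `reward_fn` that evaluates ϕ for the drawn samples. A running-mean baseline is added here for variance reduction, and the design-space bounds of eqns (14)–(16) are omitted for brevity, so this is not the authors' exact update rule.

```python
import torch

def reinforce_finetune(mu_post, sigma_post, Q, reward_fn, n_iters=100, M=10, lr=0.005):
    """Optimize the fine-tuned distribution p(omega; mu_f, sigma_f) with REINFORCE."""
    mu_f = mu_post.detach().clone().requires_grad_(True)
    log_sigma_f = sigma_post.detach().log().clone().requires_grad_(True)
    opt = torch.optim.Adam([mu_f, log_sigma_f], lr=lr)
    baseline = 0.0
    for _ in range(n_iters):
        policy = torch.distributions.Normal(mu_f, log_sigma_f.exp())
        omegas = policy.sample((M,))                       # omega_i ~ current policy
        reward = reward_fn(omegas.numpy(), Q)              # QoI phi(mu_f, sigma_f, Q) for these samples
        # Score-function (REINFORCE) gradient; maximize reward by minimizing its negative.
        loss = -(reward - baseline) * policy.log_prob(omegas).sum() / M
        opt.zero_grad()
        loss.backward()
        opt.step()
        baseline = 0.9 * baseline + 0.1 * reward           # running-mean baseline
    return mu_f.detach(), log_sigma_f.exp().detach()
```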
We construct each set of candidate design points Q by drawing 1000 points in the pre-trained VAE's latent space according to N(0, I). To simulate QoIPTM, we convert this random collection Q to corresponding molecules using the pre-trained model and predict the property of interest for all unique molecules. QoIPTM is defined as the average property value of the top 10% samples among those unique molecules. For properties that need to be minimized, the top 10% samples are those with the lowest property values, and the sign of their average is flipped so that the QoI is always maximized.
For the posterior and fine-tuned distributions over AS parameters, we independently sample 10 models using the AS parameter distribution and divide the 1000 latent points of Q equally among them, giving each model 100 design points to decode. This ensures up to 1000 unique molecules for a fair comparison with the pre-trained model. We then use the same property predictor on unique molecules that we used for the pre-trained model and define the QoI as the average of the top 10% properties for the given distribution parameters. If the pre-trained model or any of the sampled models cannot decode a latent point into a valid molecule, we discard it in the QoI estimation.
We consider six target molecular properties: the octanol–water partition coefficient (log P), synthetic accessibility score (SAS), natural product-likeness score (NP score),50 and the inhibition probabilities against dopamine receptor D2 (DRD2),51 c-Jun N-terminal kinase-3 (JNK3) and glycogen synthase kinase-3 beta (GSK3β).13 We aim to maximize all properties except SAS, where lower values indicate easier synthesizability. Details of the predictors for these six properties are provided in the SI, along with the computational cost of evaluating the QoI for 1000 latent points.
756 131 parameters) and σ0 = 0.01 for the SELFIES-VAE (4 419 177 parameters) and SMILES-VAE (4 281 894 parameters) decoders. Although the JT-VAE subspace showed a lower rank (see Fig. 3), its smallest singular value remained sufficiently large, so we retained 20 dimensions for our main experiments. Additional results with a 5-dimensional active subspace are provided in the SI.
We use the training dataset of the VAE model to perform variational inference to approximate the posterior distribution over the active subspace parameters. We apply the Adam optimizer49 with a learning rate of 0.001 to find the approximate mean and standard deviation of the 20-dimensional AS parameters by minimizing the combined loss, i.e. the VAE training loss (which includes the reconstruction loss and KL divergence of the VAE) plus the KL divergence between the approximated posterior distribution and the prior distribution over the AS parameters. We use a multivariate normal distribution with zero mean and a standard deviation of 5 as the prior distribution over the AS parameters.
For each property, we ran 3 trials of BO and REINFORCE on 10 independently generated Q sets of design points (30 runs in total). Each boxplot in Fig. 2 shows the QoI improvement over the pre-trained JT-VAE model. The results indicate that BO consistently outperforms the reward-based approach, REINFORCE, in achieving larger QoI improvements across all six properties. Fig. S1 and S2 in the SI present analogous results for the SELFIES-VAE and SMILES-VAE, where REINFORCE matches BO. For a quantitative comparison, Table 1 reports the QoI values for both the pre-trained model and the fine-tuned distributions obtained by our approach, confirming our method's consistent gains over the pre-trained model in generating molecules with better properties. We have also performed the Wilcoxon signed-rank test52 with the alternative hypothesis that the QoI values obtained by our approach of optimizing over the active subspace distribution parameters are better than those of the pre-trained model (PTM). Specifically, we considered the paired QoI data, i.e. the QoI of the pre-trained and fine-tuned models, in the Wilcoxon signed-rank test. As shown in Table S2 of the SI, all cases showed statistically significant improvement over the pre-trained model, with the exception of SAS optimization using the JT-VAE model with the REINFORCE strategy.
| Models | Pre-trained/fine-tuned | Log P (↑) | SAS (↓) | NP score (↑) | DRD2 (↑) | JNK3 (↑) | GSK3β (↑) |
|---|---|---|---|---|---|---|---|
| JT-VAE | PTM | 4.263 (0.084) | 2.034 (0.027) | 0.039 (0.037) | 0.039 (0.008) | 0.061 (0.004) | 0.135 (0.009) |
| | PTM + BO | 4.367 (0.078) | 2.017 (0.029) | 0.076 (0.039) | 0.047 (0.009) | 0.065 (0.005) | 0.140 (0.008) |
| | PTM + R | 4.332 (0.093) | 2.031 (0.034) | 0.062 (0.042) | 0.042 (0.010) | 0.062 (0.004) | 0.138 (0.009) |
| SELFIES-VAE | PTM | 4.741 (0.083) | 2.086 (0.043) | 0.722 (0.035) | 0.039 (0.008) | 0.065 (0.006) | 0.126 (0.007) |
| | PTM + BO | 5.059 (0.043) | 2.010 (0.017) | 0.798 (0.028) | 0.064 (0.005) | 0.076 (0.002) | 0.146 (0.004) |
| | PTM + R | 5.044 (0.039) | 2.001 (0.018) | 0.801 (0.021) | 0.069 (0.009) | 0.076 (0.002) | 0.147 (0.004) |
| SMILES-VAE | PTM | 4.869 (0.050) | 1.952 (0.024) | 0.194 (0.050) | 0.036 (0.009) | 0.067 (0.005) | 0.117 (0.011) |
| | PTM + BO | 5.050 (0.042) | 1.903 (0.008) | 0.284 (0.022) | 0.059 (0.006) | 0.080 (0.004) | 0.137 (0.006) |
| | PTM + R | 5.069 (0.055) | 1.900 (0.010) | 0.289 (0.027) | 0.063 (0.004) | 0.081 (0.003) | 0.140 (0.005) |
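For reference, the paired one-sided test described above can be run with SciPy as sketched below; the QoI arrays here are synthetic stand-ins, not the values reported in Table 1 or Table S2.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# Toy stand-ins for the 30 paired QoI values (10 Q sets x 3 trials) of one property.
qoi_ptm = rng.normal(4.26, 0.08, size=30)
qoi_finetuned = qoi_ptm + rng.normal(0.10, 0.03, size=30)   # hypothetical improvement

# One-sided alternative: the fine-tuned QoI exceeds the pre-trained QoI.
stat, p_value = wilcoxon(qoi_finetuned, qoi_ptm, alternative="greater")
print(f"Wilcoxon statistic = {stat:.1f}, p-value = {p_value:.2e}")
```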
Since BO outperforms REINFORCE in the JT-VAE model, we ran additional JT-VAE experiments to assess the effect of a noisy EI acquisition function, δKL sensitivity, the effect of active-subspace dimension k in BO, and the generalizability of the fine-tuned distribution's impact on the JT-VAE latent space (see SI section S7).
To examine whether the learned active subspaces capture consistent directions in parameter space, we measure the normalized similarity between two independently constructed projection matrices P1 and P2, comparing the subspaces spanned by their first i and j columns:

| sim(i, j) = ‖P1,iᵀ P2,j‖F² / min(i, j) | (18) |

where P1,i and P2,j denote the first i columns of P1 and the first j columns of P2, respectively.
Fig. 3 Comparison of subspace similarity between random subspaces and active subspaces for the JT-VAE tree decoder, SELFIES-VAE decoder, and SMILES-VAE decoder. Each entry of the normalized similarity is computed via eqn (18) between subspaces generated using two different random seeds. For each VAE model, we constructed two (corresponding to the two random seeds) 20-dimensional active subspaces, each using 100 gradient samples corresponding to 100 randomly selected molecules from the training dataset. Across the two seeds, we obtained two projection matrices P1 and P2 with Algorithm 1, starting from the pre-trained model weights.
Since random subspaces are independently generated, there is no similarity between them (the subspace similarity measure is near 0). Active subspaces are also generated independently across the two random seeds, but the learned projection matrices share a certain degree of intrinsic structure for the JT-VAE tree decoder and SELFIES-VAE decoder, where the first few projection vectors show significant similarity (values closer to 1). This indicates that the learned active subspace focuses on a specific region of the model parameter space, even though it is created through random perturbation (Algorithm 1) around the pre-trained model parameters. Since we construct the active subspace by taking gradients around the pre-trained model parameters, this pattern of subspace similarity implies that the pre-trained model is located in a loss landscape where moving along certain directions can significantly impact the loss function. Such subspace similarity also means that our proposed active subspace optimization approach explores the model parameter space along 20 directions that are highly relevant to the molecular optimization task, rather than along randomly selected directions. In the JT-VAE's case, this similarity is particularly pronounced, which may contribute to its higher QoI improvements using Bayesian optimization over REINFORCE. This intrinsic bias might be introduced by the way the JT-VAE tree decoder reconstructs a molecule from its latent space. In contrast, the SMILES-VAE decoder's active subspaces show negligible similarity, despite its architecture being similar to that of the SELFIES-VAE and differing only in the molecular representation. This difference suggests that the SMILES-VAE's pre-trained weights may reside on a very sharp loss surface: perturbing the weights (for active subspace construction) in any direction can cause large loss changes, yielding subspaces akin to random ones. These observations highlight the need for robust input representations, as in the JT-VAE and SELFIES-VAE, to learn meaningful low-dimensional active subspaces. Furthermore, it may be possible to construct subspaces that favor QoI-optimal regions, offering a promising future direction.
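The measure in eqn (18) can be computed directly from the projection matrices, as in the sketch below; the normalization shown is one common choice consistent with the reconstruction above, and the exact form used by the authors may differ.

```python
import numpy as np

def subspace_similarity(P1, P2, i, j):
    """Normalized overlap between the span of the first i columns of P1 and the first
    j columns of P2 (both assumed to have orthonormal columns); 0 = orthogonal, 1 = nested."""
    U, V = P1[:, :i], P2[:, :j]
    return np.linalg.norm(U.T @ V, "fro") ** 2 / min(i, j)

# Example with random 20-dimensional subspaces of a large parameter space.
rng = np.random.default_rng(0)
D, k = 10000, 20
Q1, _ = np.linalg.qr(rng.standard_normal((D, k)))
Q2, _ = np.linalg.qr(rng.standard_normal((D, k)))
print(subspace_similarity(Q1, Q2, k, k))   # near 0 for independent random subspaces
print(subspace_similarity(Q1, Q1, k, 5))   # exactly 1 when one subspace contains the other
```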
We consider three standard high-dimensional optimization tasks: generating molecules with the maximum penalized octanol–water partition coefficient (penalized log P), fitting an arithmetic expression to a target expression, and producing a binary image with maximum similarity to a target topology. We have provided a brief description of these standard optimization tasks in the SI. For each of these three tasks (denoted as Molecule, Expression and Topology, respectively), we apply the PG-LBO weighted-retraining framework,24 performing 10 weighted retraining iterations and using latent-space Bayesian optimization to suggest 50 diverse samples at the end of each iteration.
We introduced our active subspace-based optimization (RQ2) within each BO iteration, keeping all other components of weighted retraining the same as in PG-LBO. Fig. 4 shows the top-1 sample's objective score and the total unique queries out of 500 suggested samples over 10 weighted retraining iterations for 5 trials across the three tasks, both with ('w AS') and without ('w/o AS') active subspace-based optimization of the acquisition function. In Fig. 4, each best objective score was found by the optimization procedure, i.e., 10 iterations of weighted retraining, and the number of unique queries represents the number of unique samples we needed to evaluate for the Bayesian optimization. Note that each weighted retraining iteration involves performing Bayesian optimization (with or without our active subspace) to generate 50 samples and retraining the VAE model with the weighted dataset of existing training samples updated with the suggested 50 samples. As detailed in section 4.2, under the active subspace-based optimization of the acquisition function, we accept the latent point zAS suggested by this decoder optimization only if its acquisition score exceeds that of zBO found by optimizing the acquisition function over the latent space. This potentially skips the samples suggested by regular BO ('w/o AS'). For Molecule and Expression, this reduces the number of unique queries without largely affecting the top-1 score. This boosts query efficiency, i.e., it limits the number of unique samples to be evaluated for their objective score, which can be crucial for expensive black-box objective functions. In the Topology task, weighted retraining using vanilla BO (without active subspace) and BO with active subspace both ended up generating all 500 unique samples, resulting in a constant number of unique queries for all trials. This task involves generating a 40 × 40 binary image, where two generated samples are identical only when they match exactly in all pixels, which has a very low probability of occurrence (2^1600 possible binary images). Hence, obtaining a duplicate image from the weighted retraining process is a very rare event, which potentially explains the constant number of unique queries. This result highlights that BO with active subspace (RQ2) can improve the query efficiency (in terms of unique samples) in the weighted retraining framework, but the performance depends on the data space encoded by the VAE.
In our work, we have used the top 10% average as the reward (QoI), which can be a bottleneck for expensive oracles, e.g. wet-lab experiments, physics-based simulation, etc. Specifically, such a rank-based objective requires evaluation of all molecules in the current batch before taking the top 10% average of their properties, and this complexity increases with the size of the batch. However, one can utilize property predictors as a proxy for the expensive oracles to estimate the top 10% reward that can be used as the objective in our proposed active subspace-based optimization. For example, rank-based acquisition functions, e.g. qPO (multiple point probability of optimality) proposed by Fromer et al.,55 can be potentially applied to suggest the optimal batch (top 10%) for evaluation under resource-constrained scenarios.
It is important to note that the success of our proposed fine-tuning approach hinges on the quality of the pre-trained generative model, since the active subspace posterior distribution, over which the design space is defined, is learned by perturbing the pre-trained weights. If the pre-trained model fails to capture the underlying molecular generation rules, our active subspace-based method is unlikely to improve the design. In this direction, we explored (in RQ2) the potential of integrating our approach with iterative refinement methods23,24,54,56 to enhance the generative model's sampling efficiency. Furthermore, if the surrogate models used for QoI feedback misrepresent the ground truth, they may misguide the fine-tuning method.57 To mitigate this, one can also leverage our black-box treatment to adopt a systematic, formal risk-aware optimization approach for robust fine-tuning under uncertainty.
Supplementary information is available. See DOI: https://doi.org/10.1039/d5me00081e.