Daniel Rothchild *a, Andrew S. Rosen bc, Eric Taw dc, Connie Robinson e, Joseph E. Gonzalez a and Aditi S. Krishnapriyan *ad
aDepartment of Electrical Engineering and Computer Science, University of California, Berkeley, USA. E-mail: drothchild@berkeley.edu
bDepartment of Materials Science and Engineering, University of California, Berkeley, USA
cMaterials Science Division, Lawrence Berkeley National Laboratory, USA
dDepartment of Chemical and Biomolecular Engineering, University of California, Berkeley, USA. E-mail: aditik1@berkeley.edu
eDepartment of Chemistry, University of California, Berkeley, USA
First published on 22nd July 2024
We present an investigation of diffusion models for molecular generation, with the aim of better understanding how their predictions compare to the results of physics-based calculations. The investigation into these models is driven by their potential to significantly accelerate electronic structure calculations using machine learning, without requiring expensive first-principles datasets for training interatomic potentials. We find that the inference process of a popular diffusion model for de novo molecular generation is divided into an exploration phase, where the model chooses the atomic species, and a relaxation phase, where it adjusts the atomic coordinates to find a low-energy geometry. As training proceeds, we show that the model initially learns about the first-order structure of the potential energy surface, and then later learns about higher-order structure. We also find that the relaxation phase of the diffusion model can be re-purposed to sample the Boltzmann distribution over conformations and to carry out structure relaxations. For structure relaxations, the model finds geometries with ∼10× lower energy than those produced by a classical force field for small organic molecules. Initializing a density functional theory (DFT) relaxation at the diffusion-produced structures yields a >2× speedup to the DFT relaxation when compared to initializing at structures relaxed with a classical force field.
The vast majority of machine learning methods aimed at understanding the shape of the PES are learned interatomic potentials, and state-of-the-art interatomic potentials currently consist largely of neural-network interatomic potentials (NNIPs).1,2 NNIPs are trained through supervised learning, and the most common approach is to use a dataset of atomic geometries that are annotated with energies and forces. These energies and forces are typically derived from a more expensive physics-based method. The result is a machine learning model that can predict, given a particular geometry of atoms, the energy of that atomic configuration and the forces experienced by each atom.
While machine-learned interatomic potentials are powerful tools for understanding the shape of the PES, they suffer from the major limitation that they require a supervised dataset of atomic geometries that have been annotated with energies and forces. Relying on datasets based on first-principles physics-based calculations is challenging for several reasons:
Computational expense: it is extremely computationally expensive to create this sort of dataset. For example, the OC20 dataset includes 1.3 M relaxations, each of which was allowed up to 1728 core-hours of compute time.3
Lack of inclusion of all geometries: the space of off-equilibrium geometries is combinatorially large, and there is no definitive way to choose which geometries to include in the dataset. If important types of geometries are excluded, models trained on the dataset may generalize poorly.
Limited by the level of theory: because of how expensive it is to produce the dataset, we are limited in the level of theory that can be used for generating the energies and forces. For example, it is common to generate geometries using tight-binding calculations, which are significantly less accurate than density functional theory (DFT).4,5 Generating a labeled dataset of energies and forces based on experimental data is not feasible, so learned interatomic potentials will always be limited by the training set's level of theory.
To address these concerns, we investigate an alternative approach to understand the PES, which requires only ground-state (i.e., lowest energy) geometries as training data, and which does not involve explicitly learning an interatomic potential. Recently, several authors have trained non-NNIP machine learning models to generate 3D atomic geometries, either from scratch or based on a molecular graph, by training a self-supervised model on a dataset of known 3D geometries.6–10 In the self-supervised learning setting, models are trained on data without labels, which in this case means 3D atomic geometries that have not been annotated with energies or forces. With no labels to learn from, models are instead trained by corrupting the training data with noise and then asking the model to predict the original, de-noised data. The intended use of these models is to generate geometries (either from scratch or for conformer generation), rather than to reason more generally about the PES. However, they are undoubtedly learning about the PES, given that they are able to generate structures that lie near local energy minima. To our knowledge, there has not been a comprehensive investigation into the degree to which these models acquire insights about the PES in the vicinity of the local minima, as opposed to solely learning point estimates of where the local minima lie. Existing evaluations only look at the generated geometries themselves to measure their quality.
The reason these models are worth exploring in more detail is that they provide a path forward to accelerate electronic structure calculations, while avoiding the challenges of training NNIPs described above. Instead of training on a dataset of structures with their corresponding energies and forces, these models train in a self-supervised fashion on un-annotated ground-state geometries. Training in this manner offers a number of advantages:
Computational savings: it is computationally cheaper to generate a dataset of only ground state configurations than to generate a training dataset for NNIPs, since we can initialize calculations at a higher level of theory with the best guess from a lower level of theory. In contrast, NNIP training data must include many off-equilibrium structures at a high level of theory so that they learn to generalize beyond the immediate neighborhood of the ground state. For example, in order to ensure the dataset is not “biased toward structures with lower energies”, the Open Catalyst 2020 (OC20) dataset specifically includes DFT calculations on geometries “with higher forces and greater configurational diversity” than strictly necessary to identify the ground state.3
No need to select specific atomic geometries: there is no need to choose which off-equilibrium geometries should be included, since a self-supervised dataset includes only ground state configurations.
Can learn from experimental data: self-supervised methods offer a distinct advantage over conventional supervised learning with energy and forces in that, in principle, they can be trained on experimental data. Training ML models on experimentally measured geometries is challenging, and we do not pursue experimental structures in this manuscript, but we believe that even the possibility of leveraging experimental data for training is a major advantage of self-supervised methods.
In this manuscript, we choose one of these models trained with self-supervised methods—an E(3) Equivariant Diffusion Model (EDM)7—and we probe the model's understanding of the PES (1) by inspecting more closely the model's predictions when generating structures as intended, (2) by examining the predictions when applied to downstream applications that it was not trained on, and (3) by comparing the predictions with the results of physics-based calculations (Section 4). Following this analysis, we investigate a practical approach to use EDM to accelerate structure relaxations of atomic systems (Section 5). Our code is available at https://github.com/ASK-Berkeley/e3_diffusion_for_molecules, and trained models are available on Zenodo.11
Note that our goal is not to train a better interatomic potential; prior work has already investigated denoising as a way to improve supervised learning, including for NNIPs.12–14 We are also not proposing a new way to train self-supervised models on chemical systems; rather, we opt for an off-the-shelf EDM model. Instead, our objectives are: first, to understand, from a scientific standpoint, what these models are learning about the PES using only a denoising objective; and second, to propose a practical way to use models trained in this manner to accelerate electronic structure calculations.
To summarize, we make the following contributions. We undertake a study of a pretrained EDM model, finding that its inference procedure can be roughly divided into an “exploration” regime and a “relaxation” regime (Section 4.1). For small organic molecules, we demonstrate that the relaxation regime of the EDM model finds geometries with significantly lower energies than those found using a classical force field (Section 4.2). When “relaxing” structures, EDM's predictions for how to de-noise the atomic positions preferentially follow the forces to the ground state early in training, and preferentially move straight towards the ground state later in training (Section 4.3). We use EDM to sample from a molecule's Boltzmann distribution over conformations, establishing a correspondence between diffusion steps and temperature (Section 4.4). We re-purpose EDM to accelerate DFT structure relaxations, and we find that it can significantly speed up these calculations by proposing better initial geometries (Section 5.1). We attain a small speedup to structure relaxations on a dataset of larger drug-like molecules; for these more complex PESs, EDM's predictions align better with the ground-truth forces than with the direct path to the ground state (Section 5.2).
More recently, neural network interatomic potentials (NNIPs) have taken this trend to the next level, introducing models that predict the energy of atomic configurations using millions of parameters, which must be trained using techniques from machine learning. These NNIPs tend to be based on graph neural network architectures,20–23 and many NNIPs are equivariant, meaning that rotating the input geometry leads to a deterministic and easy-to-calculate transformation of the output.24–30 Energies predicted by an equivariant neural network are scalar quantities, so they are invariant to rotations of the input geometry. Predicted forces are vector quantities, and therefore rotate along with any rotation applied to the input geometry. See Geiger and Smidt31 for a more in-depth primer on equivariance in neural networks.
All of these approaches—from the two-parameter Lennard-Jones potential to the largest GemNet23 model with millions of parameters—follow the traditional paradigm of predicting energies and forces. These predicted energies and forces are then used to carry out structure relaxations, molecular dynamics simulations, Markov chain Monte Carlo simulations, etc. In contrast, in this work we propose learning about the potential energy landscape by training a machine learning model using a denoising objective on ground-state geometries, with no energy or force data in the training set.
Prior work has used denoising objectives in this domain for other purposes. Hoogeboom et al.,7 whose model we use in this work, train an equivariant denoising diffusion model to generate molecules from scratch. Godwin et al.13 use a denoising objective as a regularization term for property prediction and one-shot structure relaxations. The work of Zaidi et al.12 is most similar to ours, as they use a similar denoising objective to pretrain a graph neural network for molecular property prediction. They do not investigate the pretrained models themselves, instead evaluating them only as starting points for fine-tuning a property prediction model. The success of their method motivates further study into what is learned via the denoising pretraining step.
To train a diffusion model to generate images, the model is repeatedly given images from the training set that have been corrupted with Gaussian noise, and it is tasked with predicting what noise was added. During training, the inputs to the model are corrupted with different amounts of noise: sometimes the model faces images with only a small amount of noise, while at other times the noise is so pronounced that the original image is nearly unrecognizable. The amount of noise added is controlled by the “diffusion step”, n, which ranges from n = 1 (low noise) to n = N (high noise), and the model takes n as an input, in addition to the corrupted image.† During training, every iteration samples a random diffusion step uniformly from n = 1 to N. Consequently, the model learns to make any image less noisy—whether it is already very noisy or only a bit noisy. To generate a new image during inference, the model is first given completely random Gaussian noise as the image, with n = N. The noise predicted by the model is subtracted from the image (after appropriate scaling), and the result is fed back into the model with n = N − 1. This process continues until reaching n = 1, at which point the initially random “image” has been completely denoised into a newly generated image.
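For concreteness, the following is a minimal sketch of this inference loop, written against the standard DDPM noise schedule and update rule rather than the exact parameterization used by EDM; eps_model is a placeholder for a trained noise-prediction network, and the diffusion step is zero-indexed in the code.

```python
# Minimal sketch of ancestral sampling from a denoising diffusion model (DDPM-style).
# `eps_model(x, n)` is a hypothetical trained network that predicts the noise added at step n.
import torch

def sample(eps_model, shape, N=1000):
    betas = torch.linspace(1e-4, 0.02, N)          # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)                          # start from pure Gaussian noise (highest step)
    for n in reversed(range(N)):                    # iterate n = N-1, ..., 0
        eps = eps_model(x, torch.tensor([n]))       # predict the noise that was added
        coef = betas[n] / torch.sqrt(1.0 - alpha_bars[n])
        mean = (x - coef * eps) / torch.sqrt(alphas[n])   # subtract predicted noise, with scaling
        z = torch.randn_like(x) if n > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[n]) * z          # stochastic update, deterministic at the final step
    return x                                        # fully de-noised sample
```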
Recently, Hoogeboom et al.7 extended this technique to generating small organic molecules instead of natural images. Intuitively, their model works the same way as is described above. However, instead of generating images, the model generates molecules: the 3D coordinates of each atom, the chemical species of each atom, and the formal charge on each atom. The 3D coordinates are represented simply as the usual scalar-valued x, y, and z coordinates, and the atomic charges are also represented as scalars. To represent the chemical species, the authors assign each atom in the training set a vector, where all values in the vector are zero except the entry corresponding to the true atom type, which is set to one (i.e., a one-hot vector). At train time, the model receives these three quantities, each corrupted with Gaussian noise. At test time, the model iteratively denoises what was originally an entirely random input until a plausible molecule emerges. This process is depicted in Fig. 1. In order to respect translational symmetry, EDM translates all structures so that the center of mass is at the origin. In order to respect rotational symmetry, EDM uses an equivariant neural network called EGNN to predict what noise was added.33
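The input representation can be sketched as follows; the element ordering, charge handling, and centering convention are illustrative and may differ from the released EDM code.

```python
# Schematic construction of EDM-style inputs for one molecule: origin-centered Cartesian
# coordinates, one-hot atomic species, and scalar charges.
import numpy as np

ELEMENTS = ["H", "C", "N", "O", "F"]                 # element set appearing in QM9

def featurize(symbols, positions, charges):
    positions = np.asarray(positions, dtype=float)
    positions = positions - positions.mean(axis=0)   # remove translation (EDM centers each geometry)

    one_hot = np.zeros((len(symbols), len(ELEMENTS)))
    for i, s in enumerate(symbols):
        one_hot[i, ELEMENTS.index(s)] = 1.0           # one-hot chemical species

    charges = np.asarray(charges, dtype=float)[:, None]  # scalar charge per atom
    return positions, one_hot, charges                # the three channels corrupted with Gaussian noise

# Example: methane
pos, species, chg = featurize(
    ["C", "H", "H", "H", "H"],
    [[0, 0, 0], [0.63, 0.63, 0.63], [-0.63, -0.63, 0.63], [-0.63, 0.63, -0.63], [0.63, -0.63, -0.63]],
    [0, 0, 0, 0, 0],
)
```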
We train the EDM model on Quantum Machines 9 (QM9), a dataset of small molecules with up to nine heavy atoms among carbon, nitrogen, oxygen, and fluorine,34,35 using the same hyperparameters used by Hoogeboom et al.7 We train the model for 6200 epochs and choose the model with the best validation loss, which occurs at epoch 5150. In comparison, Hoogeboom et al.7 train the same model for 1100 epochs. We find that the validation loss improves consistently until epoch ∼3000, after which it mostly levels off.
To match QM9, we carry out all DFT calculations at the B3LYP/6-31G(2df,p) level of theory.36–40 We use Psi4 (ref. 41) version 1.8, and relaxations are carried out with the Atomic Simulation Environment (ASE)42 version 3.22.1 using the BFGS algorithm for geometry optimizations with a maximum force convergence criterion of 0.03 eV Å−1.
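A sketch of this relaxation setup using ASE's Psi4 calculator is shown below; the molecule and resource settings are placeholders rather than the exact configuration used in our calculations.

```python
# Sketch of a B3LYP/6-31G(2df,p) relaxation with Psi4 driven by ASE's BFGS optimizer.
from ase.build import molecule
from ase.calculators.psi4 import Psi4
from ase.optimize import BFGS

atoms = molecule("CH3OH")                                          # placeholder structure
atoms.calc = Psi4(method="b3lyp", basis="6-31g(2df,p)",
                  memory="2GB", num_threads=4)                      # resources are illustrative

opt = BFGS(atoms, logfile="relax.log")
opt.run(fmax=0.03)                                                  # max-force criterion in eV/Å
print(atoms.get_potential_energy())                                 # final energy in eV
```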
We first consider the plot of what fraction of molecules have all atomic species finalized (“Mol. Elem. Final”). Here, it is clear that the diffusion process can be divided into two regimes: an “exploration” regime, from diffusion step 1000 to ∼50, where the model is still figuring out the atomic identities, and a “relaxation” regime, from diffusion step ∼50 to 0, where the model is moving around the atoms while holding the atomic species fixed (these two regimes are shown schematically in Fig. 1). The transition in “Atom Elem. Final” from 0% finalized to 100% finalized is fairly abrupt, suggesting that the model decides on all the atomic species at once, instead of first deciding on, e.g., the carbon structure and then deliberating about which functional groups to add. Note also that the model finalizes geometries decidedly after choosing the atomic species: the fraction of molecules that have every atom's valence finalized (“Mol. BO Final”) doesn't increase at all until after almost all molecules' atomic species have been decided on (“Mol. Elem. Final”).
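As an illustration, the “Mol. Elem. Final” metric can be computed from generated trajectories roughly as follows; the array layout is assumed for the sketch, not taken from our analysis code.

```python
# Fraction of molecules whose predicted species (argmax of the one-hot channel) at each
# diffusion step already match the species in the final generated structure.
import numpy as np

def species_finalized(species_traj):
    # species_traj: assumed shape (num_steps, num_atoms, num_elements) for one molecule.
    preds = species_traj.argmax(axis=-1)             # predicted element per atom at each step
    final = preds[-1]                                # species in the fully de-noised structure
    return (preds == final).all(axis=1)              # True once every atom's species matches the final one

def fraction_finalized(trajs):
    flags = np.stack([species_finalized(t) for t in trajs])   # (num_molecules, num_steps)
    return flags.mean(axis=0)                        # "Mol. Elem. Final" curve over diffusion steps
```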
Looking more closely at the relaxation regime, we calculate the energy of each structure along the diffusion path using DFT, starting with the first structure where all atom types are finalized. As seen in Fig. 3, the energy decreases during the relaxation regime fairly consistently, suggesting that the diffusion process is largely following the potential energy surface to the ground state rather than moving atoms through each other or taking a more erratic path. For reference, the DFT-computed energies of the geometries along a linearly interpolated path from the starting structure to the final structure are shown by the dashed lines in Fig. 3. We investigate this further in Section 4.3.
The MMFF-optimized structures are already reasonable approximations of the DFT ground state, but we seek to further improve them using EDM. To evaluate the diffusion-generated structures, we compare the DFT-computed energies of the diffused structures with those of the MMFF structures. Fig. 14 in Appendix A shows a random sample of the molecules used in this section.
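The MMFF starting structures can be generated with RDKit along the following lines; the SMILES string is only an example, and the embedding settings are not necessarily those we used.

```python
# Sketch of generating an MMFF-relaxed starting structure: embed a 3D conformer with RDKit
# and relax it with the MMFF94 classical force field.
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))          # example molecule (ethanol)
AllChem.EmbedMolecule(mol, randomSeed=0)             # initial 3D embedding
AllChem.MMFFOptimizeMolecule(mol)                    # classical force-field relaxation
coords = mol.GetConformer().GetPositions()           # MMFF-optimized Cartesian coordinates
```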
The question remains what diffusion step n to start at when carrying out the relaxation. As shown in Fig. 2, the model consistently takes smaller steps at lower diffusion steps and larger steps at higher diffusion steps, so we need to choose n carefully: too small, and the model won't have enough steps to move the distance required to reach the ground state; too large, and the model will drastically re-arrange the molecule instead of finding the nearest local minimum. Most likely, we should use n < ∼50, since before this point, the model has not finalized atom species. As such, we try three values of n: 50, 30, and 20.
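Schematically, the re-purposed relaxation looks like the following sketch, where denoise_step is a hypothetical callable wrapping a single EDM reverse-diffusion update on the coordinates; the real implementation also handles the noise schedule and scaling internally.

```python
# Sketch of re-purposing the tail of the diffusion chain as a structure relaxation: treat an
# existing geometry as the partially de-noised state at step n_start and run only the remaining
# reverse steps, holding the atomic species and charges fixed.
def diffusion_relax(denoise_step, positions, species, charges, n_start=30):
    x = positions
    for n in reversed(range(n_start)):                # run steps n_start-1, ..., 0 only
        x = denoise_step(x, species, charges, n)      # one reverse-diffusion update on coordinates
    return x                                          # relaxed geometry proposed by the model
```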
Some curves in Fig. 4 dip below zero energy relative to the DFT-computed ground state. We inspected the most negative such example by eye and found that the model's predicted geometry is the same conformer as the geometry in QM9, but with slightly different atomic coordinates. To save on computational cost, the ground states we used as the zero energy point were relaxed using a relatively permissive maximum force convergence criterion of 0.03 eV Å−1, so we suspect the negative energies are simply due to Psi4 not quite finding the global minimum during the relaxation.
However, noting that the diffusion process is inherently noisy, we also plot results when we consider the model prediction to be the sum of the subsequent k steps of diffusion. In other words, given k ≥ 1, f and gs remain unchanged, but instead of comparing these quantities to Δ, we compare to Δk, which is the sum of the next k Δ vectors. This idea is depicted schematically in Fig. 5. For the smallest value of k = 1, we expect the randomness in the diffusion steps to be most significant, so cosine similarities will likely be low. Higher values of k average out this noise, but raising k also artificially increases cos θΔk,gs, as compared to cos θΔk,f, since if the model does eventually find a structure near the DFT ground state, regardless of which path it takes, Δk approaches gs as k increases. For this reason, we plot three values of k: k = 1, where the noise dominates; k = 30, where k is likely high enough that much of the similarity between Δk and gs can be attributed to this effect; and k = 10, which we hope strikes a good balance.
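The cosine-similarity diagnostics can be sketched as follows; path, forces, and ground_state are placeholder names for the per-structure arrays, not identifiers from the released code.

```python
# Compare the summed next-k diffusion displacements (Δk) against the DFT forces f and the direct
# displacement to the ground state (gs). Per-step arrays are assumed to have shape (num_atoms, 3).
import numpy as np

def cosine(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def delta_k(path, step, k):
    # path: positions along the diffusion chain, shape (num_steps, num_atoms, 3).
    return path[min(step + k, len(path) - 1)] - path[step]   # sum of the next k displacement vectors

# Example usage, given `path`, per-step `forces`, and the DFT `ground_state` geometry:
# cos_dk_f  = cosine(delta_k(path, step, k), forces[step])
# cos_dk_gs = cosine(delta_k(path, step, k), ground_state - path[step])
```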
Considering the fully trained model (5150 epochs), the model predictions Δ are consistently more aligned with the direct path to the ground state than with the DFT-predicted forces. This holds true even for low k, suggesting that the model is finding a path to the ground state that is more efficient than following the forces gradient-descent style. There is reasonably high alignment between the model predictions and the forces, but this can be explained by the fact that the forces themselves are somewhat aligned with the direct path to the ground state (black dotted line in Fig. 6). cos θΔk,f is never substantially higher than cos θf,gs, so any alignment between the model predictions and the forces can be explained by the alignment between the forces and the straight path to the ground state. Given that these results are on molecules that were unseen during training, the fact that the diffusion path aligns better with the direct path to the ground state than with the forces suggests that the model has learned about the local curvature of the PES, rather than only learning about local gradients. Note that many non-learning-based relaxation methods also compensate for the curvature of the PES instead of strictly following the gradients down to a local minimum. For example, BFGS preconditions the gradients with second-order information in order to take a straighter path towards the minimum than would be achieved by following the gradients directly.
Surprisingly, the situation is nearly the reverse for the model that has only been trained for 50 epochs (left plot of Fig. 6). Here, particularly later on in the diffusion process, Δk aligns better with f than with gs, suggesting that the model is following the forces down to the ground state rather than heading directly towards it. Early in training, a model typically picks up on the easiest way to get the answer mostly right, and only later learns to recognize the more nuanced aspects of the input needed for more refined predictions. In this case, early in training, the model moves the atoms preferentially in the same direction as the forces experienced by the atoms, despite having never seen any energy or force data during training. In other words, the model has discovered “following the forces” as the easiest way to find a low-energy geometry (at least to the extent that the solid lines are higher than the dashed lines in the left side of Fig. 6). By the end of training, in contrast, the model learns to compensate for the curvature of the PES and to move directly towards the ground state, but learning to do so takes significantly more epochs of training.
For very low T and n, we expect both the MCMC simulation at temperature T and the diffusion chain with step n to have an energy of ∼0 relative to the ground state. However, because there is sometimes disagreement between GFN2-xTB and the DFT-calculated ground states that the model was trained on, the diffusion chains at low n settle to an xTB-computed energy slightly above zero. To compensate for this, for each molecule we subtract a constant energy (<2 kcal mol−1) from each chain to equalize the minimum energies achieved by the two chains at the lowest values of T and n.
As a point of reference, when repeatedly perturbing atomic coordinates with an isotropic normal distribution instead of the diffusion model, the energy diverges, even for extremely small step sizes. Unlike repeated Gaussian perturbations, which have no preferred direction, the diffusion model preferentially moves the geometries closer to a local minimum of the PES, even without the Metropolis–Hastings acceptance criterion, which uses a ground-truth energy oracle. The model was trained only on geometries, with no energy supervision either on the ground-state geometries or on any non-equilibrium geometries. Despite this, we never observed a diffusion chain diverging, even for high values of n, and there is even reasonable agreement between the distributions of energies within the MCMC and diffusion chains.
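For reference, the MCMC baseline can be sketched as a random-walk Metropolis–Hastings chain with GFN2-xTB energies; the step size, chain length, and temperature below are illustrative rather than the exact settings used.

```python
# Random-walk Metropolis-Hastings over atomic coordinates with GFN2-xTB energies (via xtb-python's
# ASE calculator), sampling the Boltzmann distribution at temperature T.
import numpy as np
from ase import units
from xtb.ase.calculator import XTB

def mcmc_chain(atoms, T=300.0, step=0.01, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    atoms = atoms.copy()
    atoms.calc = XTB(method="GFN2-xTB")
    energy = atoms.get_potential_energy()
    energies = []
    for _ in range(n_steps):
        old_pos = atoms.get_positions()
        atoms.set_positions(old_pos + rng.normal(scale=step, size=old_pos.shape))  # isotropic proposal
        e_trial = atoms.get_potential_energy()
        # Metropolis acceptance criterion at temperature T (energies in eV, kB in eV/K).
        if e_trial < energy or rng.random() < np.exp(-(e_trial - energy) / (units.kB * T)):
            energy = e_trial                          # accept the move
        else:
            atoms.set_positions(old_pos)              # reject: restore previous geometry
        energies.append(energy)
    return np.array(energies)
```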
Next we investigate the relationship between T and n as well as the average and variance of the resulting energy distributions. Fig. 8 plots, for a single molecule, the average energy ± the standard deviation of the energy for each of the n and T values we considered. As expected, the energy increases linearly with temperature. On the other hand, the energy increases quadratically with increasing n. This is unsurprising: near the end of inference, the step size decreases roughly linearly, as seen in Fig. 2, and near a local minimum, we expect the PES to be modeled well as a harmonic oscillator. The left side of Fig. 8 plots both chains with a linear scale on T and n. The right side of the figure instead uses a quadratic scale for n, and the x-axis is linearly scaled by a constant μ to equalize the slope between the two chains. Note that any linear and quadratic functions can be made to line up using this method, so their alignment in this plot is unsurprising. Fig. 9 plots the same quantities, but repeated for each of the nine molecules considered. In this case, we use the same linear scaling μ for each molecule, so there is no guarantee that the lines will all line up. Even though there is some variation in the heat capacities of the nine molecules (i.e. the slope of the MCMC lines), the diffusion chain consistently generates very similar average energies as the MCMC chain at the corresponding temperature.
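The rescaling can be sketched as a pair of least-squares fits; the array names are placeholders for the measured mean energies of the two chains.

```python
# Fit the MCMC mean energies as a linear function of T and the diffusion mean energies as a
# quadratic function of n, then choose a single scale mu so the curves overlap against mu * n**2.
import numpy as np

def fit_scale(T_vals, E_mcmc, n_vals, E_diff):
    a = np.polyfit(T_vals, E_mcmc, 1)[0]                      # dE/dT from the MCMC chain
    b = np.polyfit(np.asarray(n_vals) ** 2, E_diff, 1)[0]     # dE/d(n^2) from the diffusion chain
    mu = b / a                                                # effective temperature per unit n^2
    return mu                                                 # plot E_diff against mu * n**2
```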
Fig. 9 Right-hand plot of Fig. 8 for the nine molecules considered. Only one linear scaling factor is used across all molecules.
The variances of the distributions are also similar, though the diffusion chain consistently results in a wider distribution of energies than the MCMC chain. The energy distributions for these simple molecules from QM9 are unimodal, but it would be interesting to see whether the diffusion chain can reproduce multi-modal energy distributions for more complex molecules.
In addition to measuring the speedup obtained by starting at the diffusion-generated structure instead of the MMFF-relaxed structure, we also compare the energies of the final structures themselves after undergoing DFT relaxation. In most cases, DFT finds nearly identical structures, regardless of which of the two initial structures were used. However, in some cases, the two structures do differ, and we quantify this difference using the relative energy between the two structures. In the following sections, we refer to this relative energy as the “energy delta”. An energy delta greater than zero indicates that the DFT relaxation converged to a higher-energy structure when initialized at the MMFF-relaxed structure than at the diffusion-generated structure. Fig. 11 shows a schematic of the overall calculation workflow, with the structures used to calculate the energy delta circled in red.
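In code, the energy delta amounts to the following, where dft_relax is a hypothetical helper that runs the DFT relaxation described above and returns the final energy.

```python
# "Energy delta": relax the same molecule with DFT from both starting geometries and compare
# the final energies. `dft_relax` is a hypothetical callable (e.g., the BFGS/Psi4 setup above).
def energy_delta(dft_relax, mmff_atoms, diffusion_atoms):
    e_from_mmff = dft_relax(mmff_atoms)
    e_from_diffusion = dft_relax(diffusion_atoms)
    return e_from_mmff - e_from_diffusion   # > 0: the MMFF start converged to a higher-energy minimum
```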
Fig. 11 Schematic showing how we compute DFT relaxations, and how we calculate the energy delta (Δ).
The energy deltas are approximately zero in every structure but one, in which diffusion with n = 50 steps moved the geometry to a better local minimum than MMFF found, as indicated by the negative arrow of magnitude ∼1 kcal mol−1. This structure is also one of the few where the relaxation takes longer when starting from the diffusion-generated geometry. We visualize this particular structure in Appendix B, where it becomes clear that diffusion has rotated the hydrogen atoms of a methyl group just far enough around the central carbon to get relaxed by DFT to a more favorable position. Interestingly, when using only n = 20 steps on the same structure, diffusion does not stray from the PES basin found by the MMFF relaxation, which is unsurprising given the results in Fig. 4.
We also calculate the median speedup obtained using EDM checkpoints throughout training. Results are shown on the right of Fig. 10. The improvement slows down over the course of training, but there may still be further improvements possible by simply training longer. By the end of training, while in some cases the diffusion-generated structure leads to a slowdown rather than a speedup, even the 25th percentile speedup is nearly 2×. For 25% of structures, the speedup is at least 4× (75th percentile is at 75% speedup).
Fig. 12 Speedup results for DRUGS. Figures are analogous to those in Fig. 10.
Lastly, we carry out a similar analysis of how the steps taken by diffusion align with the ground-truth forces on the atoms, using xTB to calculate forces instead of DFT. For QM9 structures, early in training the model's predictions align better with the DFT-computed forces, and later in training they align better with the direct path to the ground state (Section 4.3). In the case of DRUGS, the model behaves similarly to how the QM9 model behaves early in training; the model's predictions are more aligned with the xTB-computed forces than with the path directly to the ground state. This is shown in Fig. 13, where the solid lines tend to go higher than the dashed lines.
Fig. 13 Force alignment plot, analogous to the plots in Fig. 6 but for the EDM model trained on DRUGS for 250 epochs, and using xTB to calculate forces instead of DFT.
This result helps paint a picture of how these models improve throughout training. Early on, the QM9 model's predictions align better with the forces than with the direction to the ground state, and the structures produced by the model are only slightly better than those produced by MMFF: after 50 epochs of training, the QM9 model reduces energy relative to the DFT ground state by 2× compared to MMFF, and using the diffusion-produced structure yields no speedup to DFT relaxations whatsoever (after 100 epochs, the energy reduction is 3×, the median DFT speedup is 4%, and the force alignment plot looks similar to the plot for 50 epochs). Later on in training, the model's predictions trend towards pointing straight to the ground state instead of aligning with the forces, the energy improvement is close to 10×, and the median speedup to DFT relaxations is nearly 60%. The DRUGS results mimic the QM9 results early on in training: the model predictions align better with the forces than with the path to the ground state, the energy relative to the xTB ground state is reduced by a factor of ∼2 compared to MMFF, and there is only a few percent speedup to xTB relaxations. With a better model—whether through further training of this same EDM or after improving the model architecture—we might expect the model to behave more like the QM9 model behaves at the end of training.
In contrast, for more complex and larger GEOM-DRUGS molecules, the model's predictions align better with the ground-truth forces on the atoms, suggesting the model has not yet learned beyond first-order information about the PES. Correspondingly, geometries predicted by EDM are of lower quality, both in terms of the energy improvement over MMFF and in terms of the speedup in DFT relaxations when starting at the EDM-predicted structures.
Although we present some of our findings in terms of capabilities that the model has, we are not proposing that a self-supervised model could outperform state-of-the-art supervised NNIPs on tasks like structure relaxations or MCMC simulations. After all, the training data for the diffusion model used in this manuscript consists of only a single point on the PES for each molecule or conformation. NNIP training datasets, in contrast, contain many points on the PES for each molecule, all labeled with energies and forces. Rather, we investigate the model's capabilities as a way both to see how far it is possible to get with self-supervision alone, and to gain insight into what information these models are learning about the PES from a training set of only ground-state geometries.
We see a number of avenues for future work. One exciting direction is to explore new tasks that have traditionally required an interatomic potential but that could be carried out with a self-supervised model instead. For example, in Section 4.4, we establish a correspondence between diffusion steps and temperature; future work could make use of this correspondence to use EDM for replica exchange MCMC. In a similar vein, repeatedly applying EDM to a structure while progressively increasing the diffusion step could allow the model to accelerate reaction prediction calculations and/or transition state estimation. By starting at a known structure and progressively raising the diffusion step (i.e., temperature), the system should eventually start hopping to nearby basins in the potential energy landscape. Future work could also explore whether our findings generalize to different types of materials, such as crystals and glasses. Lastly, on the modeling side, future work could explore other existing denoising models, and could also develop new model architectures and training paradigms designed specifically to improve performance on, e.g., structure relaxations.
Footnotes
† The diffusion step is usually referred to as t ∈ [1, T], but we reserve T for temperature and use n ∈ [1, N] for the diffusion step instead.
‡ In particular, we use a simple filter based on the molecules' SMILES strings: we filter out SMILES containing “=” or “#”, SMILES containing numbers, and SMILES with more than 15 “(” (to bias towards more linear molecules).
§ We tested on ten molecules, but we discarded one because GFN2-xTB disagrees strongly with DFT about where the local minimum of the PES is.
¶ The equivalent of Fig. 4 for the DRUGS dataset is qualitatively similar, but the energies don't converge to as low a value, and the average for the n = 50 line is higher than the other two due to a single outlier with very high energy (out of ∼500 samples).