Tianze
Zheng
*a,
Ailun
Wang
b,
Xu
Han
a,
Yu
Xia
a,
Xingyuan
Xu
a,
Jiawei
Zhan‡
b,
Yu
Liu
b,
Yang
Chen
a,
Zhi
Wang
b,
Xiaojie
Wu
b,
Sheng
Gong
b and
Wen
Yan
*b
aByteDance Research, Beijing, Beijing, 100098, China. E-mail: zhengtianze@bytedance.com
bByteDance Research, Bellevue, Washington 98004, USA. E-mail: wen.yan@bytedance.com
First published on 31st December 2024
A force field is a critical component in molecular dynamics simulations for computational drug discovery. It must achieve high accuracy within the constraints of molecular mechanics' (MM) limited functional forms, which offers high computational efficiency. With the rapid expansion of synthetically accessible chemical space, traditional look-up table approaches face significant challenges. In this study, we address this issue using a modern data-driven approach, developing ByteFF, an Amber-compatible force field for drug-like molecules. To create ByteFF, we generated an expansive and highly diverse molecular dataset at the B3LYP-D3(BJ)/DZVP level of theory. This dataset includes 2.4 million optimized molecular fragment geometries with analytical Hessian matrices, along with 3.2 million torsion profiles. We then trained an edge-augmented, symmetry-preserving molecular graph neural network (GNN) on this dataset, employing a carefully optimized training strategy. Our model predicts all bonded and non-bonded MM force field parameters for drug-like molecules simultaneously across a broad chemical space. ByteFF demonstrates state-of-the-art performance on various benchmark datasets, excelling in predicting relaxed geometries, torsional energy profiles, and conformational energies and forces. Its exceptional accuracy and expansive chemical space coverage make ByteFF a valuable tool for multiple stages of computational drug discovery.
Generally, force fields can be classified into two main categories: conventional molecular mechanics force fields (MMFFs),9–12,18,19 which parameterize a fixed analytical form to approximate the energy landscape, and machine learning force fields (MLFFs),20–23 that aim to map the atomistic and molecular features and coordinates to the PES using neural networks without being limited by the fixed functional forms. Most conventional MMFFs, such as Amber,24,25 GAFF9 and OPLS,11,12,26 describe the molecular PES by decomposing it into various degrees of freedom, including bonded (i.e., bonds, angles, and torsions) and non-bonded interactions (i.e., electrostatics and dispersion). These conventional MMFFs benefit from the computational efficiency of these terms, while suffering inaccuracies due to the inherent approximation, especially when non-pairwise additivity of non-bonded interactions are of significant importance. Emerging recently, MLFFs have shown great promise for modeling PES due to their ability to capture subtle interactions and complex behaviors that may be overlooked or oversimplified by classical models.20–23 Despite their outstanding accuracies, several drawbacks limit their applications in drug discovery. Owing to the complexity of the machine learning models involved, the computational efficiency of MLFFs is relatively low. Meanwhile, the amount of data required to train an effective MLFF is extremely large, which imposes constraints on their ability to cover the chemical space comprehensively. Consequently, conventional MMFFs remain the most reliable and commonly used tool for MD simulations involving biological systems to this day.7,8
In the past few years, several efforts have been made to improve the quality of MMFFs for predicting the PES of small molecules. Following the traditional look-up table approach, OPLS3e increased the number of torsion types to 146669 to enhance accuracy and expand chemical space coverage.11 Furthermore, OPLS3e and its successor OPLS4 were empowered with FFBuilder11 to refine the torsion terms for molecules beyond the coverage of the pre-determined torsion list. OpenFF18,19,27 took a different approach by utilizing SMIRKS patterns to describe the chemical environment of both bonded and Lennard-Jones terms. However, these discrete descriptions of the chemical environment have inherent limitations that hamper the transferability and scalability of these force fields. In the past decade, ML techniques have been employed to improve the parameterization process of molecular force fields, including genetic algorithms for parameterizing polarizable force field,28 neural network potentials for improving dihedral angle potentials29 and Bayesian optimization for coarse-grained force fields.30 Such early attempts provide excellent proof-of-concepts of the potential of ML methods for improving the development of force fields. Building on this, Espaloma31,32 introduced a novel end-to-end workflow where the MMFF parameters are predicted by graph neural networks (GNN), opening new avenues for the advancement of MMFFs.33 Despite promising results, these early attempts are constrained by the ML techniques and the training data, where significant improvements can be achieved.
In this work, we propose ByteFF, a data-driven MMFF trained on a large-scale, high-diversity, and high-quality quantum mechanics (QM) dataset with sophisticated ML techniques. ByteFF is designed to leverage both atom and bond features using a state-of-the-art GNN model, while preserving molecular symmetry. To construct the dataset for training ByteFF, we employ novel fragmentation methods and a rigorous QM calculation workflow, generating 2.4 million optimized molecular fragment geometries with Hessian matrices, along with 3.2 million torsion profiles. Additionally, we introduce a differentiable partial Hessian loss and an iterative optimization-and-training procedure to effectively train ByteFF on the dataset. Finally, we demonstrate the performance of ByteFF on various benchmarks, showing its expansive chemical space coverage and exceptional accuracy on intra-molecular conformational PES.
EMM = EMMbonded + EMMnon-bonded | (1) |
![]() | (2) |
Force field parameters should adhere to several physical constraints: (1) force field parameters should be permutationally invariant, e.g., the force constant of bond (i, j) should be equal to that of bond (j, i). (2) Force field parameters should be in accordance with chemical symmetries of molecules, e.g., force constants of the two C–O bonds in a carboxyl group (–C([O–])O) must be equal to each other, due to their chemical equivalency, even though they may have different bond orders assigned when written as a SMILES or SMARTS string.34 (3) Charge conservation should be guaranteed, i.e., the summation of assigned partial charges of atoms in one molecule should be consistent with the molecule's net charge, which avoids net charge gain/loss for the molecule in the parameter determination. These constraints are naturally satisfied in the traditional look-up table approaches, which should also serve as essential guidelines when force field parameters are inferred by a machine-learning model.
Additionally, there are two key philosophies worth considering when building general small molecule MMFFs: (1) force field parameters should be dominated by local structures, so that the parameters trained from small molecules can be consistently transferred to similar structures in relatively large molecules.9,11 (2) Torsional energy profiles should be accurately captured, as the quality of torsion parameters significantly affects the conformational distribution of small molecules, thereby influencing the prediction of properties such as protein-ligand binding affinity.11,35,36
The optimization dataset was generated from the entire 2.4 million fragments. The 3D conformations of each fragment was initially generated by RDKit from its SMILES string, and then optimized using the geomeTRIC43 optimizer at the chosen QM level. The relaxed structure is verified where no accidental bond breaking or formation happened during the structural relaxation. Finally, the Hessian matrix is calculated for each fragment using Q-Chem 6.1.44 The hessian matrix is further verified by checking all the eigenvalues (except for six near-zero values corresponding to translational and rotational modes) to ensure that a true local minimum is captured. More details about the workflow and the filtering criteria are provided in the ESI.†
The torsion dataset consists of two subsets, non-ring torsions and in-ring torsions, curated separately, comprising 2.2 million and 1.0 million frames respectively. For each non-ring torsion fragment, we first rotate the torsion angle in 15° increments from the optimized conformation, creating 24 initial frames that were then separately relaxed using the geomeTRIC optimizer with the rotated torsion angle constrained. While the in-ring torsions were scanned using a sequential frame-by-frame approach, with an early stopping strategy if the conformational energy is more than 20 kcal mol−1 higher than the unconstrained relaxed conformation, since those high-energy regimes are irrelevant in room-temperature molecular dynamics simulations. For both non-ring and in-ring subsets, the resultant conformers were also carefully filtered by the bond breaking or formation criteria. Due to the high computational cost of torsion scan, these calculations were performed with GPU4PySCF45,46 for significant acceleration and cost reduction. The results were also verified to be consistent with Q-Chem.
![]() | ||
Fig. 1 Model structure of ByteFF. ByteFF predicts MMFF parameters in three steps. First, atom and bond features are extracted from the molecular graph and then projected into embeddings. Then, an edge-augmented graph transformer (EGT)47 is used to synergize the edge embeddings with the node-based attention mechanism. Lastly, the output module derives force field parameters while preserving the molecular symmetry and total charge. |
Additionally, an ensemble of five models randomly initialized is trained for improved predictive performance and uncertainty quantification.
In the pre-training stage, the optimization dataset were used as the training set. In this stage, the non-bonded parameters were fitted to GAFF-2.2 vdW parameters and AM1-BCC charges with the mean squared error (MSE) losses. Force constants of proper torsions were also fitted to the GAFF-2.2 values with MSE losses, and the equilibrium values of bonds and angles were refined with energy-based loss functions. Meanwhile, the force constants of bonded terms in eqn (2) were trained using a partial Hessian loss, evaluated by the mean absolute percentage error (MAPE) of Hessian blocks corresponding to bonds, angles and improper torsions, namely partial Hessian blocks. It has been reported that the partial Hessian blocks can be used to derive accurate force constants for single molecules.48–50 Our design of the differentiable partial Hessian loss enables fitting force constants in batches, combining training accuracy with computational efficiency.
Given the significance of torsion profiles in the quality of force fields, in the training stage, we incorporated the curated torsion dataset to fit the force constants of proper torsions, while the other parameters were trained using optimization dataset following the same manner in the pre-training stage. Here, we employed the Boltzmann MSE loss functions51 into an iterative optimization-and-training process to train the force constants of proper torsions using the torsion dataset. In each iteration, the QM-optimized geometries were refined by the force field, with the torsion angle constrained and atom positions restrained.18 The parameters were then trained on both the optimization and torsion datasets. As the force field's accuracy improved, the positional restraint force constant was gradually reduced. Additionally, L1-norm regularization was applied to the force constants of proper torsions to restrain redundant degrees of freedom. The combination of pre-training and training stages yields the ByteFF-gopt model.
In the final fine-tuning stage, we incorporated part of the training set in Espaloma-0.3.0,32 named off-equilibrium dataset in this work, to refine the force field parameters with the QM energy and forces. Combining all the three stages (pre-training, training, and fine-tuning), the ByteFF-joint model was obtained.
The detailed settings of the three-stage training procedure are summarized in Table 1.
Pre-training | Training | Fine-tuning | |
---|---|---|---|
Model parameters initialized | Randomly initialized parameters | Parameters from pre-training stage | Parameters from training stage |
Datasets and fitting targets | Optimization dataset: | Optimization dataset: | Optimization dataset: |
- GAFF-2.2 σ/ε/q/kϕ | - GAFF-2.2 σ/ε/q | same as training | |
- EMMbond/EMMangle/EMMimproper | - EMMbond/EMMangle/EMMimproper | Torsion dataset: | |
- QM hessian matrix | - QM hessian matrix | same as training | |
Torsion dataset: | Off-equilibrium dataset: | ||
- QM energy | - QM energy/force | ||
- L1-norm of kϕ | |||
Optimization-and training strategy | No force field optimization | Three optimization-and-trainiterations on torsion dataset | Using coordinates from the last iteration of training stage |
Optimizer | RAdam | RAdam | RAdam |
Learning rate | 10−4 | 10−4 | 2 × 10−5 |
Scheduler | ReduceLROnPlateau | ReduceLROnPlateau | ReduceLROnPlateau |
Factor: 0.2 | Factor: 0.2 | Factor: 0.2 | |
Patience: 10 | Patience: 4 | Patience: 4 | |
Threshold: 10−6 | Threshold: 10−6 | Threshold: 10−6 | |
Early stop patience | 50 | 10 | 10 |
Yield | — | ByteFF-gopt | ByteFF-joint |
Three different types of benchmarks were used to provide a thorough assessment of ByteFF. First, the large-scale industrial collaborative OpenFFBenchmark public dataset54 was used to quantify how well the force field-relaxed geometries and energies reproduced the QM-relaxed references, without constraints. Three different metrics were used, namely root-mean-square deviation of atomic positions (RMSD), torsion fingerprint deviation (TFD),55 and relative energy difference (ΔΔE) defined by OpenFF team.56 Second, the crucial torsional energy profile performance was benchmarked on two different datasets, namely the publicly available TorsionNet500 dataset57 and the BDTorsion dataset curated in-house. For consistency, energies of conformations in TorsionNet500 (ref. 57) dataset were re-calculated with the B3LYP-D3(BJ)/DZVP level of theory. On these datasets, the root mean squared error (RMSE) and Boltzmann-weighted RMSE of each torsional energy profile were calculated and compared.
Beyond the constrained (torsional energy) and unconstrained (OpenFFBenchmark) optimization benchmarks, the performance of ByteFF was further benchmarked on the QM references energies and forces calculated on the off-equilibrium dataset, which includes subsets of the SPICE dataset58 and the RNA Structure Atlas.59 On this dataset, RMSE of energies and forces were calculated to quantify if the PES is properly captured beyond local minima.
Furthermore, due to the significance of validating the accuracy of ByteFF in MD simulations, metadynamics (MetaD) simulations60,61 were conducted on several drug-like fragments to compare the conformational free energy surface (FES) predicted by various MMFFs to the reference predicted by QM.62,63 In MetaD simulations, SPONGE (Simulation Package tOward Next GEneration molecular modelling)64,65 was used as the simulation engine, employing the GPU4PySCF and OpenMM66 as the backends for QM and MMFF calculations, respectively.
Rigorous procedures were followed in the benchmarking of ByteFF models, which ensured no data leakage between the training and testing data. More details of these benchmarks can be found in ESI.†
As illustrated in Fig. 3, both ByteFF-gopt and ByteFF-joint significantly outperform competitors on all three benchmark datasets. On all the three datasets, ByteFF-gopt exhibits the best performance on both metrics, with the highest population of predictions with lower RMSE and Boltzmann RMSE. When tested on TorsionNet500, the Boltzmann RMSE of most predictions by ByteFF-gopt are within 0.5 kcal mol−1, while none of the Boltzmann RMSE exceeds 1.33 kcal mol−1, demonstrating the exceptional accuracy of ByteFF-gopt in terms of predicting torsional energy profiles. On the more challenging BDTorsion datasets, both RMSE and Boltzmann RMSE are generally higher for all the force fields being tested. Nevertheless, ByteFF-gopt and ByteFF-joint still outperformed competitors significantly, with most Boltzmann RMSE values below 1.0 kcal mol−1 (chemical accuracy). In these torsion profile benchmarks, the performance of ByteFF-joint is compromised by the addition of off-equilibrium energies and forces training data. This is inevitable due to the very limited fixed functional form of MMFFs.
One key feature of organic molecules is the presence of a great variety of rings,68 and in-ring torsion parameters have crucial impact on the conformational PES of rings. However, this is much less discussed in the development of MMFFs. In this work, we carefully constructed the in-ring torsion scan training dataset to improve the performance of ByteFF on in-ring torsions. As shown in Fig. 3(e) and (f), both ByteFF-gopt and ByteFF-joint significantly outperform competitors, with both RMSE and Boltzmann RMSE mostly better than chemical accuracy, respectively. As an example to illustrate the accuracy of ByteFF-gopt and ByteFF-joint in predicting in-ring torsional energy profile, we present a molecule from the BDTorsion-InRing dataset that contains a four-membered aliphatic ring (Fig. 4(a)). On this molecule, we calculated the torsional energy profile for the in-ring torsion atoms highlighted in Fig. 4(a), and compared PES predictions from various force fields with the QM references (Fig. 4(b)). It is evident that both Espaloma-0.3.0 (green) and OpenFF-2.0.0 (blue) failed to predict the energy landscape in the vicinity of the local minimum, which could lead to incorrect predictions of ring conformations in geometric optimization or MD simulations. Though GAFF-2.2 captured the approximate shape of this torsional energy landscape, it overestimated the barrier near −130° while significantly underestimated the energy near −160°, which led to significantly incorrect shifts of the predicted location of global energy minimum. In contrast, both ByteFF-gopt and ByteFF-joint predictions align well with QM references, which not only captured the shape of the energy landscape, but also accurately predicted the positions and energy differences of local minima. In addition to accurately predicting in-ring torsional energy profiles, ByteFF models also excel in the prediction of non-ring torsions, as illustrated with the example molecule in Fig. 4(c). As shown in Fig. 4(d), ByteFF-gopt and ByteFF-joint successfully predicted the torsional energy landscape of the highlighted non-ring torsion, which is highly consistent with the QM reference, while other force fields failed to predict both the positions of energy minima and the height of the energy barrier. More example molecules are provided in Fig. S2 and S3† including both in-ring and non-ring torsions, where ByteFF models predicted the torsional energy profiles with exceptional accuracy against the QM reference, while other force fields failed to achieve.
The effectiveness and accuracy of ByteFF-gopt and ByteFF-joint stems from the synergy of two components: comprehensive in-house training datasets and carefully tweaked training strategy. The iterative optimization-and-training strategy allows the model to accurately capture the energy landscape along torsional degrees of freedom, minimizing interference from complex interactions such as bond-torsion coupling that is beyond the limited description of the chosen functional form. The predictions are further enhanced by the simple but effective model-ensemble averages.
As shown in Fig. 5(a) and (b), the distribution of ByteFF-gopt and ByteFF-joint in terms of RMSD and TFD are both concentrated near zero, with higher peak values near zero and narrower distributions than other force fields, demonstrating superior consistencies with QM-optimized molecular conformations. Particularly, both ByteFF-gopt and ByteFF-joint significantly outperform the state-of-the-art “OPLS4 cst” model, which is OPLS4 (ref. 12) with parameters refined for each molecule individually using the FFBuilder tool. Additionally, we calculated the relative energy differences (ΔΔE) to assess energetic agreement between force field-optimized conformations and the QM-optimized ones (Fig. 5(c)). Both ByteFF-gopt and ByteFF-joint exhibit sharp peaks around zero in their ΔΔE distribution, significantly outperform all other force fields, indicating that both of them reproduce correctly the QM relative energies between conformers. The accurate predictions of equilibrium conformations ensures the correct conformations being sampled in MD simulations, while the precise prediction of ΔΔE guarantees the Boltzmann distribution in the conformational space is properly reproduced.
![]() | ||
Fig. 5 Histograms of different metrics on OpenFFBenchmark dataset. The accuracy of force field-relaxed geometry relative to QM-relaxed references is quantified with (a) RMSD and (b) Torsion Fingerprint Deviation (TFD) scores. The energetic accuracy is quantified by (c) ΔΔE distributions. All benchmark results for “OPLS4 cst”, and the detailed protocols to calculate the benchmark results are obtained from ref. 54. |
Fig. 6 illustrates a more detailed example of the performance of reproducing equilibrium geometries for various force fields. The QM-relaxed reference geometry shows that the four-membered ring takes a “butterfly” conformation due to significant torsional strain. The predictions relaxed using various force fields, including ByteFF-gopt (green), Espaloma-0.3.0 (yellow), and GAFF-2.2 (cyan), are superimposed on the QM reference. It is evident that Espaloma-0.3.0 (yellow) predicts the four-member ring as a planer conformation, while GAFF-2.2 (cyan) overestimates the bending of the ring. Being at the center of the molecule, a minor misprediction of the four-member ring conformation results in more significant geometric deviations on the two ends of the molecule, leading to overall RMSD values of 0.9 Å for Espaloma-0.3.0 and 0.4 Å for GAFF-2.2, respectively. Notably, the geometry relaxed by ByteFF-gopt (green) achieved superior alignment with the QM prediction with an RMSD of 0.2 Å. With increasing ring sizes and increasing number of neighboring rings, the complexity of the conformational space expands, making precise predictions of in-ring torsion profiles increasingly critical.
Dataset | GAFF-2.2 | OpenFF-2.0 | Espaloma-0.3 | ByteFF-gopt | ByteFF-joint | |
---|---|---|---|---|---|---|
SPICE-Pubchem | Energy | 4.8 | 4.4 | 2.3 | 3.4 | 2.3 |
Force | 14.1 | 14.0 | 6.5 | 9.9 | 6.7 | |
SPICE-DES-monomers | Energy | 2.7 | 2.7 | 1.4 | 1.9 | 1.3 |
Force | 10.7 | 12.5 | 6.0 | 8.0 | 5.4 | |
SPICE-dipeptide | Energy | 5.2 | 4.5 | 3.3 | 4.0 | 2.3 |
Force | 12.5 | 12.4 | 8.1 | 9.9 | 5.3 | |
RNA-diverse | Energy | 6.7 | 5.4 | 3.9 | 4.6 | 3.5 |
Force | 15.8 | 18.4 | 4.5 | 9.7 | 3.7 | |
RNA-trinucleotide | Energy | 6.2 | 6.0 | 3.8 | 4.4 | 3.4 |
Force | 16.5 | 18.7 | 4.4 | 10.0 | 3.6 |
As a result, ByteFF-joint achieves state-of-the-art performance in almost all subsets, except for the SPICE-Pubchem dataset, where ByteFF-joint shows marginally higher force RMSE compared to Espaloma-0.3.0. As discussed earlier, it is inherently challenging of an MMFF to perfectly replicate the QM PES. There is a trade-off between predicting the PES of relaxed-geometry and off-equilibrium conformations, and we focused more on the relaxed-geometry performances during the training process, since this is more relevant to the sampling of local minima in molecular dynamics simulations and is more stressed in the latest large-scale collaborative industrial benchmark.54 Despite this, ByteFF-joint achieves overall state-of-the-art performance in all benchmarks considered in this study.
In addition to performance benchmarking on the static small molecule conformations, ByteFF was further validated through MD simulations. From the ligands in an FEP benchmark dataset,69 10 drug-like fragments were chosen for these simulations, as illustrated in Fig. 7 and S4.† In each fragment, a critical torsion angle was selected as the collective variable (CV) for the MetaD simulations.60,61 During the MetaD simulations, Gaussian bias potentials were applied along the selected CV, and the corresponding torsional FES was obtained. Unlike torsion profiles, which are obtained from locally minimized geometries, the torsional FES is averaged from all geometries following the Boltzmann distribution, making it directly related to experimental results. To illustrate the performance of ByteFF in MetaD simulations, a common fragment in the PTP1B subset from the FEP+ R-group dataset is presented in Fig. 7(a), with the highlighted torsion serving as the CV. As shown in Fig. 7(b), ByteFF-gopt and ByteFF-joint excel in predicting the torsional FES for the highlighted torsion, achieving high consistency with the QM references in both the peak/valley positions and the relative heights of the peaks. Additional results for the other 9 fragments are provided in Fig. S4,† where ByteFF-gopt and ByteFF-joint outperform other force fields in most cases. These MetaD simulation results not only confirm the reliability of ByteFF in the MD simulations but also support the rationale behind fitting torsion profiles in the development of ByteFF.
Footnotes |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4sc06640e |
‡ Work done as intern at ByteDance Research. |
This journal is © The Royal Society of Chemistry 2025 |