End-to-end differentiable construction of molecular mechanics force fields

Molecular mechanics (MM) potentials have long been a workhorse of computational chemistry. Leveraging accuracy and speed, these functional forms find use in a wide variety of applications in biomolecular modeling and drug discovery, from rapid virtual screening to detailed free energy calculations. Traditionally, MM potentials have relied on human-curated, inflexible, and poorly extensible discrete chemical perception rules (atom types) for applying parameters to small molecules or biopolymers, making it difficult to optimize both types and parameters to fit quantum chemical or physical property data. Here, we propose an alternative approach that uses graph neural networks to perceive chemical environments, producing continuous atom embeddings from which valence and nonbonded parameters can be predicted using invariance-preserving layers. Since all stages are built from smooth neural functions, the entire process—spanning chemical perception to parameter assignment—is modular and end-to-end differentiable with respect to model parameters, allowing new force fields to be easily constructed, extended, and applied to arbitrary molecules. We show that this approach is not only sufficiently expressive to reproduce legacy atom types, but that it can learn to accurately reproduce and extend existing molecular mechanics force fields. Trained with arbitrary loss functions, it can construct entirely new force fields self-consistently applicable to both biopolymers and small molecules directly from quantum chemical calculations, with superior fidelity than traditional atom or parameter typing schemes. When adapted to simultaneously fit partial charge models, espaloma delivers high-quality partial atomic charges orders of magnitude faster than current best-practices with low inaccuracy. When trained on the same quantum chemical small molecule dataset used to parameterize the Open Force Field (“Parsley”) openff-1.2.0 small molecule force field augmented with a peptide dataset, the resulting espaloma model shows superior accuracy vis-á-vis experiments in computing relative alchemical free energy calculations for a popular benchmark. This approach is implemented in the free and open source package espaloma, available at https://github.com/choderalab/espaloma.

entirely new force fields self-consistently applicable to both biopolymers and small molecules directly from quantum chemical calculations, with superior fidelity than traditional atom or parameter typing schemes. When adapted to simultaneously fit partial charge models, espaloma delivers high-quality partial atomic charges orders of magnitude faster than current best-practices with low inaccuracy. When trained on the same quantum chemical small molecule dataset used to parameterize the Open Force Field ("Parsley") openff-1.2.0 small molecule force field augmented with a peptide dataset, the resulting espaloma model shows superior accuracy vis-á-vis experiments in computing relative alchemical free energy calculations for a popular benchmark. This approach is implemented in the free and open source package espaloma, available at https://github.com/choderalab/espaloma. Molecular mechanics (MM) force elds-physical models that abstract molecular systems as atomic point masses that interact via nonbonded interactions and valence (bond, angle, torsion) terms-have powered in silico modeling to provide key insights and quantitative predictions in all aspects of chemistry, from drug discovery to materials science. 1-9 While recent work in quantum machine learning (QML) potentials has demonstrated how exibility in functional forms and training strategies can lead to increased accuracy, [10][11][12][13][14][15][16] these QML potentials are orders of magnitude slower than popular molecular mechanics potentials even on expensive hardware accelerators, as they involve orders of magnitude more oating point operations per energy or force evaluation.
On the other hand, the simpler physical energy functions of MM models are compatible with highly optimized implementations that can exploit a wide variety of hardware, 2,17-21 but rely on complex and inextensible legacy atom typing schemes for parameter assignment: 22 First, a set of rules is used to classify atoms into discrete atom types that must encode all information about an atom's chemical environment needed in subsequent parameter assignment steps.
Next, a discrete set of bond, angle, and torsion types is determined by composing the atom types involved in the interaction.
Finally, the parameters attached to atoms, bonds, angles, and torsions are assigned according to a look-up table of composed parameter types.
Consequently, atoms, bonds, angles, or torsions with distinct chemical environments that happen to fall into the same expert-derived discrete type class are forced to share the same set of MM parameters, potentially leading to low resolution and poor accuracy. Furthermore, the explosion of the number of discrete parameter classes describing equivalent chemical environments required by traditional MM atom typing schemes not only poses signicant challenges to extending the space of atom types, 22 but optimizing these independently has the potential to compromise generalizability and lead to over-tting. Even with modern MM parameter optimization frameworks [23][24][25] and sufficient data, parameter optimization is only feasible in the continuous parameter space dened by these xed atom types, while the mixed discrete-continuous optimization problem-jointly optimizing types and parameters-is intractable.
Here, we present a continuous alternative to traditional discrete atom typing schemes that permits full end-to-end differentiable optimization of both typing and parameter assignment stages, allowing an entire force eld to be built, extended, and applied using standard automatic differentiation machine learning frameworks such as PyTorch, 32 TensorFlow, 33 or JAX 34 (Fig. 1). Graph neural networks have recently emerged as a powerful way to learn chemical properties of atoms, bonds, and molecules for biomolecular species (both small organic molecules and biopolymers), which can be considered isomorphic with their graph representations. [35][36][37][38][39][40][41][42][43][44] We hypothesize that graph neural networks operating on molecules have expressiveness that is at least equivalent to-and likely much greater than-expert-derived typing rules, with the advantage of being able to smoothly interpolate between representations of chemical environments (such as accounting for fractional bond orders 45 ). We provide empirical evidence for this in Section 1.1.
Next, we demonstrate the utility of such a model (which we call the extensible surrogate potential optimized by messagepassing, or espaloma) to construct end-to-end optimizable force elds with continuous atom types. Espaloma assigns molecular mechanics parameters from a molecular graph in three differentiable stages ( Fig. 1): Stage 1: continuous atom embeddings are constructed using graph neural networks to perceive chemical environments (Section 1.1).
Stage 2: continuous bond, angle, and torsion embeddings are constructed using pooling that preserves appropriate symmetries (Section 1.2). Stage 3: molecular mechanics force eld parameters are computed from atom, bond, angle, and torsion embeddings using feed-forward networks (Section 1.3).
Additional molecular mechanics parameter classes (such as point polarizabilities, valence coupling terms, or even parameters for charge-transfer models 46 ) can easily be added in a modular manner.
Similar to legacy molecular mechanics parameter assignment infrastructures, molecular mechanics parameters are assigned once for each system, and can be subsequently used to compute energies and forces or carry out molecular simulations with standard molecular mechanics packages. Unlike traditional legacy force elds, espaloma model parameters F NNwhich dene the entire process by which molecular mechanics force eld parameters F FF are generated ad hoc for a given molecule-can easily be t to data from scratch using standard, highly portable, high-performance machine learning frameworks that support automatic differentiation.
Here, we demonstrate that espaloma provides a sufficiently exible representation to both learn to apply existing MM force elds and to generalize them to new molecules (Section 2). Espaloma's modular loss function enables force elds to be learned directly from quantum chemical energies (Section 3), partial charges (Section 4), or both. The resulting force elds can generate self-consistent parameters for small molecules, biopolymers (Section 5), and covalent adducts (Section 1). Finally, an espaloma model t to the same quantum chemical dataset used to build the Open Force Field OpenFF ("Parsley") openff-1.2.0 small molecule force eld, augmented with peptide quantum chemical data, can outperform it in an out-of-sample kinase : inhibitor alchemical free energy benchmark (Section A.4 in ESI †).
1 Espaloma: end-to-end differentiable MM parameter assignment First, we describe how our proposed framework, espaloma ( Fig. 1), operates analogously to legacy force eld typing schemes to generate MM parameters F FF from a molecular graph G and neural model parameters F NN , (1) which can subsequently be used to compute the MM energy (as in eqn (14) in ESI †) for any conformation. A brief graphtheoretic overview of molecular mechanics force elds is provided in the Appendix (Section C in ESI †).
1.1 Stage 1: graph neural networks generate a continuous atom embedding, replacing legacy discrete atom typing We propose to use graph neural networks 35-44 as a replacement for rule-based chemical environment perception. 22 These neural architectures learn useful representations of atomic chemical environments from simple input features by updating and pooling embedding vectors via message passing iterations to neighboring atoms. 44 As such, symmetries in chemical graphs (chemical equivalencies) are inherently preserved, while a rich, continuous, and differentiably learnable representation of the atomic environment is derived. For a brief introduction to graph neural networks, see Appendix Section D in ESI †. Traditional molecular mechanics force eld parameter assignment schemes such as Antechamber/GAFF 26,27 or CGenFF 47,48 use attributes of atoms and their neighbors (such as atomic number, hybridization, and aromaticity) to assign discrete atom types. Subsequently, atom, bond, angle, and torsion parameters are assigned for specic combinations of these discrete types according to human chemical intuition. 22 On a closer look, this scheme resembles a two-or three-step Weisfeiler-Leman test, 49 which has been shown to be well approximated by some graph neural network architectures. 35 We hypothesize that graph neural network architectures can be at least as expressive as legacy atom typing rules.
To compute continuous atom embeddings, we start with a molecular graph G whose atoms (nodes) are labeled with simple chemical properties (here, we consider element, hybridization, aromaticity, formal charge, and membership in various ring sizes) easily computed in any cheminformatics toolkit. Sequential application of the graph neural network message-passing update rules (Appendix Section D in ESI †) then computes an updated set of atom (node) features in each graph neural network layer, and the nal atom embeddings h v˛ℝ jGjÂD are extracted from the nal layer. The loss on training data is then minimized by minimizing the cross-entropy loss between predicted and reference types. { We use a subset of ZINC 50 provided with parm@Frosst to validate atom typing implementations 51 (7529 small drug-like molecules, partitioned 80 : 10 : 10 into training : validation : test sets) for this experiment. Reference GAFF 1.81 27 atom types are assigned using Antechamber 27 from AmberTools and are used for training and testing.
1.1.1 Graph neural networks can reproduce legacy atom types with high accuracy. The test set performance is reported in Fig. 2, where the overall accuracy between reference legacy types and learned types is very high-98.31% 98.63% 97.94% , where suband superscripts represent a 95% condence interval. In analyzing the infrequent failures, we nd the model assigns atom types that correspond to the reference type more oen when the atom type appears more frequently in the training data, whereas the discrepancies occur in assigning rare types and types whose denitions follow a more sophisticated (but Fig. 1 Espaloma is an end-to-end differentiable molecular mechanics parameter assignment scheme for arbitrary organic molecules. Espaloma (extendable surrogate potential optimized by message-passing) is a modular approach for directly computing molecular mechanics force field parameters F FF from a chemical graph G such as a small molecule or biopolymer via a process that is fully differentiable in the model parameters F NN . In Stage 1, a graph neural network is used to generate continuous latent atom embeddings describing local chemical environments from the chemical graph (Section 1.1). In Stage 2, these atom embeddings are transformed into feature vectors that preserve appropriate symmetries for atom, bond, angle, and proper/improper torsion inference via Janossy pooling (Section 1.2). In Stage 3, molecular mechanics parameters are directly predicted from these feature vectors using feed-forward neural nets (Section 1.3). This parameter assignment process is performed once per molecular species, allowing the potential energy to be rapidly computed using standard molecular mechanics or molecular dynamics frameworks thereafter. The collection of parameters F NN describing the espaloma model can be considered as the equivalent complete specification of a traditional molecular mechanics force field such as GAFF 26,27 /AM1-BCC 28,29 in that it encodes the equivalent of traditional typing rules, parameter assignment tables, and even partial charge models. This final stage is modular, and can be easily extended to incorporate additional molecular mechanics parameter classes, such as parameters for a charge-equilibration model (Section 4), point polarizabilities, or valence-coupling terms for class II molecular mechanics force fields. 30,31 chemically arbitrary) logic. For instance, one of the most frequent confusions is the misassignment of ca (sp 2 carbon in pure aromatic systems) to cp (head sp 2 carbon that connects two rings in biphenyl systems, occurring in only 0.6% of the dataset). The relative ambiguity of the various types that are most frequently confused is suggestive that the graph net makes human-like errors in perceiving subtle differences between distinct chemical environments.
The benets of neural embedding compared to legacy discrete typing are many-fold: Legacy typing schemes are generally described in text form in published work (for example ref. 26 and 27), creating the potential for discrepancies between implementations when different cheminformatics toolkits are used. By contrast, with the knowledge to distinguish chemical environments stored in latent vectors and not dependent on any manual coding, our approach is deterministic once trained, and is portable across platforms thanks to modern machine learning frameworks.
Both the chemical perception process and the application of force eld parameters F FF can be optimized simultaneously via gradient-based optimization of F NN using standard machine learning frameworks that support automatic differentiation.
While extending a legacy force eld by adding new atom types can lead to an explosion in the number of parameter types, continuous neural embeddings do not suffer from this limitation; expansion of the typing process occurs automatically as more diverse training examples are introduced.
1.2 Stage 2: symmetry-preserving pooling generates continuous bond, angle, and torsion embeddings, replacing discrete types Terms in a molecular mechanics potential are symmetric with respect to certain permutations of the atoms involved in the interaction. For example, harmonic bond terms are symmetric with respect to the exchange of atoms involved in the bond. More elaborate symmetries are frequently present, such as in the threefold terms representing improper torsions for the Open Force Field "Parsley" generation of force elds (k-i-j-l, k-j-l-i, and k-l-i-j, where k is the central atom). 53 Traditional force elds, for bond, angle, and proper torsion terms, enforce this by ensuring equivalent orderings of atom types receive the same parameter value.k For neural embeddings, the invariances of valence terms with respect to these atom ordering symmetries must be considered while searching for expressive latent representations to feed into a subsequent parameter prediction network stage. Inspired by Janossy pooling, 55 we enumerate the relevant equivalent atom permutations to derive bond, angle, and torsion embeddings h r , h q , h f that respect these symmetries from atom embeddings h v (see Fig. 1 Stage 2), (3) where columns ( : ) denote concatenation**. As such, the order invariance is evident, i.e.,

Stage 3: neural parametrization of atoms, bonds, angles, and torsions replaces tabulated parameter assignment
In the nal stage, each feed-forward neural network modularly learns the mapping from these symmetry-preserving atom, bond, angle, and torsion encodings to MM parameters F FF that reect the specic chemical environments appropriate for these terms: where k r and k q denote force constants for bonds and angles, r 0 and q 0 denote equilibrium bond lengths and angles, k f,n denotes a torsion energy factor (which can be positive or negative) for periodicity n, s v is the effective radius and 3 v is the effective well depth for Lennard-Jones interactions. (For a brief review of the molecular mechanics force elds functional term, see Appendix Section C in ESI. †) This stage is analogous to the nal table lookup step in traditional force eld construction, but with signicant added exibility arising from the continuous embedding that captures the chemical environment specic to the potential energy term being assigned.
Here, we use Lennard-Jones parameters from legacy force elds (here, the Open Force Field 1.2.0 "Parsley" small molecule force eld 56 ) to avoid having to include condensed-phase physical properties in the tting procedure. While including condensed-phase physical properties in the loss function is possible, it is very expensive to do so, and as our experiments demonstrate, may not be necessary for achieving increased accuracy over legacy force elds.
We also found producing bond and angle parameters directly in Stage 3 to frustrate optimization, so we employ a mixture of linear bases to represent harmonic energies that can be translated back to the original functional form (see Appendix Section D.1.1 in ESI †). Similarly, we do not t phases and periodicities of torsions as they are discrete. We instead x phases at f 0 ¼ 0 and t all periodicities n ¼ 1, ., 6. This allows the corresponding torsion barriers K n to assume the entire continuum of positive or negative values; as a result, K n < 0 mimics the effect of f 0 ¼ p.
As a result of using the continuous atom embedding vectors to represent chemical environments for each atom, it is possible to intelligently interpolate between relevant chemical environments seen during training. This interpolation produces more nuanced varieties of parameters than either traditional atom typing or direct chemical perception, and is capable of capturing subtle effects arising from fractional bond order perturbation. 45 Due to the modularity of this stage, it is easy to add new modules or swap out existing ones to explore other force eld functional forms, such as alternative vdW interactions; 57 pair-specic Lennard-Jones interaction parameters; 58,59 point polarizabilities for instantaneous dipole, 60 Drude oscillator, 61 or Gaussian charge 62 polarizability models; class II valence couplings; 63-65 charge transfer; 46 or other potential energy terms of interest.
2 Espaloma can learn to mimic existing molecular mechanics force fields from snapshots and associated potential energies Having established that graph neural networks have the capacity to learn to reproduce legacy atom types describing distinct chemical environments, we ask whether espaloma is capable of learning to reproduce traditional molecular mechanics (MM) force elds assigned via standard atom typing schemes. In addition to quantifying how well a force eld can be learned when the exact parameters of the model being learned are known, being able to accurately learn existing MM force elds would have numerous applications, including replacing legacy non-portable parameter assignment codes with modern portable machine learning frameworks, learning to generalize to new molecules that contain familiar chemical environments, and permitting simplied parameter assignment for complex, heterogeneous systems involving post-translational modications, covalent ligands, or heterogeneous combinations of biopolymers and small molecules.
To assess how well espaloma can learn to reproduce a molecular mechanics force eld from a limited amount of data, we selected a dataset with limited chemical complexity-PhAlkEthOH 66,67 -which consists of 7408 linear and cyclic molecules containing phenyl rings, small alkanes, ethers, and alcohols composed of only the elements carbon, oxygen, and hydrogen. Three-and four-membered rings are excluded in the dataset since they would cause instability in the prediction of energies (see Section H in ESI †). We generated a set of 100 conformational snapshots for each molecule using short molecular dynamics simulations at 300 K initiated from multiple conformations to ensure adequate sampling of conformers. The PhAlkEthOH dataset was randomly partitioned (by molecules) into 80% training, 10% validation, and 10% test molecules, and an espaloma model was trained with early stopping via monitoring for a decrease in accuracy in the validation set. The performance of the resulting model is shown in Fig. 3.

Espaloma can learn existing force elds and generalize to new molecules with low error
Espaloma is able to achieve very low total energy and parameter error on the training set, suggesting that espaloma can learn the parameters of typed molecules from energies alone. In addition, error on the out-of-sample test set of molecules is comparableless than 0.02 kcal mol −1 -suggesting that espaloma can effectively generalize to new molecules within the same chemical space. Surprisingly, the total energy RMSE is lower than the angle energy RMSE, suggesting that there is some degeneracy in how energy contributions are distributed among valence energy terms. It is worth noting that espaloma requires very few conformations per molecule to achieve high accuracy; we closely study its data efficiency in Appendix Section 8 in ESI †.
3 Espaloma can fit quantum chemical energies directly to build new molecular mechanics force fields Since espaloma can derive a force eld solely by tting to energies (and optionally gradients), we repeat the end-to-end tting experiment (Section 2) directly using quantum chemical (QM) datasets used to build and evaluate MM force elds.
We assessed the ability of Espaloma to learn several distinct quantum chemical datasets generated by the Open Force Field Initiative 70 and deposited in the MolSSI QCArchive 71 with B3LYP-D3BJ/DZVP level of theory: PhAlkEthOH 66 is a collection of compounds containing only the elements carbon, hydrogen, and oxygen in compounds containing phenyl rings, alkanes, ketones, and alcohols. Limited in elemental and chemical diversity, this dataset is chosen as a proof-of-concept to demonstrate the capability of espaloma to t and generalize quantum chemical energies when training data is sufficient to exhaustively cover the breadth of chemical environments.
OpenFF Gen2 Optimization 72 consists of druglike molecules used in the parametrization of the Open Force Field 1.2.0 ("Parsley") small molecule force eld. 73 This set was constructed by the Open Force Field Consortium from challenging molecule structures provided by Pzer, Bayer, and Roche, along with diverse molecules selected from eMolecules to achieve useful coverage of chemical space.
VEHICLe 74 or virtual exploratory heterocyclic library, is a set of heteroaromatic ring systems of interest to drug discovery enumerated by Pitt et al. 75 The atoms in the molecules in this dataset have interesting chemical environments in heteroarmatic rings that present a challenge to traditional atom typing schemes, which cannot easily accommodate the nuanced distinctions in chemical environments that lead to perturbations in heterocycle structure. We use this dataset to illustrate that espaloma performs well in situations challenging to traditional force elds.
PepConf 76 from Prasad et al. 77 contains a variety of short peptides, including capped, cyclic, and disulde-bonded peptides.
This dataset-regenerated as an Opti-mizationDataset (quantum chemical optimization trajectories initiated from multiple conformers) using the Open Force Field QCSubmit tool 78 -explores the applicability of espaloma to biopolymers, such as proteins. Fig. 3 Espaloma accurately learns molecular mechanics parameters when fit to snapshot energies from a molecular mechanics force field. In this experiment, espaloma was used to fit molecular mechanics (GAFF 1.81) potential energies of snapshots generated from short molecular dynamics (MD) simulations initiated from multiple conformers of molecules from the PhAlkEthOH dataset, which uses only the elements carbon, oxygen, and hydrogen. 66 The dataset contains 7408 molecules with 100 snapshots each, and was partitioned by molecules 80 : 10 : 10 into train : validate : test sets. We excluded three-and four-membered rings in the dataset; for a detailed study on the roles of these two factors, see Section H in ESI †. Statistics quoted above the plots provide the root mean squared error (RMSE) between reference and predicted MM energies, and mean absolute percentage error (MAPE) (in fractional form) between reference and predicted force field parameters. The sub-and superscripts report the 95% confidence interval of each statistics estimated from 1000 bootstrapped replicates over molecules in the test set. Energy terms have kcal mol −1 units whereas the units of the force field parameters do not affect the statistics reported here. Force field parameters (F FF ): k r : bond force constant; r 0 : equilibrium bond length; k q : angle force constant; q 0 : equilibrium angle value. Torsion parameters are not shown because of the potential for degeneracy of fit given that all periodicities n ¼ 1, 2, ., 6 are learned by espaloma. See Section 2 for details.
Since nonbonded terms are generally optimized to t other condensed-phase properties, we focused here on optimizing only the valence parameters (bond, angle, and proper and improper torsion) to t these gas-phase quantum chemical datasets, xing the non-bonded energies using a legacy force eld. 70 In this experiment, all the non-bonded energies (Lennard-Jones and electrostatics) were computed using Open Force Field 1.2 Parsley, 79 with AM1-BCC charges generated by the OpenEye Toolkit back-end for the Open Force Field toolkit 0.10.0. 69 Because we are learning an MM force eld that is incapable of reproducing quantum chemical heats of formation, which are reected as an additive offset in the quantum chemical energy targets, snapshot energies for each molecule in both the training and test sets are shied to have zero mean. All datasets are randomly shuffled and split (by molecules) into training (80%), validation (10%), and test (10%) sets.

Espaloma generalizes to new molecules better than widely-used traditional force elds
To assess how well espaloma is able to generalize to new molecules, the performance for espaloma on test (and training) sets was compared to a legacy atom typing based force eld (GAFF 1.81 and 2.11, 26,27 which collectively have been cited over 13 066 times) and a modern force eld based on direct chemical perception 22 (the Open Force Field 1.2.0 ("Parsley") small molecule force eld, 56 downloaded over 150 000 times).
The results of this experiment are reported in Fig. 4. As can be readily seen by the reported test set root mean squared error (RMSE), espaloma can produce MM force elds with generalization performance consistently better than legacy force elds based on discrete atom typing (GAFF 26,27 ). In chemically wellrepresented datasets like PhAlkEthOH-which contains only simple molecules constructed from elements C, H, and Oespaloma is able to signicantly improve on the accuracy of traditional force elds such as OpenFF 1.2.0, GAFF-1.81, and GAFF-2.11 on the test set.
Surprisingly, even though OpenFF 1.2.0 included the "Open FF Gen 2 ′′ dataset in training, espaloma is able to achieve superior test set performance on this dataset, suggesting that both the exibility and generalizability of continuous atom typing have signicant advantages over even direct chemical perception. 22 Even compared to highly optimized late-generation protein force elds such as Amber ff14SB 80 -which was highly optimized to reproduce quantum chemical torsion drive dataespaloma achieves signicantly higher accuracy, improving on Amber ff14SB error of 3.1502 3.1859,* 3.1117 kcal mol −1 to achieve 1.8727 1.9749 1.7309 kcal mol −1 on the PepConf peptide dataset. 76,81 This suggests that espaloma is capable of effectively parameterizing both small molecule and biopolymer force elds. Indeed, when we train an espaloma model using both the OpenFF Gen2 Optimization and PepConf datasets (joint in Fig. 4(a)), we see Fig. 4 Espaloma can directly fit quantum chemical energies to produce new molecular mechanics force fields with better accuracy than traditional force fields based on atom typing or direct chemical perception. Espaloma was fit to quantum chemical potential energies for conformations generated by optimization trajectories initiated from distinct conformers in various datasets from QCArchive. 27 All datasets were partitioned by molecules 80 : 10 : 10 into train : validate : test sets. The number of molecules, optimization trajectories, and total snapshots are annotated in the table. (a) We report the RMSE on training and test sets, as well as the performance of legacy force fields on the test set. All statistics are computed with predicted and reference energies centered to have zero mean for each molecule, in order to focus on errors in relative conformational energetics rather than on errors in predicting the heats of formation of chemical species (which the MM functional form used here is incapable of). The 95% confidence intervals annotated are calculated by bootstrapping molecules with replacement using 1000 replicates. . Legacy force fields, because of their limited chemical typing rules, were not able to perceive the chemical environment of the nitrogen atoms, which were not aromatic. (c) Espaloma is able to predict energies for quantum chemical torsion scans for an out-of-sample torsion scan dataset (the OpenFF Phenyl Torsion Drive Dataset, 45,68 dihedral angle profiled marked in rouge) to high accuracy even though it was not trained on torsion scans (only optimization trajectories) or any of the molecules in the torsion scan set. We also include the torsion energy profile computed by a popular machine learning force field, ANI-1ccx. 12 *: Six cyclic peptides that cannot be parametrized using OpenForceField toolkit engine 69 are not included.
that a single espaloma model is capable of achieving superior accuracy to traditional small molecule and protein force elds simultaneously.

Espaloma can reliably learn torsion proles from optimization trajectories
We wondered whether espaloma could faithfully recover torsion energy proles-which are traditionally expensive to generate using methods like wavefront propagation 82 -from the inexpensive optimization trajectories used to train espaloma models. We therefore examined some representative dihedral energy proles for molecules outside of the dataset used to train espaloma. In Fig. 4(c), we use the espaloma model trained on OpenFF Gen2 Optimization and PepConf to predict the energy proles of several torsion drive experiments in the OpenFF Phenyl Torsion Drive Dataset 68 -which does not contain any of the molecules in the training set-and observed that the locations and heights of torsion energy barriers are recapitulated with reasonable accuracy. This suggests that optimization trajectories are sufficient to capture the locations and relative heights of torsion barriers-a highly useful nding given the relative expense of generating accurate torsion proles compared to simple optimization trajectories. 82

Espaloma can automatically learn distinct atom environments overlooked by traditional force elds
It is worth noting that the traditional, widely used force elds considered here uniformly perform poorly on the VEHICLe dataset 75 ("Heteroaromatic Rings of the Future", containing heterocyclic scaffolds of interest to future drug discovery programs). In Fig. 4(b), we show the most common mode of failure of legacy force elds by examining their predicted energy over the QM optimization trajectory of the compound with largest RMSE (with SMILES string [

H]C1]C(N2N(N] C(N(N2N]N1)[H])[H])[H])[H]
). The initial conformation of the molecule, generated by OpenEye Toolkit, was planar. As the conformation was optimized by quantum chemical methods, the tertiary nitrogens in the system become pyramidal. In a closer examination, GAFF-2.11, for instance, assigned all carbons to be of type cc and all nitrogens na, indicating that they were perceived as aromatic, whereas there is no conjugated system present in the molecule. This also reects the limitation in resolution of legacy force elds. Espaloma, on the other hand, provides a high-resolution atom embedding that can exibly characterize the chemical environments, provided that similar environments existed in the training data.

Espaloma can learn self-consistent charge models in an end-to-end differentiable manner
Historically, biopolymer force elds derive partial atomic charges via ts to high-level multiconformer quantum chemical electrostatic potentials on capped model compounds, adjusted to ensure the repeating biopolymer units have integral charge (oen incorporating constraints to share identical backbone partial charges). [83][84][85] Some approaches to the derivation of partial atomic charges are enormously expensive, requiring iterative QM/MM simulations in explicit solvent to derive partial charges for new molecules. 86,87 For small molecules, state-ofthe-art methods range from fast bond charge corrections applied to charges derived from semiempirical quantum chemical methods (such as AM1-BCC 28,29 or CGenFF charge increments 47 ) to expensive multiconformer restrained electrostatic potential (RESP) ts to high-level quantum chemistry. 88,89 Surprisingly little attention has been paid to the divergence of methods used for assigning partial charges to small molecules and biopolymers, and the potential impact this inconsistency has on accuracy or ease of use-indeed, developing charges for post-translational modications to biopolymer residues 90,91 or covalent ligands can prove to be a signicant technical challenge in attempting to bridge these two worlds.
While machine learning approaches have begun to nd application in determining small molecule partial charges, 92-94 methods such as random forests are not fully continuously differentiable, rendering them unsuitable for a fully end-to-end differentiable parameter assignment framework. Recently, a fast (500Â speed up for small molecules) approach has been proposed that uses graph neural networks as part of a chargeequilibration 95,96 scheme (inspired by the earlier VCharge model 97 ) to self-consistently assign partial charges to small molecules, biopolymers, and arbitrarily complex hybrid molecules in a conformation-independent manner that only makes use of molecular topology. 36 Perhaps unsurprisingly, due to the requirement that molecules retain their integral net charge, directly predicting partial atomic charges from latent atom embeddings and subsequently renormalizing charges leads to poor performance (0.28e (ref. 36)).
Instead, predicting the parameters of a simple physical topological charge-equilibration model 95,97 can produce geometry-independent partial charges capable of reproducing charges derived from quantum chemical electrostatic potential ts. 36 Note that, unlike Wang et al., 36 here we t AM1-BCC charges rather than higher level of quantum mechanics theory due to their high cost. Specically, we use our atom latent representation to instead predict the rstand secondorder derivatives of a pseudopotential energy E with respect to the partial atomic charge q i on atom i: here, the electronegativity e i quanties the desire for an atom to take up negative charge, while the hardness s i quanties the resistance to gaining or losing too much charge. A module is added to Stage III of espaloma to predict the chemical environment adapted (e i , s i ) parameters for each atom from the latent atom embeddings. The partial charges for all atoms can then be obtained by minimizing the second-order Taylor expansion of the potential pseudoenergy contributed by atomic charges: where Q is the total (net) charge of the molecule. Using Lagrange multipliers, the solution to 11 can be given analytically by:q whose Jacobian and Hessian are trivially easy to calculate. As a result, the prediction of {ê i ,ŝ i } could be optimized end-to-end using backpropagation. When predicting the partial charges independently, we observe that the RMSE error on the test set (0.0072 0.0073 0.0071 e), is smaller than the difference between the discrepancy between AM1-BCC charges assigned by two popular cheminformatics toolkits, Ambertools 21 80 and OpenEye Toolkit (0.0126 0.0129 0.0124 e). As shown in Appendix Table 1 in ESI †, there is only a slight decrease in energy performance (within condence interval) when we switch from AM1-BCC charges to this neural charge equilibration model. Additionally, we can integrate the charge equilibration model into the valence parameter prediction pipeline outlined in Section 1 to have the atom embeddings shared across two tasks and curate a single, combined model. With this approach, we observed a test set total RMSE of 1.2216 1.2766 1.1529 kcal mol −1 and charge RMSE of 0.0072 0.0073 0.0070 e, which is within the condence interval of the performance when predicting separately. We provide a detailed study of the joint learning in Appendix Section B in ESI †.

Espaloma can parameterize biopolymers
We have so far established that espaloma, as a method to construct MM force elds, shows great versatility and exibility. In the following sections, we showcase its utility with a model predicting both valence parameters and partial charges trained on OpenFF Gen2 Optimization Dataset as well as PepConf dataset, which we released as 'espaloma-0.2.2' with the package.
The speed and exibility of graph convolutional networks allows espaloma to parameterize even very large biopolymers, treating them as (large) small molecules in a graphtheoretical manner. While graph neural networks perceive nonlocal aspects of the chemical environment around each atom, the limited number of rounds of message passing ensures stability of the resulting parameters when parameterizing systems that consist of repeating residues, like proteins and nucleic acids.
To demonstrate this, we considered the simple polypeptide system ACE-ALA n -NME, consisting of n alanine residues terminally capped by acetyl-and N-methyl amide capping groups. Using the joint charge and valence term espaloma model, we assigned parameters to ACE-ALA n -NME systems with n ¼ 1, 2, ., 500, showing illustrative parameters in Fig. 5. Espaloma stably assigns parameters to the interior residues of peptides even as they increase in length, with parameters of the central residue unchanged aer n > 3. This pleasantly resembles the behavior of traditional residue template based protein force elds, even though no templates are used within espaloma's parameter assignment process.
5.1 Espaloma can generate self-consistent valence parameters and partial charges for large biopolymers in less than a second Despite its use of a sophisticated graph net machine learning model, the wall time required to parameterize large proteins scales linearly with respect to the number of residues (and hence system size) on a CPU (Fig. 5, lower right). On a GPU, the wall clock time needed to parameterize systems of this size stays roughly constant (due to overhead in executing models on the GPU) at less than 100 microseconds. Since espaloma applies standard molecular mechanics force elds, the energy evaluation times for an Espaloma-generated force eld are identical to traditional force elds.
6 Espaloma can produce selfconsistent biopolymer and small molecule force fields that result in stable simulations Traditionally, in a protein-ligand system, separate (but hopefully compatible) force elds and charge models have been assigned to small molecules (which are treated as independent entities parameterized holistically) and proteins (which are treated as collections of templated residues parameterized piecemeal). 80 This practice both has the potential to allow signicant inconsistencies while also introducing signicant complexity in parameterizing heterogeneous systems.
Using the joint espaloma model trained on both the "OpenFF Gen 2 Optimization" small molecule and "PepConf" peptide quantum chemical datasets (Section 3) 81,104 ), we can apply a consistent set of parameters to both protein and small molecule components of a kinase : inhibitor system. Fig. 5f shows the ligand heavy-atom RMSD aer aligning on protein heavy atoms for 0.5 ns trajectories of the Tyk2 : inhibitor system from the Alchemical Best Practices Benchmark Set 1.0. 105 It is readily apparent that the espaloma-derived parameters lead to trajectories that are comparably stable to simulations that utilize the Amber ff14SB protein force eld 106 with GAFF 1.81, GAFF 2.11,26,27 or OpenFF 1.2.0 (ref. 56) small molecule force elds. All systems are explicitly solvated with a 9Å buffer around the protein with TIP3P water 107 and use the Joung and Cheatham monovalent counterion parameters 108 to model a neutral system with 300 mM NaCl salt.
Additionally, the espaloma model also provides sufficient coverage to model more complex and heterogeneous protein-ligand covalent conjugates, which was highly non trivial in traditional force elds where protein and ligand are parametrized separately. We provide a detailed study of this capability in Appendix Section I in ESI †.

Espaloma small molecule parameters and charges provide accuracy improvements in alchemical free energy calculations
To assess whether the small molecule parameters and charges generated by espaloma achieve competitive performance to traditional force elds, we used the perses 0.9.5 relative alchemical free energy calculation infrastructure 103 (based on OpenMM 7.7 (ref. 17) and openmmtools 0. 21.2 (ref. 109) to compare performance on the Tyk2 kinase : inhibitor benchmark set from the Schrodinger JACS benchmark set 100 as curated by the OpenFF protein-ligand benchmark 0.2.0. 110 In order to assess the impact of espaloma small molecule parameters and charges in isolation, we used the Amber ff14SB protein force eld, 106 and performed simulations with either OpenFF 1.2.0 (openff-1.2.0) or the espaloma joint model trained on OpenFF Gen2 Optimization and PepConf datasets (espaloma-0.2.2) available through the openmmforceelds 0.11.0 package. 111 Notably, none of the ligands appearing in this set appear in the training set for either force eld. All systems were explicitly solvated with a 9Å buffer around the protein with TIP3P water 107 and use the Joung and Cheatham monovalent counterion parameters 108 to model a neutral system with 300 mM NaCl salt. The same transformation network provided in the OpenFF protein-ligand benchmark set was used to compute alchemical transformations, and absolute free energies up to an additive constant were estimated from a leastsquares estimation strategy 112 as implemented in the OpenFF arsenic package. 113 Both experimental and calculated absolute free energies were shied to their respective means before computing statistics, as in ref. 100. Fig. 6 shows a comparison of both relative (DDG) and absolute (DG) free energy error statistics. . While more extensive benchmarking is necessary to establish the generality of these improvements, this represents a rst demonstration that performance can be on par with, if not superior to, traditionally constructed force elds. Here, we have demonstrated that graph neural networks not only have the capacity to reproduce legacy atom type classication, but they are sufficiently expressive to t a traditional molecular mechanics force eld and generalize it to new molecules, as well as learn entirely new force elds directly from quantum chemical energies and experimental measurements. The neural framework presented here also affords the modularity to easily experiment with the inclusion of additional potential energy terms, functional forms, or parameter classes, while making it easy to rapidly ret the entire force eld aerwards.

Espaloma enables a wide variety of applications
Espaloma enables a wide variety of applications in the realm of molecular simulation: While many force eld packages use complex, difficult to maintain, non-portable custom typing engines, 27,[114][115][116] simply generating examples is sufficient to train espaloma to reproduce this typing, translating it into a model that is easy to extend by providing more quantum chemical training data. Some force elds have traditionally been typed by hand, making them difficult to automate; 116 espaloma can in principle learn to generalize from these examples, provided care is taken to avoid overtting during training. As we have shown here, espaloma also provides a convenient way to rapidly build new force elds directly from quantum chemical data.

Modern machine learning frameworks offer exibility in tting potentials
The exibility afforded by modern machine learning frameworks solves a long-standing problem in molecular simulation in which it is extremely difficult to assess whether a new functional form might lead to signicant benets in modeling multiple properties of interest. While efforts such as the Open Force Field Initiative aim to streamline the process of retting force elds, 104 the ease of retting models in machine learning frameworks makes it extremely easy to experiment with new functional forms: Modern automatic differentiation in these frameworks means that only the potential need be implemented, and gradients are automatically computed.
This enables a wide variety of exploration: Simple improvements could be widely implemented in current molecular simulation packages including adjusting the 1-4 Lennard-Jones and electrostatics scaling parameters, producing 1-4 interaction parameters that override Lennard-Jones combining rules, exploring different Lennard-Jones combining rules, 117 changing the van der Waals treatment to alternative functional forms (such as Buckingham exp-6 118 or Halgren potentials 57 ), and retting force elds for various non-bonded treatments (such as PME 119 and reaction eld electrostatics 120 ). Many simulation packages provide support for Class II molecular mechanics force elds, 30,31 which include additional coupling terms that can drastically reduce errors in modeling quantum chemical energies at essentially no meaningful impact on cost due to the O ðNÞ number of these terms; simple extensions to espaloma's architecture can easily predict the parameters for these coupling terms from additional symmetry-preserving features.
More radical potential explorations could involve assessing different algebraic functional forms-modern simulation packages such as OpenMM have the ability to automatically differentiate and compile symbolic algebraic expressions to produce optimized force kernels for simulation on fast GPUs. 17,98 Excitingly, the simplicity of incorporating a new generation of quantum machine learning (QML) potentials 121such as ANI 10,13 and SchNet 14 -means that it will be easy to explore hybrid potentials that combine the exibility of QML potentials at short range with the accuracy of physical forces at long range. 122

Espaloma can enable modular loss functions and regularization
The ease at which the loss function can be augmented with additional terms enables the addition of other classes of loss terms to the loss function. For example, one of the molecules considered in the Tyk2 : inhibitor system included a cyano group which proved to be slightly unstable with hydrogen mass repartitioning at 4 fs timesteps. The loss function could either be augmented to regularize parameters to increase stability (penalizing short vibrational periods) or to include other data classes (such as Hessians and/or torsion drive data) to improve ts to particular aspects. While this will require tuning of the weighting of different loss classes, these parameters can be selected automatically via cross-validation strategies.

Espaloma can enable Bayesian force eld parameterization and model uncertainty quantication
While much of the history of molecular simulation has focused on quantifying the impact of statistical uncertainty, [123][124][125] critical studies over the last decade [126][127][128][129][130][131][132][133][134][135] have improved our ability to quantify and propagate predictive uncertainty in molecular mechanics force elds by quantifying contributions from model uncertainty-which is frequently the major source of predictive uncertainty in applications of interest. While most attention has been focused on the continuous parameters of the force eld model with xed model form, some progress has been made in discrete model selection among candidate model forms. [136][137][138] It remains an open problem to rigorously quantify uncertainty in other important parts of the model denition-especially in the denitions of atom-types. These "chemical perception" denitions can involve very large spaces of discrete choices, and crucially inuence the behavior of a generalizable molecular mechanics model. 22,139 An important benet of the present approach is that it reduces the mixed continuous-discrete task of "being Bayesian about atom-types" to the more familiar task of "being Bayesian about neural network weights." Bayesian treatment of neural networks-while also intractable-has been the focus of productive study and methodological innovation for decades. 140 We anticipate that Bayesian extensions of this work will enable more comprehensive treatment of predictive uncertainty in molecular mechanics force elds.

Ensuring full chemical equivalence is nontrivial
In the current experiments, espaloma used a set of atom features (one-hot encoded element, hybridization, aromaticity, formal charge, and membership in rings of various sizes) easily computed using a cheminformatics toolkit; no bond features were used (see Detailed Methods in ESI †). While this provided excellent performance, the non-uniqueness of formal charge assignment (obvious in molecules such as guanidinium, where resonance forms locate the formal charge on different atoms) does not guarantee the assigned parameters will respect chemical equivalence (a form of invariance) in cases where these atom properties are not unique. Ensuring full chemical equivalence would require modications to this strategy, such as omission of non-unique features (which may require additional data or pre-training to learn equivalent chemical information), averaging of the output of one or more stages over equivalent resonance forms, or architectures such as transformers that more fully encode chemical equivalence.

Future directions: espaloma for free alchemical parameters
While we used Espaloma to generate parameters for the real physical endstates of an alchemical free energy calculation in this world, we note it is also possible to introduce dependence of these parameters on a global alchemical parameters to generate parameters for alchemical intermediate states as well. More complex loss functions could minimize the thermodynamic length along the alchemical coördinate to provide an efficient way to interpolate alchemical parameters. 141,142 Disclosures JDC is a current member of the Scientic Advisory Board of OpenEye Scientic Soware, Redesign Science, Ventus Therapeutics, and Interline Therapeutics, and has equity interests in Redesign Science and Interline Therapeutics. The Chodera laboratory receives or has received funding from multiple sources, including the National Institutes of Health, the National Science Foundation, the Parker Institute for Cancer Immunotherapy, Relay Therapeutics, Entasis Therapeutics, Silicon Therapeutics, EMD Serono (Merck KGaA), AstraZeneca, Vir Biotechnology, Bayer, XtalPi, Interline Therapeutics, the Molecular Sciences Soware Institute, the Starr Cancer Consortium, the Open Force Field Consortium, Cycle for