Kuzma Khrabrov,*a Ilya Shenbin,c Alexander Ryabov,de Artem Tsypin,a Alexander Telepov,a Anton Alekseev,cg Alexander Grishin,a Pavel Strashnov,a Petr Zhilyaev,d Sergey Nikolenkocf and Artur Kadurin*ab
aAIRI, Kutuzovskiy prospect house 32 building K.1, Moscow, 121170, Russia. E-mail: kadurin@airi.net; khrabrov@airi.net
bKuban State University, Stavropolskaya Street, 149, Krasnodar 350040, Russia
cSt. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences, nab. r. Fontanki 27, St. Petersburg 191011, Russia
dCenter for Materials Technologies, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
eMoscow Institute of Physics and Technology (National Research University), Institutsky lane, 9, Dolgoprudny, Moscow Region 141700, Russia
fISP RAS Research Center for Trusted Artificial Intelligence, Alexander Solzhenitsyn st. 25, Moscow, 109004, Russia
gSt. Petersburg University, 7-9 Universitetskaya Embankment, St Petersburg, 199034, Russia
First published on 24th October 2022
Electronic wave function calculation is a fundamental task of computational quantum chemistry. Knowledge of the wave function parameters allows one to compute physical and chemical properties of molecules and materials. Unfortunately, it is infeasible to compute the wave functions analytically even for simple molecules. Classical quantum chemistry approaches such as the Hartree–Fock method or density functional theory (DFT) allow one to compute an approximation of the wave function but are very computationally expensive. One way to lower the computational complexity is to use machine learning models that can provide sufficiently good approximations at a much lower computational cost. In this work we: (1) introduce a new curated large-scale dataset of electron structures of drug-like molecules, (2) establish a novel benchmark for the estimation of molecular properties in the multi-molecule setting, and (3) evaluate a wide range of methods with this benchmark. We show that the accuracy of recently developed machine learning models deteriorates significantly when switching from the single-molecule to the multi-molecule setting. We also show that these models lack generalization over different chemistry classes. In addition, we provide experimental evidence that larger datasets lead to better ML models in the field of quantum chemistry.
Solving the many-particle Schrödinger equation (SE) is a complex task that has attracted a lot of attention from several generations of researchers, but, unfortunately, its analytical solution is still unknown. However, there exists a wide variety of numerical methods that solve it at different levels of precision. These methods comprise a hierarchy that trades off accuracy against computational cost and the number of electrons whose motion one can calculate in reasonable time with a particular technique.
At the top of the hierarchical pyramid are two families of methods: post-Hartree–Fock1 and quantum Monte Carlo.2 They are very accurate (approximately 1 kcal mol−1) but computationally expensive, allowing one to consider systems of up to tens of electrons. All of them are based on manipulating the many-body wave function, which is represented as an expansion over one-electron orbitals with adjustable coefficients. An optimization search is performed in the space of these adjustable coefficients to find the multi-particle wave function that provides the minimum energy of the system and therefore most closely corresponds to the “real” multi-particle wave function of the ground state (the minimum energy state).
The second step of the hierarchical pyramid is taken by the density functional theory (DFT) method,3–5 which is currently the primary approach for solving the many-particle SE for electrons.
DFT is a mean-field method, where the many-particle problem is divided into several single-particle problems, and one solves the SE for a single electron in the effective field of the other electrons. The main difference between this method and more accurate ones is that it manipulates not the many-particle wave function but the electron density, which is an observable quantity. DFT makes it possible to consider systems on a scale of 1000 electrons6 with satisfactory accuracy (approximately 10 kcal mol−1), thus scaling up to systems that are already nano-objects such as nanotubes and fullerenes, pieces of proteins, or parts of catalytic surfaces. The accuracy of DFT is determined by the so-called exchange-correlation (XC) functional,7–9 which again has an accuracy/complexity tradeoff hierarchy within itself. It is believed that by looking for a fast and accurate exchange-correlation functional it may be possible to improve DFT's accuracy up to 1 kcal mol−1, thus making it almost equal in accuracy to the methods at the very top.
On the nominal third step of the hierarchy are the so-called parametric methods such as the tight-binding method,10 which require a parameterization of the Hamiltonian. They make calculations possible for extensive systems of up to tens of thousands of electrons.11 However, the non-deterministic pre-parameterization step and the large volatility in the resulting accuracy make these methods less popular than DFT.
In addition to traditional numerical methods for solving the many-body SE for electrons, machine learning (ML) methods have emerged in abundance, looking for their own place in the accuracy/complexity hierarchy. One promising direction for incorporating ML into the field is to develop a family of trial wave functions based on deep neural networks (NN); recent results show that they can outperform the best highly accurate quantum Monte Carlo methods.12,13 Another direction is to directly predict the wave function, electron densities, and/or the Hamiltonian matrix from atom coordinates (system configuration).14–18 The third direction is to use neural networks to model the XC functional for high-accuracy DFT.19–26
The general framework of ab initio molecular property prediction consists of two steps: first, compute the electron structure of a specific molecular conformation or a set of conformations, and then calculate the desired properties based on the results of the first step. The second step is relatively simple, but the total computational complexity can be very high depending on the method used in the first step.
One straightforward approach to avoid this complexity is to train a machine learning model to predict desired molecular properties directly, shortcutting around the electron structure part. However, this approach may lack generalization since one would need to develop and train a separate new model for each new property.
Recent studies have shown promising results in the field of electron structure prediction using a number of different ML methods. This approach avoids the costly computations of DFT (or higher-order) methods by substituting them with a relatively simple ML model while keeping the generalized property computation framework. In this way, only a single ML model is needed for all necessary properties (Fig. 1).
Though there have been recent advances in Hamiltonian matrix approximation using ML (see Section 2), these studies suffer from two serious drawbacks: first, all models were trained and tested in the single-molecule setup (both training and testing on different conformations of the same molecule); second, all models have problems with scaling up to larger molecular structures. In our study we focus on exploring these drawbacks.
An important inspiration for this work comes from the lack of datasets that could be used to train such models. The expressive power of machine learning models is meaningless unless supported by the size and variability of training data. Related fields are seeing the rise of large-scale datasets of small molecules and compounds where the necessary properties have been established by accurate and computationally expensive methods; for example, the MOSES benchmarking platform27 has compared molecular generation models for drug discovery based on a subset of the ZINC clean leads dataset.28 Other examples of large-scale datasets with results of DFT calculations are Open Catalyst 2020 (OC20) and 2022 (OC22).29,30 These datasets together contain 1.3 million molecular relaxations with results from over 260 million DFT calculations.
Large-scale datasets have enabled impressive results in the field of Natural Language Processing. One of the key reasons for the success of Transformer-based models,31 such as BERT32 or GPT-3,33 was access to huge training corpora. It has been shown in the domain of medicinal chemistry34 that reducing the dictionary from its full size to 30% leads to a significant degradation in accuracy for disease linking in clinical trials. Apart from the quality increase, bigger and more diverse datasets are important for model robustness. Tutubalina et al.35 show that the generalization ability of machine learning models is influenced by whether the test entity/relation has been presented in the training set.
In this work, we introduce a new large-scale dataset that contains structures and Hamiltonian matrices for about 5 million conformations of about 1 million molecular structures, with electronic properties computed with the Kohn–Sham method. This dataset allows for comparisons between DFT-based models in different settings, in particular generalization tests where the training and test sets contain different molecules. For benchmarking, we adapt several classical and state-of-the-art DFT-based models and compare their results on our dataset, drawing important conclusions about their expressivity, generalization power, and sensitivity to data size and training regimes. The models considered in this work come in two varieties, either estimating the potential energy or predicting the Hamiltonian coefficients.
The paper is organized as follows. Section 2 describes related prior art, including datasets and methods as well as modern AI applications. Section 3 introduces the terminology used throughout this paper and sets up the mathematical foundations. Section 4 describes our new dataset, and Section 5 introduces the models used in our benchmark. Section 6 presents the benchmark setup and results and discusses the outcomes of our experimental study, and Section 7 concludes the paper.
The Quantum Machines 9 (QM9) dataset presents molecular structures and properties obtained from quantum chemistry calculations for the first 133885 molecules of the chemical universe GDB-17 database.37 The dataset corresponds to the GDB-9 subset of all neutral molecules with up to nine heavy atoms (C, O, N, F), not counting hydrogen. Additionally, the dataset includes 6095 constitutional isomers of C7H10O2. For all molecules, calculated parameters include equilibrium geometries, frontier orbital eigenvalues, dipole moments, harmonic frequencies, polarizabilities, and thermochemical energetics corresponding to atomization energies, enthalpies, and entropies at ambient temperature. These properties have been obtained at the B3LYP/6-31G(2df,p) level of theory. For a subset of 6095 constitutional isomers, these parameters were calculated at the more accurate G4MP2 level of theory.
Along with their Accurate NeurAl networK engINe for Molecular Energies (ANAKIN-ME, or ANI), Smith et al.38 released ANI-1,39 a dataset of non-equilibrium DFT total energy calculations for organic molecules that contains ≈20m molecular conformations for 57462 molecules from the GDB-11 database.40,41 Atomic species are limited to C, N, and O (with hydrogens added separately with RDKit), and the number of heavy atoms varies from 1 to 8. All electronic structure calculations in the ANI-1 dataset are carried out with the ωB97X42 density functional and the 6-31G(d) basis set in the Gaussian 09 electronic structure package.
Quantum-Mechanical Properties of Drug-like Molecules (QMugs) is a data collection of over 665k curated molecular structures extracted from the ChEMBL database.43 Three conformers per compound were generated, and the corresponding geometries were optimized using the GFN2-xTB method,44–46 with a comprehensive array of quantum properties computed at the DFT level of theory using the ωB97X-D functional47 and the def2-SVP Karlsruhe basis set.48
Advances in the field of AI for fundamental problem solving have shown promising results in a series of domains. In particular, Eremin et al.53 recently designed, synthesized and tested a novel quasicrystal with the help of state-of-the-art machine learning models. Yakubovich et al.54 used deep generative models to search for molecules suitable for triplet–triplet fusion with potential applications in blue OLED devices, finding significantly increased performance in terms of generated lead quality. Wan et al.55 applied a hybrid DFT-ML approach to study the catalytic activity of materials. Schleder et al.56 used machine learning techniques and DFT to identify thermodynamically stable 2D materials. Recently, Ritt et al.57 elucidated key mechanisms for ion selectivity in nanoporous polymeric membranes by combining first-principles simulations with machine learning. Janet et al.58 developed an ML-based approach for accelerated discovery of transition-metal complexes which makes it possible to evaluate a large chemical compound space in a matter of minutes. Ye et al.59 reviewed recent advances in applying DFT in molecular modeling studies of COVID-19 pharmaceuticals.
ĤΨ = EΨ,
These single-particle functions are usually defined in a local atomic orbital basis of spherical atomic functions, |ψm〉 = Σicim|ϕi〉, where |ϕi〉 are the basis functions and cim are the coefficients. As a result, one can represent the electronic Schrödinger equation in matrix form as
Fσcσ = εσScσ,
Hij = 〈ϕi|Ĥ|ϕj〉,
S is the overlap matrix:
Sij = 〈ϕi|ϕj〉,
In matrix form, the single-particle wave function expansion can be represented using Einstein summation as ψσm = Cσimϕi.
Therefore, the density matrix is represented as
Dσij = CσikCσjk
In DFT, the matrix F corresponds to the Kohn–Sham matrix:
Fσij = Hcσij + Jσij + Vxcij
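To illustrate how these quantities fit together in practice, below is a minimal NumPy/SciPy sketch that solves the generalized eigenvalue problem Fσcσ = εσScσ introduced above for given Fock/Kohn–Sham and overlap matrices and assembles the density matrix from the occupied orbitals; the matrices and the number of occupied orbitals (n_occ) are placeholders rather than values taken from the dataset.

```python
import numpy as np
from scipy.linalg import eigh

def density_from_fock(F, S, n_occ):
    """Solve F C = S C eps and build D_ij = sum_k C_ik C_jk over occupied orbitals k."""
    eps, C = eigh(F, S)              # generalized symmetric eigenproblem; eigenvalues ascending
    C_occ = C[:, :n_occ]             # coefficients of the occupied orbitals
    D = C_occ @ C_occ.T              # density matrix for one spin channel
    return eps, C, D

# toy usage with random symmetric matrices standing in for real F and S
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
S = A @ A.T + 10.0 * np.eye(10)      # positive-definite "overlap"
F = 0.5 * (A + A.T)                  # symmetric "Fock" matrix
eps, C, D = density_from_fock(F, S, n_occ=5)
```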
In DFT, the total energy of the system (e.g., the total energy of a conformation) can be expressed as a sum of one-electron, Coulomb, exchange–correlation, and nuclear repulsion contributions.
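For reference, one standard density-matrix form of the Kohn–Sham total energy is sketched below; this is a generic textbook expression, and the exact functional form used in this work may differ in detail (ωB97X-D, for instance, also includes a fraction of exact exchange and an empirical dispersion correction).

```latex
E_{\mathrm{tot}}
  \approx \sum_{\sigma}\sum_{ij} D^{\sigma}_{ij} H^{\mathrm{c}}_{ij}
  + \tfrac{1}{2}\sum_{ij} D_{ij} J_{ij}
  + E_{\mathrm{xc}}[n]
  + E_{\mathrm{nn}},
\qquad D = \sum_{\sigma} D^{\sigma},
```

where Exc[n] is the exchange–correlation energy and Enn the nuclear repulsion energy.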
One can represent the Hamiltonian matrix in block form:16
Unfortunately, eigenvalues and wave function coefficients are not well-behaved or smooth functions of the atomic coordinates as the molecular configuration changes. This problem can be addressed by deep learning architectures that directly predict the Hamiltonian matrix.
In this work, we propose a benchmark for both scalar parameter prediction, such as the conformation energy, and prediction of matrix parameters such as the core Hamiltonian and overlap matrices.
For each molecule from the dataset, we have run the conformation generation method from the RDKit software suite63,64 proposed in Wang et al.65 Next, we clustered the resulting conformations with the Butina clustering method,66 finally taking the clusters that cover at least 95% of the conformations and using their centroids as the set of conformations. This procedure has resulted in 1 to 62 unique conformations for each molecule, with 5340152 total conformations in the full dataset. For each conformation, we have calculated its electronic properties including the energy (E), DFT Hamiltonian matrix (H), and DFT overlap matrix (S) (see the full list in Table 2). All properties were calculated using the Kohn–Sham method67 at the ωB97X-D/def2-SVP level of theory with the quantum-chemical software package Psi4,68 version 1.5. Default Psi4 parameters were used for the DFT computations: a Lebedev–Treutler grid with a Treutler partition of the atomic weights (75 radial and 302 spherical points); the SCF cycle was terminated when energy and density converged to within a 10−6 threshold; the integral calculation threshold was 10−12.
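To make these settings concrete, here is a minimal sketch of a single-point ωB97X-D/def2-SVP calculation with the Psi4 Python API using the thresholds listed above; the placeholder geometry and the exact option set are illustrative assumptions rather than the dataset's production pipeline.

```python
import psi4

# placeholder geometry (water); dataset conformations come from RDKit-generated 3D structures
mol = psi4.geometry("""
0 1
O  0.000  0.000  0.117
H  0.000  0.757 -0.467
H  0.000 -0.757 -0.467
""")

psi4.set_options({
    "basis": "def2-SVP",
    "dft_radial_points": 75,        # Lebedev-Treutler grid settings
    "dft_spherical_points": 302,
    "e_convergence": 1e-6,          # SCF energy convergence
    "d_convergence": 1e-6,          # SCF density convergence
    "ints_tolerance": 1e-12,        # integral screening threshold
})

# single-point Kohn-Sham calculation; return_wfn exposes the matrices discussed above
energy, wfn = psi4.energy("wB97X-D", return_wfn=True)

S = wfn.S().to_array()              # overlap matrix
F = wfn.Fa().to_array()             # alpha Kohn-Sham (Fock) matrix in the AO basis
D = wfn.Da().to_array()             # alpha density matrix
```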
We provide several splits of the dataset that can serve as the basis for comparison across different models. First, we fix the training set that consists of 100000 molecules with 436581 conformations and its smaller subsets with 10000, 5000, and 2000 molecules and 38364, 20349, and 5768 conformations respectively; these subsets can help determine how much additional data helps various models. We choose another 100000 random molecules as a structure test set. The scaffold test set has 100000 molecules containing a Bemis–Murcko scaffold from a random subset of scaffolds which are not present in the training set. Finally, the conformation test set consists of 91182 (resp., 10000, 5000, 2000) molecules from the training set with new conformations, numbering in total 92821 (8892, 4897, 1724) conformations; this set can be used for the single-molecule setup.
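The scaffold test set described above keeps whole Bemis–Murcko scaffolds out of the training set. A minimal RDKit-based sketch of such a split is given below; the function and the 10% test fraction are illustrative assumptions, not the exact tooling used to build the benchmark splits.

```python
import random
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, test_fraction=0.1, seed=0):
    """Group molecules by Bemis-Murcko scaffold and hold out entire scaffolds,
    so that no scaffold from the test set appears in the training set."""
    by_scaffold = {}
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(mol=mol)
        by_scaffold.setdefault(scaffold, []).append(smi)

    scaffolds = list(by_scaffold)
    random.Random(seed).shuffle(scaffolds)

    test, train = [], []
    target = test_fraction * len(smiles_list)
    for sc in scaffolds:
        (test if len(test) < target else train).extend(by_scaffold[sc])
    return train, test
```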
As part of the benchmark, we provide separate databases for each subset and task and a complete archive with wave function files produced by the Psi4 package; these files contain quantum chemical properties of the corresponding molecules and can be used in further computations.
A formal comparison of our dataset's parameters with previously available datasets such as QM9, ANI-1, and QMugs is presented in Table 1.
Statistic | QM9 | ANI-1 | QMugs | ∇DFT |
---|---|---|---|---|
a Hamiltonian matrices for the QMugs dataset can be calculated from the density matrices by one step of the DFT cycle.
Number of molecules | 134k | 57462 | 665k | 1m |
Number of conformers | 134k | 20m | 2m | 5m |
Number of atoms | 3–29 | 2–26 | 4–228 | 8–62 |
Number of heavy atoms | 1–9 | 1–8 | 4–100 | 8–27 |
Atomic species | H, C, N, O, F | H, C, N, O | H, C, N, O, P, S, Cl, F, Br, I | H, C, N, O, Cl, F, Br |
Hamiltonian matrices | No | No | Noa | Yes |
Level of theory | B3LYP/6-31G(2df,p)+G4MP2 | ωB97X/6-31G(d) | ωB97X-D/def2-SVP+GFN2-xTB | ωB97X-D/def2-SVP |
Storage size | 230 Mb | 5.29 Gb | 7 Tb | 100 Tb |
QM9 | DFT + partially G4MP2: rotational constants, dipole moment, isotropic polarizability, HOMO/LUMO/gap energies, electronic spatial extent, zero point vibrational energy, internal energy at 0 K, internal energy at 298.15 K, enthalpy at 298.15 K, free energy at 298.15 K, heat capacity at 298.15 K, Mulliken charges, harmonic vibrational frequencies |
ANI-1 | DFT: total energy |
QMugs | GFN2 + DFT: total, internal atomic and formation energies, dipole, rotational constants, HOMO/LUMO/gap energies, Mulliken partial charges |
GFN2: total enthalpy, total free energy, quadrupole, enthalpy, heat capacity, entropy, Fermi level, covalent coordination number, molecular dispersion coefficient, atomic dispersion coefficients, molecular polarizability, atomic polarizabilities, Wiberg bond orders, total Wiberg bond orders | |
DFT: electrostatic potential, Löwdin partial charges, exchange correlation energy, nuclear repulsion energy, one-electron energy, two-electron energy, Mayer bond orders, Wiberg–Löwdin bond orders, total Mayer bond orders, total Wiberg–Löwdin bond orders, density/orbital matrices, atomic-orbital-to-symmetry-orbital transformer matrix |
∇DFT | DFT: electrostatic potential, Löwdin partial charges, exchange correlation energy, nuclear repulsion energy, one-electron energy, two-electron energy, Mayer bond orders, Wiberg–Löwdin bond orders, total Mayer bond orders, total Wiberg–Löwdin bond orders, density/orbital matrices, atomic-orbital-to-symmetry-orbital transformer matrix, Hamiltonian matrix
We consider two benchmark tasks:
• conformational energy prediction and
• DFT Hamiltonian prediction.
Fig. 2 shows a bird's eye overview of the general architecture for the models that we compare in this work. It consists of four main blocks:
• Inputs: all considered models use atom types, coordinates, or their functions as input.
• Embedding layers: MLP or a single linear layer.
• Interaction layers: this is usually the main part with model-specific architecture.
• Output layers: depending on the model, output layers are designed to convert internal representations into specific desired values (a schematic sketch of these four blocks is given after this list).
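A purely schematic PyTorch skeleton of this shared four-block structure is shown below; layer types, sizes, and names are illustrative assumptions, and the actual models differ substantially in their interaction layers.

```python
import torch
import torch.nn as nn

class SchematicModel(nn.Module):
    """Schematic embedding -> interaction -> output structure shared by the benchmarked models."""
    def __init__(self, n_atom_types=100, hidden=128, n_interactions=3):
        super().__init__()
        self.embedding = nn.Embedding(n_atom_types, hidden)       # atom types -> feature vectors
        self.interactions = nn.ModuleList(
            [nn.Linear(hidden, hidden) for _ in range(n_interactions)]  # stand-ins for model-specific layers
        )
        self.output = nn.Linear(hidden, 1)                        # per-atom contribution to the target

    def forward(self, atom_types, positions):
        x = self.embedding(atom_types)                            # (n_atoms, hidden)
        for layer in self.interactions:
            # real models inject geometric information (distances, angles, directions) here
            x = torch.relu(layer(x))
        return self.output(x).sum()                               # pooled into a scalar prediction
```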
Given feature representations of n objects Xl = (x1l,…,xnl), xil ∈ ℝF (at layer l), located at positions R = (r1,…,rn), ri ∈ ℝ3, and a filter-generating function Wl that maps R's domain to X's domain, i.e., Wl: ℝ3 → ℝF, the output of the proposed cfconv layer is defined as xil+1 = (Xl ∗ Wl)i = Σjxjl ∘ Wl(ri − rj), where ∘ denotes element-wise multiplication.
The use of cfconv and the overall network architecture design ensure an important property: energy and force predictions are rotationally invariant and equivariant, respectively. For instance, in the cfconv layer Wl is a combination of ||ri − rj||, radial basis functions, dense feedforward layers, and a shifted softplus activation function ssp(x) = ln(0.5 exp(x) + 0.5).
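A minimal PyTorch sketch of the shifted softplus and a continuous-filter convolution of this form is given below; the Gaussian radial-basis expansion, layer sizes, and the dense all-pairs implementation are simplifying assumptions rather than SchNet's exact layer.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def ssp(x):
    """Shifted softplus: ssp(x) = ln(0.5*exp(x) + 0.5) = softplus(x) - ln 2."""
    return F.softplus(x) - math.log(2.0)

class CFConv(nn.Module):
    """Simplified continuous-filter convolution: x_i' = sum_j x_j * W(||r_i - r_j||),
    with filters W generated from a Gaussian radial-basis expansion of distances."""
    def __init__(self, hidden=128, n_rbf=20, cutoff=5.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(0.0, cutoff, n_rbf))
        self.filter_net = nn.Sequential(
            nn.Linear(n_rbf, hidden), nn.Softplus(), nn.Linear(hidden, hidden)
        )

    def forward(self, x, pos):
        # x: (n_atoms, hidden) features, pos: (n_atoms, 3) Cartesian coordinates
        d = torch.cdist(pos, pos)                                # (n, n) distances: rotation-invariant
        rbf = torch.exp(-(d.unsqueeze(-1) - self.centers) ** 2)  # (n, n, n_rbf)
        W = self.filter_net(rbf)                                  # (n, n, hidden) distance-dependent filters
        return ssp((W * x.unsqueeze(0)).sum(dim=1))               # aggregate over neighbours j

conv = CFConv()
out = conv(torch.randn(6, 128), torch.randn(6, 3))   # (6, 128) updated atom features
```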
The model takes nuclear charges and atomic positions as inputs. Nuclear charges are first embedded into 64-dimensional representations. After that, they are processed with an “Interaction” block of layers that includes 3 applications of cfconv (cf. Fig. 3), thus enriching the feature representation with positional information. Further atom-wise transformations follow, and the results are pooled in a final layer to obtain the energy prediction. Differentiating the energy with respect to atom positions yields the forces, hence force predictions can be and are also added to the loss function.
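The force prediction by differentiation mentioned above can be sketched in a few lines with PyTorch autograd; the toy energy function below merely stands in for a trained energy model.

```python
import torch

def energy_fn(pos):
    # toy differentiable "energy": sum of squared pairwise coordinate differences
    diff = pos.unsqueeze(0) - pos.unsqueeze(1)
    return (diff ** 2).sum()

pos = torch.randn(5, 3, requires_grad=True)   # Cartesian coordinates of 5 atoms
energy = energy_fn(pos)
forces = -torch.autograd.grad(energy, pos, create_graph=True)[0]  # F = -dE/dr; create_graph keeps the force loss trainable
```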
The resulting method has been tested on three datasets: QM9,37,69 MD17,70 and ISO17 (introduced in the original work15). For more details we refer to Schütt et al.15
DimeNet is inspired by the PhysNet model72 and expands it with message passing and additional directional information. A general overview of the architecture is shown in Fig. 4 (left). The core of DimeNet is the Interaction block: a message mji from atom j to atom i takes into account information about the angles ∠xixjxk obtained via messages from the corresponding atoms k that are neighbors of atom j. Specifically, it is defined as follows:
The Interaction block is visualized in Fig. 5. A message mji is initialized with learnable embeddings h(0)i and h(0)j that depend on the relative position of the corresponding atoms (Fig. 4, middle). Finally, the incoming messages mji are combined together to form the atom embedding hi.
The resulting model is invariant to permutations, translation, rotation, and inversion. Evaluation on QM9 and MD17 shows that DimeNet significantly outperforms SchNet on the prediction of most targets. An extension of this model was later proposed in DimeNet++;73 it further improved performance and reduced computational costs.
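A heavily simplified, loop-based PyTorch sketch of this kind of directional message update is shown below; it uses raw distance and angle scalars instead of DimeNet's Bessel and spherical basis expansions, and all layer shapes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DirectionalMessage(nn.Module):
    """Update message m_ji from messages m_kj of neighbours k of j (k != i),
    modulated by the distance d_kj and the angle at atom j between i and k."""
    def __init__(self, hidden=64):
        super().__init__()
        self.f_int = nn.Linear(hidden + 2, hidden)   # +2 for the scalar distance and angle features
        self.f_upd = nn.Linear(2 * hidden, hidden)

    def forward(self, messages, pos, edges):
        # messages: dict {(j, i): (hidden,) tensor}; pos: (n_atoms, 3); edges: list of (j, i) pairs
        updated = {}
        for (j, i) in edges:
            agg = torch.zeros_like(messages[(j, i)])
            for (k, jj) in edges:
                if jj != j or k == i:
                    continue                          # only neighbours k of j, excluding i
                d_kj = (pos[k] - pos[j]).norm()
                v_i, v_k = pos[i] - pos[j], pos[k] - pos[j]
                cos = (v_i @ v_k) / (v_i.norm() * v_k.norm() + 1e-9)
                angle = torch.acos(cos.clamp(-1.0, 1.0))
                agg = agg + self.f_int(torch.cat([messages[(k, j)], d_kj.view(1), angle.view(1)]))
            updated[(j, i)] = self.f_upd(torch.cat([messages[(j, i)], agg]))
        return updated
```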
Fig. 6 The SchNOrb architecture. On the left: general architecture overview; on the right: the Interaction block in SchNOrb. In addition to the SchNet Interaction block, SchNOrb uses the factorized tensor layer74 to produce pairwise atom features and predict the basis coefficients for the Hamiltonian. |
The inputs for this neural network are the charges Z and the interatomic distances ||rij|| (norms of the vectors pointing from atom i to atom j); the model also uses the normalized directions rij/||rij||. The representations of Z and ||rij|| are then processed with the SchNet step as described in Section 5.2.
On the next SchNOrb step, SchNet outputs (a vector per atom) are combined with ||rij|| using a factorized tensor layer,74 feedforward layers, shifted softplus activations, and simple sums. The outputs include: (1) rotationally invariant per-atom embeddings Xl, which are then transformed and aggregated to predict the energy value; (2) embeddings Pl for atom pairs that are multiplied by different powers of directional cosines, aggregated, and passed to fully connected layers to predict blocks of the Hamiltonian and overlap matrices; (3) finally, similar to SchNet, forces are predicted via graph differentiation.
The datasets used in this work are based on MD1770 and include water, ethanol, malondialdehyde, and uracil. Reference calculations were performed with Hartree–Fock (HF) and density functional theory (DFT) with the PBE exchange correlation functional. For more details regarding data preparation and augmentation we refer to Schütt et al.16
The main building blocks of PhiSNet are the following: (1) feature representations for both input and intermediate layers are of the form x ∈ ℝF×(L+1)2, where F corresponds to the number of feature channels and (L + 1)2 corresponds to all possible spherical harmonics of degree l ∈ {0,…,L}; (2) all layers except the last one apply SE(3)-equivariant operations on features; matrix multiplication, tensor product contractions, and tensor product expansions are mixed together to make equivariant updates for the features of every atom and atom pair; on the last layer, the Hamiltonian and overlap matrices are constructed from pairwise features (Fig. 7).
We have made several minor modifications to the original model implementation of PhiSNet in order to allow it to work with the ∇DFT dataset; in particular, we have applied batching similar to PyTorch Geometric, where the molecules inside a single batch are treated as one molecule with no bonds between atoms of different molecules.
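A minimal sketch of this batching scheme is given below, assuming per-molecule dictionaries of atom types and positions; the collate function and field names are illustrative, not the exact code used in our implementation.

```python
import torch

def collate_molecules(batch):
    """Merge a list of molecules into one 'super-molecule' with no edges between
    molecules, keeping a batch index that maps every atom back to its molecule."""
    atom_types, positions, batch_index = [], [], []
    for mol_id, mol in enumerate(batch):
        atom_types.append(mol["atom_types"])      # (n_i,) long tensor of atom types
        positions.append(mol["positions"])        # (n_i, 3) float tensor of coordinates
        batch_index.append(torch.full((len(mol["atom_types"]),), mol_id, dtype=torch.long))
    return {
        "atom_types": torch.cat(atom_types),
        "positions": torch.cat(positions),
        "batch": torch.cat(batch_index),          # used to pool per-atom outputs per molecule
    }
```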
Metrics for energy prediction and prediction of Hamiltonian matrices are reported in Tables 3 and 4 respectively. These results lead us to the following observations.
Model | MAE for energy prediction, × 10−2Eh | |||
---|---|---|---|---|
2k | 5k | 10k | 100k | |
a Training of the SchNOrb model did not converge for the 10k and 100k train splits; training of the SchNet model did not converge for the 100k train split.
Structure test split | ||||
LR | 4.6 | 4.7 | 4.7 | 4.7 |
SchNet | 151.8 | 66.1 | 29.6 | —a |
Dimenet++ | 24.1 | 21.1 | 10.6 | 3.2 |
SchNOrb | 5.9 | 3.7 | 13.3a | —a |
Scaffolds test split | ||||
LR | 4.6 | 4.7 | 4.7 | 4.7 |
SchNet | 126.5 | 68.3 | 27.4 | —a |
Dimenet++ | 21.6 | 20.9 | 10.1 | 3.0 |
SchNOrb | 5.9 | 3.4 | 14.8a | —a |
Conformations test split | ||||
LR | 4.0 | 4.2 | 4.0 | 4.0 |
SchNet | 79.1 | 67.3 | 21.4 | —a |
Dimenet++ | 18.3 | 33.7 | 5.2 | 2.5 |
SchNOrb | 5.0 | 3.6 | 14.5a | —a |
Model | MAE for Hamiltonian matrix prediction, × 10−3Eh | MAE for overlap matrix prediction, × 10−5 | ||||
---|---|---|---|---|---|---|
2k | 5k | 10k | 2k | 5k | 10k | |
a While the relative difference between the metrics for SchNOrb and PhiSNet is similar to the one reported in the original PhiSNet work, we believe that there are still some problems with SchNOrb training in the multi-molecule setup, e.g., gradient explosion.
Structure test split | ||||||
SchNOrb | 386.5 | 383.4 | 382.0 | 1550 | 1571 | 3610 |
PhiSNet | 7.4 | 3.2 | 2.9 | 5.1 | 4.3 | 3.5 |
Scaffolds test split | ||||||
SchNOrb | 385.3 | 380.7 | 383.6 | 1543 | 1561 | 3591 |
PhiSNet | 7.2 | 3.2 | 2.9 | 5.0 | 4.3 | 3.5 |
Conformations test split | ||||||
SchNOrb | 385.0 | 384.8 | 392.0 | 1544 | 1596 | 3576 |
PhiSNet | 6.5 | 3.2 | 2.8 | 5.1 | 4.6 | 3.6 |
First and foremost, we see that all models in both tasks, except SchNOrb and linear regression, benefit from increasing the dataset size. This indicates that even already published models may not have hit the limit of their expressive power and may further benefit from larger scale datasets. We suppose that the linear model has almost identical scores across training sets of different sizes because of its limited expressiveness. Training did not converge for the SchNOrb model on the 10k and 100k splits.
Second, as expected, the models perform better on the conformations test split that contains the same set of structures but with different conformations; in this case, the training set is most similar to the test set. On the other hand, on the structures test split and the scaffolds test split the models show nearly equivalent performance. This may imply that models that generalize to unseen structures automatically generalize to unseen scaffolds.
Third, interestingly, deep models trained on the small dataset splits (2k, 5k, and 10k) to predict only the energy show results worse than a simple linear regression. On the positive side, the DimeNet++ model trained on the 100k subset performs better, which may imply that the same model trained on the full training set may show much better results. Moreover, SchNOrb models trained on the 2k and 5k splits perform better than linear regression and the other models trained on the corresponding splits, which may imply that energy prediction benefits from multi-target learning (Hamiltonian matrix, overlap matrix, and energy).
Fourth, in our setup deep models for energy prediction perform much worse than they do on previously known benchmarks such as QM9 or MD17 (e.g., DimeNet++ has an MAE of 0.00023 Eh on QM9 (ref. 73)). This may be caused by the diversity of the ∇DFT dataset and the small size of the splits. The latter point holds for the Hamiltonian matrix prediction models as well; we see this as an indication that more care needs to be taken with hyperparameter tuning and with the construction of new architectures.
The results of our experimental evaluation show the ability of modern deep neural networks to generalize, both for energy prediction and for the estimation of Hamiltonian matrices. We also see that an increasing amount of data leads to better metrics, especially in the case of the PhiSNet model. Unfortunately, training with a limited amount of computational resources or a small dataset often leaves deep neural networks undertrained and exhibiting comparatively poor performance. In particular, model errors grow significantly in the multi-molecule setting compared to the single-molecule one. It remains a challenge to obtain models that surpass chemical accuracy.