Establishing the accuracy of density functional approaches for the description of noncovalent interactions in biomolecules

Minho Kim a, Tim Gould b, Dario Rocca a and Sébastien Lebègue *a
aUniversité de Lorraine and CNRS, LPCT, UMR 7019, Vandoeuvre-lès-Nancy 54506, France. E-mail: sebastien.lebegue@univ-lorraine.fr
bQueensland Micro- and Nanotechnology Centre, Griffith University, Nathan, Queensland 4111, Australia

Received 4th August 2020 , Accepted 12th September 2020

First published on 17th September 2020


Biomolecules have complex structures, and noncovalent interactions are crucial to determine their conformations and functionalities. It is therefore critical to be able to describe them in an accurate but efficient manner in these systems. In this context density functional theory (DFT) could provide a powerful tool to simulate biological matter either directly for relatively simple systems or coupled with classical simulations like the QM/MM (quantum mechanics/molecular mechanics) approach. Additionally, DFT could play a fundamental role to fit the parameters of classical force fields or to train machine learning potentials to perform large scale molecular dynamics simulations of biological systems. Yet, local or semi-local approximations used in DFT cannot describe van der Waals (vdW) interactions, one of the essential noncovalent interactions in biomolecules, since they lack a proper description of long range correlation effects. However, many efficient and reasonably accurate methods are now available for the description of van der Waals interactions within DFT. In this work, we establish the accuracy of several state-of-the-art vdW-aware functionals by considering 275 biomolecules including interacting DNA and RNA bases, peptides and biological inhibitors and compare our results for the energy with highly accurate wavefunction based calculations. Most methods considered here can achieve close to predictive accuracy. In particular, the non-local vdW-DF2 functional is revealed to be the best performer for biomolecules, while among the vdW-corrected DFT methods, uMBD is also recommended as a less accurate but faster alternative.


1 Introduction

Simulation is nowadays an indispensable tool to understand and predict the properties and chemical processes of biomolecules.1–7 Often, a trade-off between precision and numerical cost must be made. To simulate systems with several tenths of thousand of atoms, force-field based methods are the only affordable choice. However for smaller systems, quantum-mechanical approaches using density functional theory (DFT) or quantum-embedding methods like QM/MM (quantum mechanics/molecular mechanics)8 can be tackled on modern supercomputers and offer greater predictive accuracy. Moreover, DFT is often used to prepare force fields, as done for instance recently with the use of machine learning,9,10 making its use wider than expected.

However, one persistent problem of density functional theory is the description of van der Waals (vdW) interactions, which are key to the structure and stability of many biological entities, such as π–π base stacking interaction,2,4 base–backbone interaction,3 polypeptide folding1 and amino acid–nucleic acid complex interaction.11,12 Numerous efforts have been made to improve the accuracy of density functional theory in describing quantitatively vdW interactions.13–24 The most straightforward solution is to add a dipole–dipole correction to a local or semi-local exchange–correlation functional. More complex vdW density functionals19–24 introduce a nonlocal exchange–correlation kernel to take into account nonlocal correlation effects. Methods involving many-body interactions15–18 allow the description of collective effects that are missed in other theories.

While the precision of these methods for molecules and solids is now well established,25,26 their performance in the context of biological systems has not yet been comprehensively assessed, except for a single or few categories of biomolecules.27–31 This situation makes choosing a suitable method difficult28 in a biological context. There is thus an urgent need to assess the performance of state-of-the-art methods using an integrated benchmark set of biomolecules.

Accuracy in biological systems is not the only factor that constrains the choice of approximation, however. Increasing computational cost of hybrid functionals and double-hybrid functionals limits their general applicability. Cost issues are especially important in periodic systems, where the Hartree–Fock hole can become highly delocalized which leads to very slow convergence of very expensive calculations. Some functionals, like B3LYP, are limited in scope and perform very poorly in small gap solid state systems.32

These high costs or methodological failures make it difficult to apply many approximations to biological/nanoparticle (bio/nano) interfaces, which mix solid state and molecular physics. Such systems are of increasing interest to researchers because of their potential applications in medical applications. There is thus an urgent need to ensure that GGA approaches (e.g., PBE) that are known to treat both molecular and solid state systems in a consistent fashion can be enhanced to reproduce weak dispersion forces.

In this work, we present an extensive evaluation of several state-of-the-art DFT-related methods used to compute the energy (binding or conformation) for various biomolecular systems ranging from peptides and nucleotide base pairs and backbones to biological inhibitors with macrocycles, as contained in four datasets. Our paper is organized as follows: after a short reminder of the different testsets and methods, our results are presented and discussed, while in the last part we offer our conclusions.

2 Materials and methods

2.1 Selection of benchmark set

In this work, we chose 4 benchmark sets to evaluate the vdW-corrected methods and nonlocal van der Waals density functionals: (i) the DNA base-pair step set,29 which consists of 10 combinations of base-pair steps in B-DNA, (ii) the UpU23 set,28 which consists of the 24 conformations of uracil dinucleotide (UpU) from all known 46 RNA backbone conformations,27 (iii) the PCONF21 set,28 which consists of 11 conformations of Phe-Gly-Gly tripeptide33 and 3 conformations of the ACE-Ala-X-Ala-NME tetrapeptides (ACE is an acetyl group, NME is a methylamide group, and X = Gly, Ser),34 and (iv) some macrocyclic inhibitors in the MPCONF196 set,30 which contains 15 configurations of 8 macrocyclic inhibitor molecules.

2.2 Selection of methods

In this work, we chose six vdW-corrected DFT methods coupled with the PBE functional:35 Grimme's D3 approach (with Becke–Johnson damping),13 the Tkatchenko–Scheffler (TS) method14 and its variants such as TS + SCS,15 MBD@rsSCS,16 MBD@rsSCS/FI17 and uMBD18 methods. We selected PBE due to its availability in almost all codes and its balanced treatment of chemical physics. We also tested four nonlocal vdW density functionals: vdW-DF2,20 rev-vdW-DF2,22 vdW-DF-cx23 and SCAN + rVV10.24 Our results are also compared with the hybrid36–39 and meta-GGA40–43 functional data reported in earlier publications.28–30

We chose PBE as a base functional due to its high speed and versatility in treating solid state and molecular systems. This versatility is not shared by methods like B3LYP which typically outperform PBE in molecules but perform very poorly in solids.32 Poor performance in solids means that otherwise successful molecular approximation cannot be used to treat the difficult bio/nano systems that will be of increasing scientific interest.

The vdW-corrected DFT methods compensate for the lack of long range correlation in local or semi-local DFT by simply adding a dispersion correction, Edisp to the DFT energy:

 
Etotal = EDFT + Edisp(1)
where EDFT is the Kohn–Sham DFT energy and the Edisp is the dispersion correction energy provided by the vdW-corrected DFT methods. In pairwise models such as the D3 or the TS methods, Edisp follows the form:
 
image file: d0cp04137h-t1.tif(2)
where CABn is the averaged n-th order dispersion coefficient for an atom pair A and B, RAB is the corresponding internuclear distance, R0,A and R0,B are the van der Waals radii of atoms A and B, fdamp is a damping function for preventing divergence as R → 0, and sn is a global scaling parameter for the n-th order dispersion energy.

While the pairwise methods are successful for small molecules and molecular crystals, for larger systems many-body interactions become significant.44 The many-body dispersion (MBD) method15 provides an efficient way to evaluate higher-order many-body interactions based on the coupled fluctuating dipole model (CFDM),45 where atoms are depicted as a collection of coupled quantum harmonic oscillators and whose Hamiltonian is written as:

 
image file: d0cp04137h-t2.tif(3)
where ωp and χp are the frequency and displacement from equilibrium of an oscillator p having an atomic polarizability αp, and Tpq is a dipole–dipole interaction tensor. The recent MBD@rsSCS approach16 couples the CFDM Hamiltonian for long-range many-body effects and the self-consistent screening (SCS) method15 at short range based on a range separation scheme. The MBD@rsSCS/FI17 is a further improvement for ionic systems by employing a fractionally ionic (FI) approach. Then uMBD replaces the CFDM Hamiltonian with a point-dipole interaction with a coupled fluctuating smeared-out dipole model (CFSDM) using a smeared out dipole potential for a better description of metals as well as of molecules.

A nonlocal vdW density functional, on the other hand, attempts to design an explicit expression of the nonlocal correlation, which has the form of

 
image file: d0cp04137h-t3.tif(4)
where the kernel, ϕ(r,r′), is given as a function of r and r′. The full exchange–correlation energy of a nonlocal vdW density functional is then evaluated as
 
Exc[n] = Ex[n] + E0c[n] + Enlc[n](5)
where Enlc is the long-range nonlocal correlation term that contributes to van der Waals interactions, E0c is also nonlocal but approaches the LDA (local density approximation) in the slowly varying density limit, and Ex[n] is a GGA exchange energy functional. Among the functionals we tested, vdW-DF2,20 rev-vdW-DF222 and vdW-DF-cx23 stem from the vdW-DF,19 while employing PW86R,46,47 B86R22,48 and LV-PW86r23 as an exchange functional, Ex[n]. On the other hand, SCAN + rVV1024 takes SCAN meta-GGA49 as a semilocal exchange–correlation functional, E0xc = Ex + E0c, and rVV1021 for nonlocal correlation, Enlc.

2.3 Computational details

We assessed the performance of PBE, D3, TS, TS/HI, TS + SCS, MBD@rsSCS, vdW-DF2, rev-vdW-DF2 and SCAN + rVV10 using the Vienna Ab initio Simulation Package (VASP 5.4.4),50–52 of MBD@rsSCS/FI using VASP 5.4.1 with a patch that is available by request to the authors of the original publication,17 of uMBD using the code available in GitHub (https://github.com/minho-amical/uMBD.git), and of vdW-DF-cx with a patch by Björkman.53 Single point calculations (the geometries were kept identical to the ones used for the reference calculations) were performed with an energy cutoff of 1000 eV and the projector augmented-wave method.54 The size of the cell was set to 30 × 30 × 30 Å3 except for the macrocyclic inhibitors of the MPCONF196 set, for which it was set to a box with an 8 Å wide vacuum surrounding the molecule. The total energy was converged until the difference of the last two electronic steps becomes less than 10−7 eV. Accuracy of the methods was evaluated in terms of mean absolute deviation (MAD) and the normalized mean absolute deviation (NMAD), which is the MAD divided by the average of absolute values of energy in a set, image file: d0cp04137h-t4.tif.

3 Results

3.1 DNA base-pair step set

The DNA base-pair step set29 consists of 10 B-DNA base-pair steps, which have 20 intrastrand inter-base pairs and 20 interstrand inter-base pairs. This set highlights the intermolecular noncovalent interaction between aromatic molecules. The reference binding energies of inter-base pairs and base-pair steps were calculated29 by using DLPNO-CCSD(T) in the complete basis set (CBS) limit.

First, we evaluated the accuracy of the different methods for individual pair-stacking energies of inter-base pairs, ΔEstack,ij, in terms of intrastrand sandwich stacking and interstrand parallel-displaced stacking as shown in Fig. 1A. PBE shows the largest error (MAD = 4.91 kcal mol−1, see Fig. 1B) among the tested methods, and a high NMAD (0.99) close to unity indicates that the error of PBE is as large as the average binding energy of the inter-base pairs. It is notable that PBE shows 4-fold poorer estimation for intrastrand stacking interaction (MAD = 6.40 kcal mol−1) than for interstrand interaction (MAD = 1.42 kcal mol−1) as shown in Fig. S1 (ESI), indicating that the errors of PBE mostly are attributed to the lack of dispersion-type attraction between aromatic rings. The lack of dispersion in PBE also results in its underbinding nature in the positive mean deviation (MD = +3.91 kcal mol−1). The meta-GGA functionals, M06-2X and M11, without dispersion correction also show large errors (MAD = 1.02 and 1.81 kcal mol−1)29 (Table S1, ESI), since their kinetic energy dependence of nonlocal correlation still lacks long-range correlation as discussed before.55


image file: d0cp04137h-f1.tif
Fig. 1 (A) Structure of (top) intrastrand and (bottom) interstrand DNA inter-base pairs in the DNA base-pair step set. (B) (left) Mean absolute deviations (MADs) and mean deviations (MDs) and (right) normalized mean absolute deviations (NMADs) of the tested methods for predicting pairwise stacking energies of the DNA inter-base pairs.

The D3, TS and TS + SCS effectively correct the lack of dispersion in PBE and reduce the MAD to around 0.42 kcal mol−1, which is nine times smaller than PBE. TS and TS + SCS slightly overestimate 2-body dispersion-type attraction as shown in their negative MDs (MD = −0.41 kcal mol−1 by both of TS and TS + SCS) because of the lack of many-body screening, while D3 shows mild underbinding trend (MD = +0.34 kcal mol−1).

The many-body approaches further improve the binding energies of interstrand parallel displaced stacking by including repulsive many-body interaction (MD = −0.19 kcal mol−1 by MBD@rsSCS, MD = −0.15 kcal mol−1 by MBD@rsSCS/FI, and MD = −0.11 kcal mol−1 by uMBD), while they incur somewhat underestimated binding of intrastrand sandwich stacking structures (MD = +1.29 kcal mol−1 by MBD@rsSCS, MD = +1.20 kcal mol−1 by MBD@rsSCS/FI, and MD = +0.79 kcal mol−1 by uMBD) as shown in Fig. S1 (ESI). Overall, the MBD@rsSCS (MAD = 0.49 kcal mol−1) and MBD@rsSCS/FI (MAD = 0.47 kcal mol−1) show comparable accuracy with the pairwise methods, while only uMBD (MAD = 0.31 kcal mol−1) shows better accuracy than any other vdW-corrected DFT methods. The pairwise D3 method also shares a slightly underbinding trend with the many-body methods, since it employs a unique scaling of C6 as a function of coordination number.

Most of the nonlocal vdW density functionals further enhance the accuracy of pairwise stacking energies by a factor of 2 in comparison with the vdW-corrected DFT methods. Among the tested methods, rev-vdW-DF2 shows the best MAD in overall (MAD = 0.22 kcal mol−1), which is comparable to the best of hybrid and long-range corrected hybrid functional, such as B3LYP-D3(BJ) (MAD = 0.16 kcal mol−1) and ωB97M-V (MAD = 0.18 kcal mol−1)29 while vdW-DF2 (MAD = 0.25 kcal mol−1) and SCAN + rVV10 (MAD = 0.23 kcal mol−1) are also comparable to rev-vdW-DF2. It is notable that only vdW-DF-cx (MAD = 0.44 kcal mol−1) shows an error as large as the vdW-corrected DFT methods, and it mostly overbinds the inter-base pairs as shown in its negative MD (MD = −0.44 kcal mol−1).

Next, we compare the accuracy of the methods for the 10 B-DNA base-pair steps based on their full four-body stacking energy, ΔE4stack shown in Fig. 2 as well as its pairwise stacking contribution, image file: d0cp04137h-t5.tif, and 4-body nonadditivity contribution, ΔEABCD = ΔE4stack − ΔEstack shown in Fig. S2 (ESI).


image file: d0cp04137h-f2.tif
Fig. 2 (A) Structure of a DNA base-pair step in the DNA base-pair step set. (B) (left) Mean absolute deviations (MADs) and mean deviations (MDs) and (right) normalized mean absolute deviations (NMADs) of the tested methods for predicting the four-body stacking energies of the DNA base-pair step set.

As shown in Fig. 2B, PBE shows unacceptably large errors for ΔE4stack (MAD = 15.82 kcal mol−1). Based on the decomposition of ΔE4stack, most of the PBE errors can be attributed to the pairwise stacking contributions, ΔEstack (MAD = 15.64 kcal mol−1), where additive pairwise dispersion energy are dominant. The vdW-corrected DFT methods reduce the errors by factor of 8 to 10, and uMBD shows the best performance (MAD = 1.43 kcal mol−1) among them. It is notable that pairwise TS and TS + SCS equally overestimate the attraction in the pairwise stacking contribution, ΔEstack (MD = −1.65 kcal mol−1 by TS and −1.64 kcal mol−1 by TS + SCS), while TS appears to be more repulsive ΔEABCD (MD = +0.18 kcal mol−1 by TS and −0.13 kcal mol−1 by TS + SCS) as shown in Fig. S2 (ESI). Therefore, we can conclude that the long-range electrodynamic screening effect becomes more significant from the 2-body pairwise stacking to 4-body stacking, and this contributes to ΔEABCD. The D3 method, on the other hand, shows repulsive character in both pairwise stacking and 4-body nonadditive contributions, which is also a shared feature with many-body methods as in the inter-base pairs.

Many-body methods have a more repulsive character than the pairwise methods, showing an underestimated attraction in ΔEstack (MD = +1.64 kcal mol−1 by MBD@rsSCS and +1.69 kcal mol−1 by MBD@rsSCS/FI) and an overestimated repulsion in ΔEABCD (MD = +0.61 kcal mol−1 by MBD@rsSCS and +0.63 kcal mol−1 by MBD@rsSCS/FI) as shown in Fig. S2 (ESI). As a result, they tend to more underbind the base-pair step than the pairwise methods, giving MD = +2.02 kcal mol−1, +2.08 kcal mol−1, and +1.43 kcal mol−1 by MBD@rsSCS, MBD@rsSCS/FI, and uMBD respectively. By replacing the point-dipole CFDM to the smeared-out dipole CFSDM, uMBD mostly corrects the short-to-moderate range interaction of ΔEstack (MD = +1.06 kcal mol−1 by uMBD), while having a comparable MD (+0.60 kcal mol−1) for ΔEABCD with other many-body methods.

Most of the nonlocal density functionals further improve the accuracy, and rev-vdW-DF2 shows the best performance (MAD = 0.15 kcal mol−1). vdW-DF2 and SCAN + rVV10 show mild errors (MAD = 0.96 kcal mol−1 by vdW-DF2 and 0.69 kcal mol−1 by SCAN + rVV10), and only vdW-DF-cx shows as large error (MAD = 1.73 kcal mol−1) as the vdW-corrected DFT methods. Based on the decomposition of ΔE4stack in Fig. S2 (ESI), the errors in vdW-DF-cx come from overestimated attraction in the pairwise stacking energies (MD = −1.74 kcal mol−1), which was also pointed out from the individual inter-base pair subset as discussed above. On the other hand, rev-vdW-DF2 underestimates the 4-body nonadditive repulsion (MD = −0.49 kcal mol−1 for ΔEABCD) and pairwise stacking attraction (MD = +0.86 kcal mol−1 for ΔE4stack), leading to an error cancellation in total.

Overall, the errors of the tested methods mostly show extensive properties. The MADs of the tested methods increase from 2-body inter-base pairs to 4-body base-pair steps by a factor of 3 to 4, but their NMADs remain comparable. It is notable that only rev-vdW-DF2 shows a marginal dependence on the size of system with comparable MADs for the inter-base pairs (0.22 kcal mol−1) and the base-pair steps (0.15 kcal mol−1).

3.2 Conformer sets

We next turn to various benchmark sets which compare conformers. Comparing (and ranking) the ability of approximations to predict the energies of conformations throws up an important interpretative issue: which energies should one use? Answering this question depends on context. Sometimes it is important to compare against the average conformation energy (i.e., image file: d0cp04137h-t6.tif), when one is interested in how a method performs for the full range of conformations. Sometimes, it is important to look at energies relative to the most stable conformer (i.e., ΔElowesti = EiE0), e.g., when understanding which state will dominate after a long period of time. Finally, it is sometimes important to compare against the thermal average energy (i.e., ΔEthermali = EiEt, where image file: d0cp04137h-t7.tif) when one is interested in understanding how systems behave at room temperature (here, β−1 = 0.596 kcal mol−1 at room temperature).

3.3 UpU23 set

The UpU23 set28 consists of relative energies of 24 uracil dinucleotides (Fig. S3, ESI) chosen from 46 known conformations of the RNA backbone.27 All of the uracil dinucleotides in the UpU23 set have (−1) net charge from a phosphate group which poses challenges for codes using periodic boundary conditions. Thus, the reliability of the DFT-PBE energies based on a plane-wave basis set reported in this work was tested against energies based on a localized basis. Results differed by 0.35 kcal mol−1 at most – a minor effect compared to what is considered as chemical accuracy (∼1 kcal mol−1) and to the scale of relative conformation energies, which are larger than 2 kcal mol−1. The reference conformation energies were obtained from DLPNO-CCSD(T)/CBS calculations. As discussed above and in the Appendix of ESI, the conformation energies are calculated in three way: relative to (i) the average conformation energy, (ii) the energy of the most stable conformer, which is the 2p conformation in the UpU23 set, and (iii) the thermal averaged energy. More details are given as ESI.

As shown in Fig. 3, PBE shows the largest error among all of the tested methods in case of both vs. the average conformation energy and vs. the most stable conformer. Both of the vdW-corrected DFT methods and the nonlocal vdW density functionals largely reduces the MAD and NMAD. However, in case of vs. thermal average energy (Fig. 3D), pairwise TS and TS + SCS have comparable or larger MAD (1.53 and 1.85 kcal mol−1, respectively) than PBE (MAD = 1.84 kcal mol−1), since they predict the 5z conformation as the most stable and far more stable than the 2p conformation by 1.72 kcal mol−1 and 1.90 kcal mol−1, respectively. All of the nonlocal density functionals also predict the 5z conformation as the most stable but more stable than the 2p conformation by a difference within 1 kcal mol−1, giving smaller MAD than TS and TS + SCS.


image file: d0cp04137h-f3.tif
Fig. 3 (A) Structure of uracil dinucleotide in the UpU23 set. (B)–(D) Mean absolute deviations (MADs) and normalized mean absolute deviations (NMADs) of the tested methods for predicting conformation energies relative to (B) the average conformation energy of each method, (C) the energy of the most stable conformer based on CCSD(T), and (D) the thermal average energy based on each method.

Regardless of the definition of the conformation energy, rev-vdW-DF2 generally shows the best performance in the UpU23 set (MAD = 0.32 kcal mol−1vs. average conformation energy, 0.38 kcal mol−1vs. the most stable conformer, and 0.31 kcal mol−1vs. thermal average energy, respectively), which has 5-fold to 6-fold smaller MADs than PBE, and better than any other (long-range-corrected) hybrid and meta-GGA functionals28 as shown in Table S2 (ESI). uMBD and SCAN + rVV10 also show comparable MADs (MAD = 0.35, 0.37, and 0.39 kcal mol−1 by uMBD and 0.34, 0.40 and 0.34 kcal mol−1 by SCAN + rVV10, respectively) with rev-vdW-DF2.

Among the vdW-corrected DFT methods, uMBD shows the best performance over the tested methods and hybrid and meta-GGA functionals in Table S2 (ESI), regardless of the definition of conformation energies, while other many-body methods, MBD@rsSCS/FI and MBD@rsSCS, are also as accurate as uMBD within a MAD difference of 0.1 kcal mol−1 (MAD = 0.42, 0.43 and 0.43 kcal mol−1, respectively, by MBD@rsSCS/FI, and MAD = 0.44, 0.45 and 0.45 kcal mol−1, respectively, by MBD@rsSCS). The pairwise D3 method also shows a comparable accuracy (MAD = 0.46, 0.50 and 0.47 kcal mol−1, respectively) to the many-body methods. Many-body vdW-corrected DFT methods and D3 predict the 2p conformation as the most stable in agreement with CCSD(T) (see Fig. S4, ESI), and therefore they show 4 times smaller MAD than TS and TS + SCS in terms of thermal average energy. It is also notable that the D3 correction combined with M06 deteriorates its MAD regardless of the definition of the conformation energy, due to the problem of this functional with the prediction of nonequilibrium geometries, as discussed before.28

3.4 PCONF21 set

The PCONF21 set28 consists of relative energies of (i) the most stable 11 RI-MP2/cc-pVDZ configurations of Phe-Gly-Gly tripeptides from the 60 most stable SCC-DFTB-D geometries, which correspond to the 99, 412, 114, 357, 224, 215, 470, 444, 300, 691 and 366th local minima at the AMBER scale among 1500 structures in the entire set by Řeha and co-workers33 and (ii) 5 ACE-Ala-Gly-Ala-NME tetrapeptide conformers and 5 ACE-Ala-Ser-Ala-NME tetrapeptide conformers, which are optimized at B3LYP/6-31G** with constraint of typical backbone dihedral angles56 (Fig. 4A and Fig. S5, ESI). The reference energies were obtained by single point calculations at the DLPNO-CCSD(T)/TightPNO/CBS(aug-cc-pVTZ/aug-cc-pVQZ)28 level.
image file: d0cp04137h-f4.tif
Fig. 4 (A) Structures of (top) Phe-Gly-Gly tripeptide and (bottom) ACE-Ala-X-Ala-NME (X = Gly and Ser) in the PCONF21 set. (B)–(D) Mean absolute deviations (MADs) and normalized mean absolute deviations (NMADs) of the tested methods for predicting conformation energies relative to (B) the average conformation energy of each method, (C) the energy of the most stable conformer based on CCSD(T), and (D) the thermal average energy based on each method.

As shown in Fig. 4B–D, regardless of the definition of conformation energies, PBE shows the largest error among the tested methods (MAD = 1.62 kcal mol−1vs. average conformation energy, 3.66 kcal mol−1vs. the most stable conformer, and 2.32 kcal mol−1vs. thermal averaged energy, respectively), comparable to B3LYP as shown in Table S3 (ESI). On the other hand, nonlocal vdW density functionals show a better performance than the vdW-corrected DFT methods. Interestingly, the pairwise TS (MAD = 0.93, 1.25 and 1.08 kcal mol−1, respectively) and TS + SCS methods (MAD = 0.92, 1.18 and 0.98 kcal mol−1, respectively) are slightly better than D3 (MAD = 0.94, 1.43 and 1.23 kcal mol−1, respectively) as well as many-body MBD@rsSCS (MAD = 0.96, 1.57 and 1.30 kcal mol−1, respectively) and MBD@rsSCS/FI (MAD = 0.94, 1.54 and 1.29 kcal mol−1, respectively). The uMBD also shows comparable or slightly larger MAD (0.89, 1.32 and 1.17 kcal mol−1, respectively) than the pairwise methods. D3 shows a better performance when coupled with ωB97X, while it worsens the accuracy of the M06 and M06-2X functionals28 (see Table S3, ESI). Nonlocal vdW density functionals show comparable accuracy with the hybrid meta-GGA M06 and M06-2X functionals. Among the nonlocal vdW density functionals, vdW-DF2 shows the best performance (MAD = 0.35, 0.43 and 0.35 kcal mol−1, respectively), and SCAN + rVV10 (MAD = 0.53, 0.58 and 0.56 kcal mol−1, respectively) is the second best. rev-vdW-DF2 (MAD = 0.57, 0.72 and 0.67 kcal mol−1, respectively) and vdW-DF-cx (MAD = 0.60, 0.80 and 0.69 kcal mol−1, respectively) have slightly larger errors than the other nonlocal functionals, but still better than the vdW-corrected DFT methods.

According to the comparison of individual relative energies of Phe-Gly-Gly tripeptides in Fig. S6 (ESI), the vdW-corrected DFT methods and most of the nonlocal van der Waals density functionals share common features. To discuss this, we can categorize the Phe-Gly-Gly conformations into three groups according to their structures in Fig. S5A (ESI). (i) Stretched subset: 366, 300 and 691st local minima structures in the AMBER 1500 conformation set,33 which has large separation between phenyl group and carboxylic end, and thereby their pairwise dispersion interaction is the smallest among the 11 conformations, making them less stable than 99 based on the vdW-corrected DFT methods. (ii) Folded subset: 444, 215, 357, 114 and 224th local minima structures in the AMBER 1500 conformation set, which have the most folded conformation with a common hydrogen bond between the OH of carboxylic acid and the O of the middle Gly unit (green dashed line in Fig. S5A, ESI) that gives them a more folded carboxylic end than 99, 412 and 470 and thereby predicted to have the strongest pairwise attraction-type dispersion interaction. (iii) Intermediate subset: 99, 412 and 470th local minima structures in the AMBER 1500 conformation set, which have comparably folded conformations with the Folded subset, but they don’t make a hydrogen bond at the carboxylic end and thereby predicted to be less stable than the Folded subset based on most of the tested methods except for vdW-DF2 and SCAN + rVV10. For the Folded and Intermediate subset, their results are well correlated with CCSD(T) in case of vs. the most stable conformer, while for the Folded subset, their results are well correlated with CCSD(T) in case of vs. thermal average energy. vdW-DF2 is the only method that shows a unique trend strongly correlated with the CCSD(T) results and the smallest MAD as mentioned before. SCAN + rVV10 shows the similar trend with vdW-DF-cx and rev-vdW-DF2, but its prediction of the most stable conformer, 99, is slightly better than them and thereby leading it to be the second best in the PCONF21 set.

3.5 Cyclic inhibitors in the MPCONF196 set

From the MPCONF196 set,28,30 we selected 8 macrocyclic inhibitors – CAMVES, CHPSAR, COHVAW, Cpd_B, Cpd_A, POXTRD, SANGLI and YIVNOG, which are shown in Fig. 5A. The reference energies are calculated by CCSD(T)/CBS in case of POXTRD and a composite scheme of MP2-F12/DZ + DLPNO(t)-CCSD(T) otherwise. More details on the calculation of the reference energies can be found in the previous work of GMTKN55.28
image file: d0cp04137h-f5.tif
Fig. 5 (A) Structures of macrocyclic inhibitors in the MPCONF196 set. (B)–(D) Mean absolute deviations (MADs) and normalized mean absolute deviations (NMADs) of the tested methods for predicting conformation energies relative to (B) the average conformation energy, (C) the energy of the most stable conformer based on CCSD(T) and (D) the thermal average energy based on each method.

As shown in Fig. 5, PBE shows the largest error for predicting conformation energies relative to the average conformation energy (MAD = 1.78 kcal mol−1 and NMAD = 0.35), the energy of the most stable conformer (MAD = 2.55 kcal mol−1 and NMAD = 0.21), and the thermal average energy (MAD = 2.04 kcal mol−1 and NMAD = 0.16).

Among the vdW-corrected DFT methods, the D3 method shows the smallest error (MAD = 0.75 kcal mol−1 and NMAD = 0.16 vs. average conformation energy, MAD = 1.12 kcal mol−1 and NMAD = 0.09 vs. the most stable conformer, and MAD = 1.04 kcal mol−1 and NMAD = 0.09 vs. thermal average energy, respectively), which is also comparable to the many-body MBD@rsSCS, MBD@rsSCS/FI and uMBD (MAD = 0.79, 1.16 and 1.09 kcal mol−1 by MBD@rsSCS, MAD = 0.79, 1.15 and 1.08 kcal mol−1 by MBD@rsSCS/FI, and MAD = 0.84, 1.21 and 1.15 kcal mol−1, respectively) within a MAD difference of 0.1 kcal mol−1. Both of TS and TS + SCS shows larger errors (MAD = 1.08, 1.43 and 1.35 kcal mol−1 by TS and MAD = 1.02, 1.34 and 1.27 kcal mol−1 by TS + SCS, respectively) than any other vdW-corrected DFT methods and nonlocal vdW density functionals, which are attributed to their lack of many-body interaction. The M06-2X and M06L functionals show comparable accuracy with vdW-corrected DFT methods,28 while the inclusion of the D3 correction in these functionals again slightly deteriorates their accuracy as shown in Table S4 (ESI). On the other hand, B3LYP with D3 correction can improve its accuracy (MAD = 0.83 kcal mol−1 and NMAD = 0.06 vs. average conformation energy, MAD = 0.57 kcal mol−1 and NMAD = 0.13 vs. the most stable conformer, and MAD = 0.78 kcal mol−1 and NMAD = 0.06 vs. thermal average energy, respectively) more than any other functionals considered in this work.

Among the nonlocal vdW density functionals, rev-vdW-DF2 shows the best performance (MAD = 0.61, 0.93 and 0.88 kcal mol−1, respectively) and vdW-DF-cx also shows a slightly better accuracy (MAD = 0.64, 0.93 and 0.87 kcal mol−1) than the vdW-corrected DFT methods. vdW-DF2 shows a comparable accuracy (MAD = 0.72, 1.14 and 1.08 kcal mol−1) with the many-body vdW-corrected DFT methods. It is notable that only SCAN + rVV10 shows larger MAD than most of the vdW-corrected DFT methods when the conformer energies are defined with respect to the energy of the most stable conformer and the thermal average energy. As shown in Fig. S8 (ESI), the large MAD of SCAN + rVV10 is attributed to the conformations of SANGLI, where SCAN + rVV10 overestimates the stability of the most state conformer as shown in Fig. S9 (ESI).

4 Discussion

In this work, we evaluated the accuracy of the state-of-the art vdW-corrected DFT methods and nonlocal vdW density functionals for different biomolecular systems – polynucleotide base pairs and backbone fragments, peptides, and macrocyclic inhibitors based on the comparison with highly accurate CCSD(T) and DLPNO-CCSD(T) calculations and some hybrid and meta-GGA functionals. Overall, both vdW-corrected DFT methods and nonlocal vdW density functionals highly improve the accuracy of DFT-PBE, and the best functionals among them approach comparable accuracy to that of higher-level density functional approaches.

rev-vdW-DF2 works better than the other tested functionals in most of benchmark sets except for PCONF21 set, where vdW-DF2 shows unique accuracy over other methods, and it also has comparable MAD with the best of hybrid and meta-GGA functionals within a 0.3 kcal mol−1 difference. Of the vdW-corrected DFT methods, the pairwise D3 and the many-body uMBD method individually reach comparable accuracy to the best vdW-corrected DFT method regardless of the systems. Including a dispersion correction like in vdW-corrected DFT methods mostly improves the accuracy of PBE and hybrid functionals, while it makes worse in case of hybrid meta-GGA functionals, indicating that the vdW-corrected DFT methods require a proper choice of a base functional for further improvements.

The radar plot of the MADs in Fig. 6 illustrates how uniformly a method conserves its accuracy for diverse benchmark sets. According to the radar plots, vdW-corrected DFT methods improves the accuracy of DFT-PBE comparably to each other, but the pairwise TS and TS + SCS show as large error of conformation energy relative to the thermal average energy as DFT-PBE in the UpU23 set. The nonlocal vdW density functionals show further improvements than vdW-corrected methods. While the vdW-corrected DFT methods show larger MADs than chemical (∼1 kcal mol−1) accuracy in 2 to 3 benchmark sets, most of the nonlocal vdW density functionals reach chemical accuracy except for at most 1 benchmark set.


image file: d0cp04137h-f6.tif
Fig. 6 (left) The radar plots of the mean absolute deviations (MADs) of the tested methods in the individual benchmark sets of this work and (right) the weighted total mean absolute deviation (WTMAD-2) of the tested methods in the overall benchmark set of this work and according to the conformation energy relative to (A) the average conformation energy. (B) the energy of the most stable conformer, and (C) the thermal average energy, respectively. The y axis is scaled as non-linear way to clarify the radar plots of accurate methods.

To comprehend the accuracy of the tested methods in the overall benchmark set, we introduce the weighted total mean absolute deviation (WTMAD-2), which is defined in the previous work on the GMTKN55 benchmark.28 Since DNA inter-base pair and DNA base-pair step have different energy scales, the WTMAD-2 was averaged over 5 benchmark sets – the DNA inter-base pairs, the DNA base-pair steps, the UpU23 set, the PCONF21 set and the macrocycles in MPCONF196 set.

According to the WTMAD-2, the nonlocal vdW density functionals have 4.2 to 6.5 times smaller WTMAD-2's than PBE, which are always better than vdW-corrected DFT methods and fall into the chemical accuracy (1 kcal mol−1) regardless of the definition of conformation energy. Among them, vdW-DF2 and rev-vdW-DF2 show the best accuracy (WTMAD-2 = 0.78, 0.74 and 0.76 kcal mol−1vs. average conformation energy, vs. the most stable conformer and vs. thermal average energy, respectively, by vdW-DF2 and WTMAD-2 = 0.82, 0.77 and 0.79 kcal mol−1, respectively, by rev-vdW-DF2), though all of nonlocal functionals have comparable WTMAD-2 within 0.25 kcal mol−1. As shown in the radar plot of Fig. 6 and the MADs in Fig. 1–5, vdW-DF2 and rev-vdW-DF2 have comparable accuracy to each other, but more specifically, vdW-DF2 show better accuracy in PCONF21 set, while rev-vdW-DF2 works better otherwise. SCAN + rVV10 also shows comparable accuracy (WTMAD-2 = 0.85, 0.84 and 0.86 kcal mol−1, respectively) with vdW-DF2 and rev-vdW-DF2, while vdW-DF-cx shows slightly larger WTMAD-2 (0.97, 0.97 and 0.96 kcal mol−1, respectively) yet still smaller than the vdW-corrected DFT methods.

The vdW-corrected DFT methods show slightly larger WTMAD-2 than those of the nonlocal functionals as well as the chemical accuracy, yet they still have reasonable accuracy that is 2.6 to 4.0 times better than DFT-PBE. Among the vdW-corrected DFT methods, uMBD shows the smallest WTMAD-2 (1.20, 1.21 and 1.25 kcal mol−1, respectively), while the WTMAD-2's of other vdW-corrected DFT methods are also comparable within 0.24 kcal mol−1.

5 Conclusion

In conclusion, both vdW-corrected DFT methods and nonlocal vdW density functionals display noticeable improvements from standard DFT-GGA for the description of vdW interactions in biomolecules. We focused on PBE because it offers a balanced and efficient description of molecules and solids, in contrast to some other popular approximations. It is thus likely to work for computational modeling that, for example, explores interactions of biomolecules with nanomaterials.57,58

Of the tested methods, vdW-DF2 shows the best performance for the relative energetics of conformers over other state-of-the-art vdW methods and higher rungs of density functionals. Among the vdW-corrected DFT methods, uMBD and D3 show a comparable accuracy to the nonlocal vdW density functionals for biomolecular interactions. It should be noted, however, that D3 does not perform well in metals and some other solids,18,59,60 meaning caution should be advised when using it to study interactions between biomolecules and solids. In this regard, nonlocal vdW density functionals seem to perform reasonably,61,62 although their use for large systems remain limited.

Overall, we anticipate that our findings will be helpful to researchers using DFT to study the interactions between biomolecules, and especially the dynamics of solvent-biomolecules systems.6 It should also be useful for developing machine-learned force fields,63 where good treatment of all interactions is important.

Conflicts of interest

There are no conflicts to declare.

Acknowledgements

This work was performed using the HPC Mesocenter “Explor” of the University of Lorraine. The authors acknowledge financial support through the COMETE project (COnception in silico de Matériaux pour l’EnvironmenT et l’Energie) cofunded by the European Union under the program “FEDER-FSE Lorraine et Massif des Voges 2014–2020”.

Notes and references

  1. A. Tkatchenko, M. Rossi, V. Blum, J. Ireta and M. Scheffler, Phys. Rev. Lett., 2011, 106, 118102 Search PubMed .
  2. J. Šponer, J. E. Šponer, A. Mládek, P. Jurečka, P. Banáš and M. Otyepka, Biopolymers, 2013, 99, 978–988 Search PubMed .
  3. M. Chawla, E. Chermak, Q. Zhang, J. M. Bujnicki, R. Oliva and L. Cavallo, Nucleic Acids Res., 2017, 45, 11019–11032 Search PubMed .
  4. O. B. Ol’ha, K. S. Tsiupa and D. M. Hovorun, Sci. Rep., 2018, 8, 10371 Search PubMed .
  5. P. K. Biswas and S. Chakraborty, Nucleic Acids Res., 2019, 47, 2757–2765 Search PubMed .
  6. M. Stöhr and A. Tkatchenko, Sci. Adv., 2019, 5, eaax0024 Search PubMed .
  7. V. Genna, M. Marcia and M. D. Vivo, J. Am. Chem. Soc., 2019, 141, 10770–10776 Search PubMed .
  8. L. O. Jones, M. A. Mosquera, G. C. Schatz and M. A. Ratner, J. Am. Chem. Soc., 2020, 142, 3281–3295 Search PubMed .
  9. A. Lunghi and S. Sanvito, Sci. Adv., 2019, 5, eaaw2210 Search PubMed .
  10. R. Galvelis, S. Doerr, J. M. Damas, M. J. Harvey and G. De Fabritiis, J. Chem. Inf. Model., 2019, 59, 3485–3493 Search PubMed .
  11. N. M. Luscombe, R. A. Laskowski and J. M. Thornton, Nucleic Acids Res., 2001, 29, 2860–2874 Search PubMed .
  12. S. Jones, D. T. Daley, N. M. Luscombe, H. M. Berman and J. M. Thornton, Nucleic Acids Res., 2001, 29, 943–954 Search PubMed .
  13. S. Grimme, S. Ehrlich and L. Goerigk, J. Comput. Chem., 2011, 32, 1456–1465 Search PubMed .
  14. A. Tkatchenko and M. Scheffler, Phys. Rev. Lett., 2009, 102, 073005 Search PubMed .
  15. A. Tkatchenko, R. A. DiStasio Jr, R. Car and M. Scheffler, Phys. Rev. Lett., 2012, 108, 236402 Search PubMed .
  16. A. Ambrosetti, A. M. Reilly, R. A. DiStasio Jr and A. Tkatchenko, J. Chem. Phys., 2014, 140, 18A508 Search PubMed .
  17. T. Gould, S. Lebègue, J. G. Ángyán and T. Bučko, J. Chem. Theory Comput., 2016, 12, 5920–5930 Search PubMed .
  18. M. Kim, W. J. Kim, T. Gould, E. K. Lee, S. Lebègue and H. Kim, J. Am. Chem. Soc., 2020, 142, 2346–2354 Search PubMed .
  19. M. Dion, H. Rydberg, E. Schröder, D. C. Langreth and B. I. Lundqvist, Phys. Rev. Lett., 2004, 92, 246401 Search PubMed .
  20. K. Lee, É. D. Murray, L. Kong, B. I. Lundqvist and D. C. Langreth, Phys. Rev. B: Condens. Matter Mater. Phys., 2010, 82, 081101 Search PubMed .
  21. R. Sabatini, T. Gorni and S. De Gironcoli, Phys. Rev. B: Condens. Matter Mater. Phys., 2013, 87, 041108 Search PubMed .
  22. I. Hamada, Phys. Rev. B: Condens. Matter Mater. Phys., 2014, 89, 121103 Search PubMed .
  23. K. Berland and P. Hyldgaard, Phys. Rev. B: Condens. Matter Mater. Phys., 2014, 89, 035412 Search PubMed .
  24. H. Peng, Z.-H. Yang, J. P. Perdew and J. Sun, Phys. Rev. X, 2016, 6, 041005 Search PubMed .
  25. M. Kim, W. J. Kim, E. K. Lee, S. Lebegue and H. Kim, Int. J. Quantum Chem., 2016, 116, 598–607 Search PubMed .
  26. F. Tran, L. Kalantari, B. Traoré, X. Rocquefelte and P. Blaha, Phys. Rev. Mater., 2019, 3, 063602 Search PubMed .
  27. H. Kruse, A. Mladek, K. Gkionis, A. Hansen, S. Grimme and J. Šponer, J. Chem. Theory Comput., 2015, 11, 4972–4991 Search PubMed .
  28. L. Goerigk, A. Hansen, C. Bauer, S. Ehrlich, A. Najibi and S. Grimme, Phys. Chem. Chem. Phys., 2017, 19, 32184–32215 Search PubMed .
  29. H. Kruse, P. Banas and J. Šponer, J. Chem. Theory Comput., 2018, 15, 95–115 Search PubMed .
  30. J. Rezac, D. Bm, O. Gutten and L. Rulisek, J. Chem. Theory Comput., 2018, 14, 1254–1266 Search PubMed .
  31. J. Antony and S. Grimme, Phys. Chem. Chem. Phys., 2006, 8, 5287–5293 Search PubMed .
  32. J. Paier, M. Marsman and G. Kresse, J. Chem. Phys., 2007, 127, 024103 Search PubMed .
  33. D. Řeha, H. Valdes, J. Vondrášek, P. Hobza, A. Abu-Riziq, B. Crews and M. S. de Vries, Chem. – Eur. J., 2005, 11, 6803–6817 Search PubMed .
  34. L. Goerigk, A. Karton, J. M. Martin and L. Radom, Phys. Chem. Chem. Phys., 2013, 15, 7028–7031 Search PubMed .
  35. J. P. Perdew, K. Burke and M. Ernzerhof, Phys. Rev. Lett., 1996, 77, 3865 Search PubMed .
  36. C. Lee, W. Yang and R. G. Parr, Phys. Rev. B: Condens. Matter Mater. Phys., 1988, 37, 785–789 Search PubMed .
  37. A. D. Becke, J. Chem. Phys., 1993, 98, 5648 Search PubMed .
  38. P. J. Stephens, F. Devlin, C. Chabalowski and M. J. Frisch, J. Phys. Chem., 1994, 98, 11623–11627 Search PubMed .
  39. J.-D. Chai and M. Head-Gordon, Phys. Chem. Chem. Phys., 2008, 10, 6615–6620 Search PubMed .
  40. Y. Zhao and D. G. Truhlar, J. Chem. Phys., 2006, 125, 194101 Search PubMed .
  41. Y. Zhao and D. G. Truhlar, Theor. Chem. Acc., 2008, 120, 215–241 Search PubMed .
  42. R. Peverati and D. G. Truhlar, J. Phys. Chem. Lett., 2011, 2, 2810–2817 Search PubMed .
  43. N. Mardirossian and M. Head-Gordon, J. Chem. Phys., 2016, 144, 214110 Search PubMed .
  44. J. F. Dobson, Int. J. Quantum Chem., 2014, 114, 1157–1161 Search PubMed .
  45. A. Donchev, J. Chem. Phys., 2006, 125, 074713 Search PubMed .
  46. J. P. Perdew and W. Yue, Phys. Rev. B: Condens. Matter Mater. Phys., 1986, 33, 8800 Search PubMed .
  47. E. D. Murray, K. Lee and D. C. Langreth, J. Chem. Theory Comput., 2009, 5, 2754–2762 Search PubMed .
  48. A. Becke, J. Chem. Phys., 1986, 85, 7184–7187 Search PubMed .
  49. J. Sun, A. Ruzsinszky and J. P. Perdew, Phys. Rev. Lett., 2015, 115, 036402 Search PubMed .
  50. G. Kresse and J. Hafner, Phys. Rev. B: Condens. Matter Mater. Phys., 1993, 47, 558 Search PubMed .
  51. G. Kresse and J. Furthmüller, Phys. Rev. B: Condens. Matter Mater. Phys., 1996, 54, 11169 Search PubMed .
  52. G. Kresse and J. Furthmüller, Comput. Mater. Sci., 1996, 6, 15–50 Search PubMed .
  53. T. Björkman, J. Chem. Phys., 2014, 141, 074708 Search PubMed .
  54. P. E. Blöchl, Phys. Rev. B: Condens. Matter Mater. Phys., 1994, 50, 17953 Search PubMed .
  55. N. Marom, A. Tkatchenko, M. Rossi, V. V. Gobre, O. Hod, M. Scheffler and L. Kronik, J. Chem. Theory Comput., 2011, 7, 3944–3951 Search PubMed .
  56. J. Jiang, Y. Wu, Z.-X. Wang and C. Wu, J. Chem. Theory Comput., 2010, 6, 1199–1209 Search PubMed .
  57. W. Qin, X. Li, W.-W. Bian, X.-J. Fan and J.-Y. Qi, Biomaterials, 2010, 31, 1007–1016 Search PubMed .
  58. Q. Wang, M.-h. Wang, K.-f. Wang, Y. Liu, H.-p. Zhang, X. Lu and X.-d. Zhang, Biomed. Mater., 2015, 10, 032001 Search PubMed .
  59. M. P. Andersson, Phys. Chem. Chem. Phys., 2016, 18, 19118–19122 Search PubMed .
  60. N. V. Ilawe, J. A. Zimmerman and B. M. Wong, J. Chem. Theory Comput., 2015, 11, 5426–5435 Search PubMed .
  61. K. Lee, Y. Morikawa and D. C. Langreth, Phys. Rev. B: Condens. Matter Mater. Phys., 2010, 82, 155461 Search PubMed .
  62. C. N. M. Ouma, P. Modisha and D. Bessarabov, RSC Adv., 2018, 8, 31895–31904 Search PubMed .
  63. S. Chmiela, H. E. Sauceda, K.-R. Müller and A. Tkatchenko, Nat. Commun., 2018, 9, 1–10 Search PubMed .

Footnote

Electronic supplementary information (ESI) available: The definition of conformation energy, the graphs and tables for the error statistics of the tested methods and the structures of individual conformers in the benchmark set. See DOI: 10.1039/d0cp04137h

This journal is © the Owner Societies 2020