Adam
Pecina‡
a,
René
Meier‡
b,
Jindřich
Fanfrlík
a,
Martin
Lepšík
a,
Jan
Řezáč
a,
Pavel
Hobza
*ac and
Carsten
Baldauf
*d
aInstitute of Organic Chemistry and Biochemistry (IOCB) and Gilead Sciences and IOCB Research Center, Flemingovo nám. 2, 16610 Prague 6, Czech Republic. E-mail: hobza@uochb.cas.cz
bInstitut für Biochemie, Fakultät für Biowissenschaften, Pharmazie und Psychologie, Universität Leipzig, Brüderstrasse 34, D-04109 Leipzig, Germany
cRegional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Palacký University, 77146 Olomouc, Czech Republic
dFritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, D-14195 Berlin, Germany. E-mail: baldauf@fhi-berlin.mpg.de
First published on 20th January 2016
Current virtual screening tools are fast, but reliable scoring is elusive. Here, we present the ‘SQM/COSMO filter’, a novel scoring function featuring a quantitative semiempirical quantum mechanical (SQM) description of all types of noncovalent interactions coupled with implicit COSMO solvation. We show unequivocally that it outperforms eight widely used scoring functions. The accuracy and chemical generality of the SQM/COSMO filter make it a perfect tool for late stages of virtual screening.
Score = ΔEint + ΔΔGsolv + ΔG′wconf − TΔS | (1) |
Our approach is systematic. Using accurate calculations in small model systems as a benchmark, we developed corrections for SQM methods that provide reliable and accurate description of a wide range of noncovalent interactions including dispersion, hydrogen- and halogen-bonding.16 Coupled with the PM6 SQM method,18 the resulting PM6-D3H4X approach is applicable to a wide chemical space and does not require any system-specific parameterisation. We use it here to calculate the ΔEint term. Subsequently, we compared MM-based (PB or GB) and QM-based (COSMO19 or SMD) implicit solvent models and found the latter group to be more accurate.20 These are therefore used for the ΔΔGsolv term. These two dominant terms, ΔEint and ΔΔGsolv, are at the heart of our SQM-based SF.15 We have demonstrated its generality in various noncovalent P–L complexes, such as aldose reductase or carbonic anhydrase and moreover extended it to treat covalent inhibitor binding (ref. 15, 21 and 22).
In this work, we adapt our SQM-based SF to make it usable in virtual screening on the basis of our previous experience. By taking the two dominant terms only, ΔEint and ΔΔGsolv, we define the ‘SQM/COSMO filter’ energy. Its performance is tested here against eight widely used SFs. GlideScore XP (GlideXP),23 PLANTS PLP (PLP),24 AutoDock Vina (Vina),25 Chemscore (CS),26 Goldscore (GS)27 and ChemPLP24 are empirical, regression-based functions which use different terms to describe vdW contacts, lipophilic surface coverage, hydrogen bonding, ligand strain, and desolvation. The Astex Statistical Potential (ASP)28 is a knowledge-based potential. The classical physics-based AMBER/GB SF combines the ff03-GAFF MM force fields with the GB implicit solvent.5,29
The goal is ‘cognate docking’,30i.e. the ability to identify sharply the known native X-ray P–L binding pose from a set of decoy structures generated by docking (Fig. 1). To understand our results in detail, we have not opted for treating them in a statistical manner31 as in the pose decoy test sets available.32 Instead we cautiously selected four unrelated difficult-to-handle P–L systems, which comply with strict criteria for the selection of crystallographic structures for docking (details in the ESI†).33 These systems are acetylcholine esterase (AChE, PDB: 1E66),34 TNF-α converting enzyme (TACE, PDB: 3B92),35 aldose reductase (AR, PDB: 2IKJ)36 and HIV-1 protease (HIV PR; PDB: 1NH0).37 For the latter, the protonation of the active site is inferred from ultra-high resolution X-ray crystallography. Based on these P–L crystal structures, we have created a set of non-redundant poses (2865 in total) by docking with four popular docking programs (Glide, PLANTS, AutoDock Vina and GOLD) coupled to seven widely used SFs23–28 (Fig. 1, Table S2, ESI†).
Fig. 1 The ligand poses generated by the four docking programs. Ligand poses are color-coded by RMSD. |
All the poses were re-scored by all nine SFs. For the seven regression- and knowledge-based SFs, we used the recommended protocols. For the two physics-based SFs, only hydrogen atoms and close contacts were relaxed by the AMBER/GB method. RMSD values of the poses relative to the crystal were measured (details in S1.6, ESI†). The scores were normalised and are shown relative to the score of the crystal pose.
The identification of the X-ray pose as the minimum-free-energy structure is an unambiguous criterion for the performance of any SF. The ideal behaviour of such a score vs. RMSD curve (Fig. 2) is characterised by the positive values of energies for the decoy poses. Small deviations (negative energies for very small RMSD values) are acceptable and might be explained by inaccuracies of the crystal structure. These conditions are met by the SQM/COSMO filter, unlike the other SFs (Fig. 2). The numbers of false-positive solutions as well as the maximum RMSD (RMSDmax) from the X-ray pose within a defined interval of the normalised score quantify the virtually ideal behaviour of the SQM/COSMO filter in comparison to the other SFs.
The number of false-positives is lowest for the SQM/COSMO filter, even zero for three P–L systems (Table 1). CS and ASP perform slightly worse. AMBER/GB performs satisfyingly well for three systems but yields 171 false-positives for TACE. For AChE, all the SFs perform satisfyingly well. For AR and HIV PR, GlideXP generates the highest number of false-positive solutions and also shape-wise the free energy landscape looks ill-defined (Fig. 2). In the case of AR, a plateau of negative relative scores is observed for GlideXP. The hardest case is the TACE metalloprotein. Here, all the SFs produce false-positive solutions but to a different extent. The SQM/COSMO filter performs best, followed by CS. This example in particular shows the strength of an electronic-structure theory description of P–L binding. The presence of the metal cation in this protein and the associated charge-transfer effects between the ligand and the cation are not adequately described by classical force-fields or statistical potentials, but they are well represented by the SQM/COSMO filter.
Scoring function | |||||||||
---|---|---|---|---|---|---|---|---|---|
SQM/COSMO | AMBER/GB | Glide | PLANTS | AutoDock | Gold | ||||
XP | PLP | Vina | ASP | CS | GS | ChemPLP | |||
AChE | 0 | 0 | 4 | 12 | 0 | 2 | 3 | 0 | 0 |
AR | 0 | 1 | 67 | 0 | 10 | 1 | 0 | 1 | 0 |
TACE | 39 | 171 | 181 | 294 | 63 | 56 | 49 | 78 | 111 |
HIV PR | 0 | 0 | 98 | 0 | 7 | 0 | 2 | 1 | 8 |
Total | 39 | 172 | 350 | 306 | 80 | 59 | 54 | 80 | 119 |
The second criterion, RMSDmax, is shown for the interval of the normalised relative scores below 5 (Table 2). The SQM/COSMO filter shows the lowest RMSDmax of 0.88 Å on average. CS follows with 1.28 Å on average. ASP and AMBER/GB satisfy the conditions of an averaged RMSDmax up to 2 Å. AMBER/GB, however, fails in the difficult case of TACE with RMSDmax of 4.76 Å. Analogous analyses at greater intervals have revealed a similar ordering of the SFs (Table S4, ESI†).
Scoring function | |||||||||
---|---|---|---|---|---|---|---|---|---|
SQM/COSMO | AMBER/GB | Glide | PLANTS | AutoDock | Gold | ||||
XP | PLP | Vina | ASP | CS | GS | ChemPLP | |||
Maximal RMSD within a window of 5 of the normalised score | |||||||||
AchE | 0.47 | 0.57 | 2.13 | 0.78 | 0.78 | 1.78 | 1.43 | 1.14 | 0.78 |
AR | 0.19 | 0.19 | 7.54 | 1.14 | 3.54 | 2.32 | 1.15 | 2.21 | 1.49 |
TACE | 1.91 | 4.76 | 3.02 | 2.91 | 7.13 | 2.01 | 1.54 | 2.44 | 2.40 |
HIV PR | 0.94 | 0.94 | 17.26 | 12.60 | 11.62 | 1.00 | 1.01 | 12.60 | 11.62 |
Average | 0.88 | 1.62 | 7.49 | 4.61 | 5.77 | 1.78 | 1.28 | 4.60 | 4.55 |
The SQM/COSMO filter enables us not only to recognise the correct binding pose (RMSD below 2 Å) but also to go beyond this limit and evaluate even small changes in the geometry of the ligand binding.
The price for such a high accuracy is the increased computational time requirements. The SQM/COSMO filter is ca. 100-times slower than the statistics- and knowledge-based SFs and about 10-times slower than the classical physics-based AMBER/GB. However, compared to the standard SQM-based SF, it is ca. 100-times faster. The speed can be further enhanced by parallelisation.
To summarise, we have pushed the limits of the accuracy of SFs to judge the energetics of P–L noncovalent interactions. Based on our development and the extensive experience with SQM-based scoring functions,3,21 the SQM/COSMO filter has been introduced. It features two dominant terms to describe P–L interaction, namely the ΔEint term at the PM6-D3H4X level for gas-phase noncovalent interactions and the ΔΔGsolv term at the COSMO level for implicit solvation. We showed previously that both these methods are very accurate at a reasonable speed.16,20 The SQM/COSMO energy is calculated in four unrelated P–L complexes. The SQM/COSMO filter is compared to eight widely used SFs, which are statistics-, knowledge- or force-field-based. The SQM/COSMO scheme exhibits a superior performance as judged by two criteria, the number of false positives and RMSDmax. In contrast to standard SFs, no fitting against data sets has been involved. Furthermore, it offers generality and comparability across the chemical space and no system-specific parameterisations have to be performed. The time requirements allow for calculations of thousands of docking poses as we have demonstrated in this pilot study. We propose the SQM/COSMO filter as a tool for accurate medium-throughput refinement in later stages of virtual screening or as a reference method for judging the performance of other scoring functions. The proof of concept that reliable QM calculations can now be performed for tens of thousands of large biochemical entities opens a way to progress in closely related disciplines such as materials design.
This work was supported by research projects RVO 61388963 awarded to the Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic. We acknowledge the financial support of the Czech Science Foundation (grant number P208/12/G016). The authors acknowledge the support by the project L01305 of the Ministry of Education, Youth and Sports of the Czech Republic. The computations were performed at the Center for Information Services and High Performance Computing (ZIH) at TU Dresden.
Footnotes |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c5cc09499b |
‡ These authors have contributed equally to this work. |
This journal is © The Royal Society of Chemistry 2016 |