Katarzyna
Rzęsikowska
a,
Justyna
Kalinowska-Tłuścik
*a and
Anna
Krawczuk
*b
aDepartment of Crystal Chemistry and Crystal Physics, Faculty of Chemistry, Jagiellonian University, Gronostajowa 2, 30-387 Kraków, Poland. E-mail: justyna.kalinowska-tluscik@uj.edu.pl
bInstitute of Inorganic Chemistry, Georg-August University, Tammannstrasse 4, 37077, Goettingen, Germany. E-mail: anna.krawczuk@uni-goettingen.de
First published on 16th December 2022
Computational methods, especially molecular docking-based calculations, have become indispensable in the modern drug discovery workflow. The constantly increasing chemical space requires fast, robust but most of all highly predictive methods to search for new bioactive agents. Thus, the scoring function (SF) is a useful and broadly applied energy-based element of docking software, allowing quick and effective evaluation of a ligand's propensity to bind to selected protein targets. Despite many spectacular successes of molecular docking applications in virtual screening (VS), the obtained results are often far from ideal, leading to incorrect selection of hit molecules and poor pose prediction. In our study we focused on docking calculation for the selected class A G-protein coupled receptors (GPCRs), with experimentally determined 3D structures and a sufficient set of known ligands with affinity values reported in the ChEMBL database. Our goal is to investigate how much the energy-based scoring function for this particular target class changes when changing from the default to the re-estimated weighting scheme on the specified energy terms in the SF definition. Additionally, we want to verify if indeed more accurate results are obtained when considering different levels of the biological hierarchy, namely: the whole class A GPCRs, sub-subfamilies, or just the individual proteins while applying default or specifically designed weighting coefficients. The performed calculation and evaluation factor values suggest a significant improvement of docking results for the designed SF definition. This individual approach improves the accuracy of binding affinity prediction and active compound recognition. The designed scoring function for classes, sub-subfamilies, or proteins leads to a significant improvement of molecular docking performance, especially at the level of individual proteins. Our results show that to increase the efficiency and predictive power of molecular docking calculations applied in classical VS, the strategy based on the individual approach for scoring function definition for selected proteins should be considered.
There are three main classes of SFs which are routinely applied in the available docking software: force field-based, knowledge-based and empirical.2 The first class includes those implemented in AutoDock,3 Dock,4 or Gold (GoldScore)5 and estimates binding free energy by calculating various energy terms, for example, electrostatic or van der Waals. The “knowledge-based” class stands on statistical analyses of protein–ligand crystal structure complexes and is used in Gold (ASP). The last type, “empirical”, relies on binding energies calculated as a weighted sum of every hydrophobic contact and hydrogen bond. The Piecewise Linear Potential (PLP)6 and ChemScore7 functions along with the scoring function found in Glide can be included in this class. The linear combination of the two or more SF components from the same or different classes (so-called hybrid scoring function) is a recognized strategy,2,8 and demonstrates the better performance of docking calculations with comparison to single SF use.9–11 Recently, with the highly developing artificial intelligence application in drug discovery, the machine-learning-based class of SF was additionally introduced.12 Despite the progress in scoring function methodologies, the accurate prediction of protein–ligand binding affinities remains a challenge. Benchmark studies have shown that the calculated binding scores display a poor correlation with experimental affinity.13–15 This can result in a high percentage of false-positive compounds in the hit list13,15 and therefore, increase the costs of the drug design process. The above mentioned studies14,16 revealed another major outcome: the tested scoring functions are not universal enough, thus the performance of the entire benchmark may not be consistent with the performance of individual targets. In other words, when the given type of protein has only a few representatives in the investigated group, there is a possibility that the obtained results for the whole set do not match the outcome acquired for these several proteins.
In the present article, we focus on class A of the G protein-coupled receptor (GPCR) family, as a selected protein family with a well-established druggability profile. GPCRs belong to one of the largest transmembrane receptor families that activate internal signal transduction pathways. They are activated by various agents, for example, ions, neurotransmitters, odorants, hormones, lipids, peptides, or proteins.17 Due to their abundance in the human body and involvement in the etiology of many diseases, they are the target of ca. 30% of all marketed drugs.18 However, in most benchmark studies, they are not the primary target of research, and even if they are considered, only a few representatives are included in the study. While such sets can be useful to test the overall performance of the molecular docking program, they may fail in the case of more specific proteins. The GPCRs are interesting targets in terms of molecular docking studies due to the deep binding pocket which almost completely encloses the small molecule ligands. Moreover, the percentage of sequence identity for the binding pocket region for some receptors may reach up to 88% (e.g. M2 and M4 receptors based on the similarity matrix generated using GPCRdb online server19). For the mentioned reasons, the results obtained from benchmark studies where GPCRs are not the primary target, may not apply for those proteins. Although several docking experiments were a successful strategy in the drug design,20 the molecular docking results may be often misleading and questionable, despite a high score.21,22 The selected AutoDock software is not the top performing tool. However, exhaustive research work on pose prediction for the variety of selected protein–ligand complexes available in the PDB, screened additionally against bioactivity data, shows that good pose prediction does not always correlate with good scoring, irrespective of the software used.23 Moreover, the comparative studies indicate that none of the docking tools truly outperforms the others and none of the scoring functions is universal to correctly predict and/or evaluate the ligand's pose for all types of molecules and protein families.23,24 In view of observations related to an irregularity in the performance of scoring functions across various kinds of protein targets, we aim to answer the question of whether the scoring function in AutoDock 4.2 should be applied in the default mode or shall it be re-defined for a particular family/class/protein hierarchy level, on the example of class A GPCRs.
In recent years, more than 50 different docking software packages have been developed. Although they differ regarding scoring function types and conformational space search algorithms, the limitations described in the previous section have not been overcome to date. Over the last 20 years, more than 30% of published articles related to docking studies have used AutoDock as the docking software.25 This free software (its version AutoDock4, available under the GNU General Public License) is widely used with approximately 10000 citations since its release in 2009. AutoDock4 was successfully applied in the discovery of several potent inhibitors binding to peptides, proteins, or genes.26–28 The used semi-empirical scoring function is based on the AMBER force field and consists of five weighted energy terms: dispersion/repulsion (a Lenard-Jones 6/12 potential), hydrogen bond (directional hydrogen bond components; based on 10/12 potential), electrostatic (Coulombic potential), desolvation (based on the volume of surrounding atoms, which shelter the given atom from solvent) and entropic term related to the number of freely rotatable bonds in the ligand.29 In view of observations related to an irregularity in the performance of scoring functions across various kinds of protein targets, we aim to answer the question of whether the scoring functions applied in AutoDock 4.2 should be considered universal in the default mode or should it be re-defined for a particular family/class/protein hierarchy level.
• B1 with Ki ≤ 1 nM.
• B2 with Ki in range (1; 10 > nM.
• B3 with Ki in range (10; 100 > nM.
• B123 with Ki ≤ 100 nM.
• B45 with Ki > 100 nM.
The molecules for which the reported Ki values varied significantly and did not allow a straight classification to an inactive basket (B45) or one of the active baskets (B1, B2, or B3) were rejected. Ligands in the basket-divided set were converted to the PDB format with Open Babel software31 and prepared for the docking procedure, using the prepare_ligand4.py script (AutoDock tools3). Three sets (training, testing1, and testing2) were created using the aforementioned processed database. The training set was utilized to generate new weights for SF in AutoDock software and testing1 and testing2 sets to verify the designed weighting accuracy. As the quantity of ligands varies for individual receptors, to maximize the number of examined receptors, the B1, B2, and B3 baskets contained four ligands and the B45 baskets twenty-four molecules. The B123 basket was established by merging all three active ones (B1, B2, and B3). For each set (training, testing1, and testing2) molecules were selected randomly and without repetition to ensure the diversity of ligands. The ChEMBL codes of selected ligands and their distribution within baskets and the three mentioned sets are shown in Table S1 in the ESI† file.
Ki = e(ΔG/RT) | (1) |
This approach was applied in our study to convert the calculated energy (obtained based on the designed new weights) into Ki, and subsequently, assign the top-scored poses of each ligand into the corresponding basket. As the ligands in our study were divided and characterized by the Ki range baskets, and not by the exact and individual affinity values, the obtained new SF (with the new weighting scheme applied) as well as the original SF (with the default weights) could not be evaluated applying the commonly used ‘Scoring power’ method.13 For that reason, the variation of the enrichment factor35 (EF) was used, in which instead of the fraction of the database, the particular basket was tested:
(2) |
Fig. 1 The biological hierarchy of GPCRs selected for this study. The family level is highlighted in yellow, sub-subfamily level in green, and protein in blue. |
In total 24 different proteins were examined which belonged to thirteen various sub-subfamilies of the class A GPCRs. Only sub-subfamilies containing at least three experimentally determined structural representatives were taken under consideration. Thus, from the set shown in Fig. 1, seventeen groups were analyzed: whole (family level), serotonin, opioid, dopamine, and muscarinic acetylcholine receptors (sub-subfamily level) and 5-HT1B, 5-HT2B, 5-HT2C, D2, D3, D4, M1, M2, M4, δ, κ and NOP receptors (protein level). For each ligand–protein complex, five energy components were extracted from docking results, using a script from the AutoDock tools. Data for sub-subfamily and family levels were obtained by combining the outcomes for relevant individual proteins.
Examined receptor group | EF for default weights | EF for new weights | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
B1 | B2 | B3 | B123 | B45 | B1 | B2 | B3 | B123 | B45 | |
WHOLE | 1.11 | 1.01 | 1.07 | 1.10 | 1.09 | 2.92 | 1.69 | 2.45 | 1.14 | 1.12 |
Serotonin | 0.00 | 1.29 | 1.17 | 0.98 | 0.99 | 3.60 | 2.70 | 3.00 | 1.33 | 1.11 |
5-HT1B | 0.00 | 1.42 | 1.50 | 1.11 | 1.17 | 3.60 | 3.00 | 3.00 | 1.83 | 1.42 |
5-HT2B | 0.00 | 0.00 | 1.29 | 1.00 | 1.00 | 4.50 | 3.00 | 3.00 | 1.64 | 1.14 |
5-HT2C | 0.00 | 1.50 | 0.90 | 0.75 | 0.90 | 3.00 | 9.00 | 3.00 | 1.50 | 1.16 |
Dopamine | 0.00 | 0.96 | 0.64 | 1.07 | 1.07 | 2.70 | 2.57 | 2.00 | 1.36 | 1.13 |
D2 | 0.00 | 1.29 | 0.00 | 1.00 | 1.00 | 3.00 | 3.00 | 4.50 | 1.65 | 1.41 |
D3 | 0.00 | 1.50 | 0.56 | 1.09 | 1.07 | 4.50 | 4.50 | 4.50 | 1.93 | 1.30 |
D4 | 0.00 | 0.00 | 1.13 | 1.13 | 1.13 | 6.00 | 3.00 | 4.50 | 1.71 | 1.23 |
Muscarinic | 0.00 | 1.50 | 1.66 | 1.26 | 1.15 | 1.29 | 1.42 | 1.80 | 1.38 | 1.45 |
M1 | 0.00 | 1.29 | 0.90 | 1.06 | 1.03 | 2.25 | 6.00 | 2.25 | 1.38 | 1.50 |
M2 | 0.00 | 2.57 | 1.89 | 1.15 | 1.20 | 1.50 | 4.50 | 3.86 | 1.44 | 1.50 |
M4 | 0.00 | 0.00 | 2.00 | 1.71 | 1.23 | 3.60 | 4.50 | 9.00 | 1.91 | 1.20 |
Opioid | 0.00 | 1.50 | 1.17 | 1.07 | 1.06 | 1.64 | 4.50 | 2.57 | 1.36 | 1.11 |
Delta | 0.00 | 3.60 | 1.20 | 1.00 | 1.00 | 3.60 | 4.50 | 4.50 | 1.91 | 1.20 |
Kappa | 0.00 | 0.82 | 0.82 | 1.00 | 1.00 | 1.80 | 4.50 | 3.00 | 1.41 | 1.18 |
NOP | 0.00 | 0.00 | 1.35 | 1.23 | 1.18 | 2.70 | 4.50 | 9.00 | 1.50 | 1.13 |
The initial purpose of this study was to elucidate if the improvement of the molecular docking performance requires that the scoring function is individually designed for each class, family, or target, separately. From EF values for the original weights, no definitive conclusions could be drawn. However, data analysis of results for baskets B1, B2, and B3 obtained for the training set with new weights indicated several interesting conclusions (Fig. 2). First of all, the average ratio EFnew/EForiginal is 3. The most significant difference can be identified for receptor D3R, basket B3, where the new EF is eight times higher than the corresponding one obtained for the original SF. The dopamine receptors (D2, D3, and D4) show one of the highest EF ratios, i.e. for D3R it is 4.50, 3.00, and 8.00 for B1, B2, and B3 baskets, respectively. The remaining receptors in the dopamine sub-subfamily also have, in most cases, evaluated rates greater than or equal to the average value. What is more, the new EF values for D2, D3, and D4 receptors are higher than the ones for the dopamine sub-subfamily (Fig. 3). This trend is also noticeable for other sub-subfamilies, although it is not as evident as in the above-mentioned example. For almost all proteins, EF values for the individual receptors are greater than or equal to EF for corresponding sub-subfamilies. Only in the case of the serotonin receptor (5-HT2CR) for the most active ligands in the B1 basket, the EF score is lower than for sub-subfamily. The observed tendency is not observed when shifting from class A to sub-subfamily. For example, the muscarinic acetylcholine receptor sub-subfamily has lower values in all three baskets (1.29, 1.42, and 1.80) compared to class A (2.92, 1.69, and 2.45 for B1, B2, and B3, respectively), whereas the serotonin (3.60, 2.70 and 3.00) receptor sub-subfamilies perform better than class A for each basket.
A similar analysis can be carried out considering the differentiation between active (B123) and inactive (B45) molecules. Only in two cases, the EF values for new weights appeared to be lower than the corresponding ones for the default weights (for M4 and NOP receptors in the inactive basket). Receiving inferior values is not an optimal result. However, most scientists use docking to search for active, not inactive compounds, and therefore, the obtained result is an acceptable outcome. The values obtained for B123 reveal the significant influence of the uniquely designed scoring function. While for original weights, the EF values range from 0.75 to 1.71 with average values equal to 1.10, for new SF the minimum value is higher than the average for original SF (1.14 for the whole GPCR family). The transition from family to sub-subfamily as well as from sub-subfamily to selected receptor significantly improves the VS calculations. For example, for the muscarinic acetylcholine receptors sub-subfamily the EF is equal to 1.38 and for the receptors belonging to this sub-subfamily is 1.38, 1.44, and 1.91 for M1, M2, and M4 receptors, respectively. Apart from an increase in the EF scores, the application of the new weights can also influence the values of %A and %SR. For all the groups in the B1 basket, the new %A is at least 7%, whereas, for the original weights, values not equal to 0 are obtained only for the whole class A. The average %A in B1 is equal to 54.17% and the average scores for the other baskets remain at the same satisfactory level. At the same time, the average values of %SR significantly increase from 0.00%, 12.95%, 12.53%, 36.75% and 71.64% to 37.08%, 50.00%, 50.10%, 55.05% and 85.30% for B1, B2, B3, B123 and B45 baskets, respectively. In a few cases, the %SR is even equal to 100% (e.g. B3 basket for M4 receptor). Considering initial restrictions on ligands abundance in the resulting set, the obtained values are very promising towards new potential hit identification.
The average EF for all active baskets (B1, B2, and B3) are 2.64 and 3.03 for testing1 and testing2, respectively. Like in the training set, the improvement related to designing individual weights for the class, sub-subfamily or receptor is explicitly visible for the basket B123 (Fig. 4). In some cases, the new EF values are even around 2.5 times higher for receptors than for adequate sub-subfamily (e.g. opioid sub-subfamily in the testing1 set or dopamine sub-subfamily in the testing2 set, see Tables S4 and S5 in the ESI† file). Overall, the results obtained for testing sets confirm that individually designed SF weights performed at the same level for all three examined sets.
The investigation presented here is highly relevant to the field of computer-aided drug discovery. It is because there is no meticulous research to determine the performance of the scoring function on different kinds of protein targets, whereas most virtual screening calculations are carried out for one or a few specific targets. In this study, we focused on twelve proteins of class A GPCRs which belong to four different sub-subfamilies. To ensure the diversity of the ensemble, another 12 receptors from the same class but different sub-subfamilies were considered. Newly designed scoring function coefficients were calculated for each selected target as well as sub-subfamily and whole class A. The comparison of the calculated evaluation factors: EF, %A, and %SR for all obtained data, showed a significant improvement in the docking results comparing to the default SF settings. In particular, the accuracy of binding affinity prediction tends to be increased when individually designed weighting coefficients are applied. Additionally, the active compounds recognition success rate increases by nearly a factor of 2. The most important finding is that the design of a specific scoring function for class, sub-subfamily, or protein (three different levels of biological hierarchy) leads to a significant improvement in molecular docking performance.
Notwithstanding the promising results of this manuscript, we also see the need to further improve and expand our research. In the future studies, different activation states of the receptors should be still taken into consideration and carefully evaluated as it may significantly influence the shape and size but also electrostatic potential of the active binding site. Additionally, prediction of ligands binding to allosteric sites of the GPCRs remains a challenge but could be an interesting direction of our study. Both above-mentioned effects could affect the definition of the scoring schemes. Despite these aspects, the presented results clearly show that the individual approach for scoring function calculation in virtual screening strategy can lead to a higher probability of hit molecule identification. Thus, considering the chosen receptor–ligand system, the estimation of weights for the scoring function would be beneficial before VS calculations. Even though this approach requires increased computational time, the calculations of new and individually designed weights for the force field scoring function can decrease the number of failures and in consequence, reduce the costs of identifying and examining the lead molecule.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2cp04671g |
This journal is © the Owner Societies 2023 |