Open Access Article
De Kang
a,
Xiaocong Panga,
Wenwen Liana,
Lvjie Xua,
Jinhua Wanga,
Hao Jiaa,
Baoyue Zhanga,
Ai-Lin Liu*abc and
Guan-Hua Du*abc
aInstitute of Materia Medica, Chinese Academy of Medical Sciences, Peking Union Medical College, Xian Nong Tan Street, Beijing 100050, China. E-mail: liuailin@imm.ac.cn; dugh@imm.ac.cn; Fax: +86-10-6316-5184; Tel: +86-10-8315-0885
bBeijing Key Laboratory of Drug Target Research and Drug Screening, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing 100050, China
cState Key Laboratory of Bioactive Substance and Function of Natural Medicines, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing 100050, China
First published on 30th January 2018
The high morbidity and mortality of cancer make it one of the leading causes of global death, thus it is an urgent need to develop effective drugs for cancer therapy. Vascular endothelial growth factor receptor-2 (VEGFR2) acts as a central modulator of angiogenesis, and is therefore an important pharmaceutical target for developing anti-angiogenic agents. In this study, ligand-based naïve Bayesian (NB) models and structure-based molecular docking were combined to develop a virtual screening (VS) pipeline for identifying potential VEGFR2 inhibitors from FDA-approved drugs. The best validated naïve Bayesian model (NB-c) gave Matthews correlation coefficients of 0.966 and 0.951 for the test set and external validation set, respectively. 1841 FDA-approved drugs were sequentially screened by the optimal model NB-c and molecular docking module LibDock. By analyzing the results of VS, 9 top ranked drugs with EstPGood value ≥ 0.6 and LibDock Score ≥ 120 were chosen for biological validation. VEGFR2 kinase assay results demonstrated that flubendazole, rilpivirine and papaverine showed VEGFR2 inhibitory activities with IC50 values ranging from 0.47 to 6.29 μM. Binding mode analysis with CDOCKER revealed the action mechanism of the 3 hit drugs binding to VEGFR2. In summary, we not only proposed an integrated VS pipeline for potential VEGFR2 inhibitors screening, but also identified 3 FDA-approved drugs as novel VEGFR2 inhibitors, which could be used to design and develop new antiangiogenic agents.
688
780 new cancer cases and 600
920 cancer deaths will occur in the United States.1 Recently, great advances have been achieved in the development of therapeutic agents for chemotherapy, hormone therapy and immunotherapy of cancer. However, the cancer mortality remains high due to the bewildering genetic and phenotypic heterogeneity of cancer, as well as the development of resistant phenotypes.2 Thus, it is urgent to develop new anticancer drugs to prolong survival, minimize chemotherapy side effects and improve quality of life for cancer patients.
Abnormal angiogenesis, a process by which new blood vessels sprout from pre-existing vessels, is well recognized as one common characteristic of various cancer types.3,4 By increasing the number of capillaries into the expanding tumor tissues, tumor-associated neo-vasculature is induced to accelerate tumorigenesis through aiding in nutrient supply and metastasis of tumor cells.5 VEGF/VEGFR2 signaling has been widely accepted for its pro-angiogenic role by dominating all steps of angiogenesis including survival, proliferation, migration and capillary-like tube formation of endothelial cells.6 Therefore, inhibition of VEGFR2 activity emerges as a potential therapy strategy against tumor-induced angiogenesis.7 Nowadays, several VEGFR2 inhibitors including sorafenib, sunitinib, pazopanib, vatalanib, axitinib, regorafenib and lenvatinib have been approved by Food and Drug Administration (FDA) for clinical therapy of cancers.8,9 Though the great success in anti-angiogenic therapy of cancers, the clinical usage of VEGFR2 inhibitors is always accompanied with side effects such as diarrhea, hypertension and hand-foot syndrome.10 Besides, emergence of drug resistance leads to a reduction in effectiveness of existing anti-angiogenic agents, thus discovering novel VEGFR2 inhibitors with less side effect and drug resistance is still in great demand.11,12
The currently available experimental screening assays for VEGFR2 inhibitors are labor-extensive, time-consuming and expense-costly. Therefore, it becomes necessary to improve screening efficiency by applying virtual screening (VS) in the lead identification of novel VEGFR2 inhibitors.13 Molecular docking, pharmacophore modeling and machine learning approaches are the most used methodologies in the virtual screening. Machine learning approaches possess robust abilities in separating inhibitors from noninhibitors by building classification models adopting statistical algorithms including linear discriminant analysis (LDA),14,15 support vector machine (SVM),16,17 naïve Bayesian (NB),17–21 recursive partitioning (RP),18 random forest (RF),22 and artificial neural network (ANN).23 SVM and NB models have been successfully applied in virtual screening of inhibitors against butyrylcholinesterase and neuraminidase in our group.17,19 To date, there have been some practical applications of VS in predicting VEGFR2 inhibitors. Zhang et al. proposed a fragment VS concept based on Bayesian categorization, and discovered 20 potential VEGFR-2 scaffolds.24 Besides, a multistep virtual screening protocol comprising ligand-based support vector machines, drug-likeness rules filter and structure-based molecular docking was developed and employed by Chen et al. to identify dual inhibitors of VEGFR2 and Src from a commercial chemical library.16 In another integrated VS with a sequential filter of Lipinski rules, structure-based pharmacophore modeling and molecular docking, 10 VEGFR2 inhibitors with IC50 values ranging from 1 to 10 μM were obtained by screening a commercial library of 82
000 compounds.25 However, by far, the combined application of NB classifiers and LibDock, a site-directed docking program, in VS of potential VEGFR2 inhibitors has not yet been studied.
One short-path approach for developing new drugs is drug repurposing or repositioning, by which existing approved drugs are repurposed for new targets or indications. Considering the known pharmacokinetics and pharmacodynamics of existing drugs, drug repurposing can boost drug discovery by shortening the time of experimental validation and improving the possibility of success.26 One good example of drug repurposing is thalidomide, which was originally launched as a sedative in 1957. Though thalidomide was withdrawn from the market in 1961 because of its severe teratogenic effects, it was later approved by FDA for treatment of erythema nodosum leprosum (ENL) in 1998. And then in 2006, the FDA further approved thalidomide for treatment of newly diagnosed multiple myeloma together with dexamethasone.27
In this study, the detailed workflow for VS of VEGFR2 is shown in Fig. 1. The goal of this study is to construct effective prediction models for VEGFR2 inhibitors, find novel VEGFR2 inhibitors from the approved drugs, therefore provide important information for drug design and new drug development for tumor treatment.
![]() | ||
| Fig. 1 Workflow of the integrated virtual screening and drug screening approaches for VEGFR2 kinase inhibitors. | ||
392 decoys were then randomly distributed into a training set and a test set at a ratio of 3
:
1. Finally, the training set was composed of 2598 inhibitors and 7794 decoys, and the test set included 866 inhibitors and 2598 decoys. In addition, a total of 82 VEGFR2 inhibitors (Ki ≤ 1000 nM) were collected from literatures and combined with 246 decoys, which are not included in the training set or test set, to form an external validation set. The activities of the inhibitors and decoys in the entire date set were labeled as “1” and “−1” respectively.
Ahead of molecular descriptors calculation, all compounds were prepared by washing and energy minimizing in MOE to remove inorganic counterions, add hydrogen atoms, deprotonate strong acids, protonate strong bases, generate stereoisomers and valid single 3D conformers. The detailed information of the training set, test set and external validation set is described in Fig. S1–S3.†
Besides the molecular descriptors mentioned above, two types of fingerprints named SciTegic extended-connectivity fingerprints (ECFP and FCFP) and Daylight-style path-based fingerprints (EPFP and FPFP) were also calculated in DS 2016. In order to represent the structural fragments in a moderate size, the diameters of each fingerprints were set to be 4 and 6, which were commonly used in ref. 17 and 30. In total, 8 fingerprints named ECFP_4, ECFP_6, EPFP_4, EPFP_6, FCFP_4, FCFP_6, FPFP_4, FPFP-6 were calculated.
The predictive performance of the NB model was evaluated by an internal 5-fold cross validation, as well as its performance on the test set and external validation set. Four evaluation indexes, including sensitivity (SE), specificity (SP), accuracy (Q), and Matthews correlation coefficient (MCC), were calculated using eqn (1)–(4). True positive (TP) represents the number of inhibitors that are correctly classified as active compounds. True negative (TN) represents the number of decoys that are correctly classified as inactive compounds. False positive (FP) represents the number of decoys that are incorrectly classified as active compounds. False negative (FN) represents the number of inhibitors that are incorrectly classified as inactive compounds.
![]() | (1) |
![]() | (2) |
![]() | (3) |
![]() | (4) |
While SE and SP represent the prediction accuracy for inhibitors and decoys respectively, Q indicates the overall prediction accuracy for all compounds in the dataset. MCC, with values ranging from −1 to 1, is the most important indicator for the quality of binary classification. In this study, the MCC value is considered as the main evaluation index.
| Ratio = (665 nm/615 nm) × 104 | (5) |
| Specific signal = ratio of sample − ratio of negative | (6) |
| Inhibitory rate (%) = (1 − specific signal of sample/specific signal of positive) × 100% | (7) |
PCA was conducted on molecular descriptors including Molecular_Weight, ALogP, Num_H_Donors, Num_H_Acceptors, Num_RotatableBonds, Num_AromaticRings, Num_Rings, and Molecular_Fractional Polar Surface Area. As shown in Fig. 2, both the principal component 1 and 2 of the entire dataset are distributed in a wide range of space, indicating the chemical diversity of the training set, test set and external validation set.
![]() | ||
| Fig. 2 Chemical space analysis of the training set (A), test set (B), and external validation set (C) by principal component analysis (PCA). | ||
The Tanimoto similarity analysis was another approach to investigate the diversity of compounds within the entire dataset.34 The Tanimoto coefficients (Tc) of the entire dataset were calculated with the fingerprint of ECFP_4 and summarized in Table 1. The Tc values of the training set, test set and external validation set range from 0.115 to 0.136. Consequently, we conclude that the entire dataset possesses enough chemical diversity.
| Dataset | Inhibitors (active) | Decoys (inactive) | Total | Tanimoto coefficient (ECFP_4) |
|---|---|---|---|---|
| Training set | 2598 | 7764 | 10 392 |
0.115 |
| Test set | 886 | 2598 | 3464 | 0.117 |
| External validation set | 82 | 246 | 328 | 0.136 |
| Descriptor class | Number of descriptors | Descriptors |
|---|---|---|
| Discovery studio 2D descriptors (DS_2D) | 16 | Estate_AtomTypes, AlogP_AtomClassName, logD, Molecular_Solubility, HBA_Count, HBD_Count, Num_AromaticRings, Num_ChainAssemblies, Num_RingAssemblies, Num_StereoAtoms, Molecular_FractionalPolarSASA, Molecular_FractionalPolarSurfaceArea, Molecular_PolarSASA, CHI_V_3_C, Kappa_2, Wiener |
| Molecular fingerprints | 8 | ECFP_4, ECFP_6, EPFP_4, EPFP_6, FCFP_4, FCFP_6, FPFP_4, FPFP-6 |
The statistical results of 5-fold cross-validation and test set validation for the 9 NB models were listed in Table 3. The model NB-a built with 16 DS_2D descriptors showed a moderate performance with the accuracy (Q) of 0.868 and 0.833, the Matthews correlation coefficient (MCC) of 0.686 and 0.619 for the training set and test set, respectively.
| Model | Descriptors | Number of descriptors | Training set | Test set | ||||||
|---|---|---|---|---|---|---|---|---|---|---|
| SE | SP | Q | MCC | SE | SP | Q | MCC | |||
| NB-a | DS_2D_MP | 16 | 0.872 | 0.866 | 0.868 | 0.686 | 0.853 | 0.826 | 0.833 | 0.619 |
| NB-b | MP + ECFP_4 | 17 | 0.985 | 0.992 | 0.991 | 0.975 | 0.982 | 0.984 | 0.983 | 0.956 |
| NB-c | MP + ECFP_6 | 17 | 0.988 | 0.997 | 0.995 | 0.985 | 0.982 | 0.989 | 0.987 | 0.966 |
| NB-d | MP + EPFP_4 | 17 | 0.937 | 0.932 | 0.933 | 0.833 | 0.937 | 0.904 | 0.912 | 0.789 |
| NB-e | MP + EPFP_6 | 17 | 0.962 | 0.956 | 0.957 | 0.891 | 0.960 | 0.918 | 0.928 | 0.828 |
| NB-f | MP + FCFP_4 | 17 | 0.979 | 0.978 | 0.978 | 0.944 | 0.961 | 0.962 | 0.961 | 0.901 |
| NB-g | MP + FCFP_6 | 17 | 0.987 | 0.990 | 0.990 | 0.973 | 0.971 | 0.987 | 0.983 | 0.955 |
| NB-h | MP + FPFP-4 | 17 | 0.952 | 0.948 | 0.949 | 0.870 | 0.938 | 0.920 | 0.924 | 0.815 |
| NB-i | MP + FPFP-6 | 17 | 0.955 | 0.973 | 0.968 | 0.917 | 0.958 | 0.931 | 0.938 | 0.847 |
The moderate performance of NB-a could be explained by the failure of DS_2D descriptors to characterize the important substructures and molecular fragments which are critical for VEGFR2 inhibition. Hence, DS_2D descriptors, combined with molecular fingerprints, were used simultaneously to establish NB models. As a result, the addition of fingerprints can improve the performance of the NB models, as the Q and MCC values of the NB models (NB-b ∼ NB-i) based on both DS_2D descriptors and molecular fingerprints are higher than those of NB-a, which is based on DS_2D descriptors only. Considering the length of the fingerprints, the NB models based on fingerprints of diameter 6 perform better than those based on fingerprints of diameter 4. For example, while NB-b (MP + ECFP_4) gives MCC values of 0.975 and 0.956 for the training set and test set, NB-c (MP + ECFP_6) outperforms NB-b with MCC values of 0.985 and 0.966 for the training set and test set, respectively. The same situation applies to other fingerprints, as shown by NB-d and NB-e, NB-f and NB-g, NB-h and NB-i. Additionally, among the fingerprints of diameter 6, ECFP_6 surpasses other fingerprints in establishing NB models, since NB-c (MP + ECFP_6) has the highest sensitivity (SE) value of 0.982, specificity (SP) value of 0.989, Q value of 0.987, and MCC value of 0.966 for the test set.
Since the compounds from the training set and test set share great similarity in chemical diversity, NB models developed by the training set may easily acquire a good value of Q and MCC for the test set. Therefore, a good performance on test set cannot necessarily guarantee a good predictive ability for compounds whose chemical structures differ greatly from those of the training set. To this end, 82 VEGFR2 inhibitors with Ki ≤ 1000 nM were extracted from literatures.35–40 Molecular scaffold analysis of the 82 VEGFR2 inhibitors resulted in six molecular scaffolds as summarized in Fig. S1.† Then six ECFP_4 based similarity search queries were used to find similar molecules for VEGFR2 inhibitors sharing the same scaffold from the 2598 inhibitors in the training set. While 45 similar molecules were retrieved for the inhibitors sharing scaffold 3, only 4 and 5 similar molecules were retrieved for the inhibitors sharing scaffold 1 and 6, respectively. It is worth noting that none similar molecules were retrieved for inhibitors sharing scaffold 2, 4 and 5. Thus, we conclude that the chemical structures of the 82 VEGFR2 inhibitors differ from those of the training set. The 82 VEGFR2 inhibitors was then combined with 246 decoys which are not included in the training and test sets to make up the external validation set.
The external validation set was used to further validate the predictive abilities of the 4 NB models (NB-b, NB-c, NB-f and NB-g) with MCC values higher than 0.9 for both training set and test set. The 4 NB models exhibit good performances for the external test set, with Q values ranging from 0.960 to 0.982, and MCC values ranging from 0.884 to 0.951 (Table 4). Finally, NB-c (MP + ECFP_6), with the highest Q value of 0.982 and MCC value of 0.951 for the external validation set, was selected as the optimal model and applied for further virtual screening of potential VEGFR2 inhibitors.
| Model | Descriptors | External validation set | |||
|---|---|---|---|---|---|
| SE | SP | Q | MCC | ||
| NB-b | MP + ECFP-4 | 0.939 | 0.976 | 0.967 | 0.911 |
| NB-c | MP + ECFP-6 | 0.963 | 0.988 | 0.982 | 0.951 |
| NB-f | MP + FCFP-4 | 0.890 | 0.980 | 0.960 | 0.884 |
| NB-g | MP + FCFP-6 | 0.896 | 0.992 | 0.970 | 0.914 |
![]() | ||
| Fig. 4 Examples of the top 15 good (A) and bad (B) fragments for VEGFR2 inhibition as estimated by NB-c model. The Bayesian score (Score) is given for each fragment. | ||
By analyzing the 15 good fragments, we noticed that the majority of good fragments contain one or more nitrogen atoms. Some of these nitrogen atoms exist in saturated nitrogen-containing heterocycles, while others are linked with hexatomic rings or carbonyl groups. It is speculated that these nitrogen-containing fragments facilitate the competition of VEGFR2 inhibitors with ATP at the ATP binding-site of VEGFR2 by mimicking the adenine structure in ATP.41 Besides, the nitrogen atoms in these fragments can serve as hydrogen bond acceptors in forming hydrogen bond interactions with the kinase domain of VEGFR2. For example, the crystal structure of VEGFR2 in complex with axitinib shows that one nitrogen of the indazole scaffold forms a hydrogen bond with the N–H group of the cys919.32
It is interesting to notice that 4 out of the 15 bad fragments contain one sulphur atom, which is not found in the good fragments, indicating the existence of sulphur is unfavorable for VEGFR2 inhibition. For the rest 11 bad fragments, though one or more nitrogen atoms are found in 5 fragments, none of these nitrogen atoms is linked with hexatomic ring or carbonyl group, or exist in saturated nitrogen-containing heterocycle. The analysis on the favorable and unfavorable fragments for VEGFR2 inhibition will be beneficial for de novo molecular design and synthesis for novel VEGFR2 inhibitors.
The ligand axitinib was extracted from the VEGFR2 crystal structure (PDB ID: 4AGC), and then re-docked into the active site of VEGFR2. As shown in Fig. 5, the docked ligand poses obtained by the two docking methods can be well aligned with the ligand in the crystallographic complex. The root-mean-square distance (RMSD) values between the docked poses and the corresponding binding pose in the crystallographic complex were 1.05 and 0.43 for LibDock and CDOCKER respectively, indicating the high accuracy and reliability of the docking methods.
![]() | ||
| Fig. 5 Alignment of the docked ligand poses (shown in blue) by LibDock (A) and CDOCKER (B), with the corresponding binding pose (shown in yellow) in the crystallographic complex. | ||
Initially, 120 drugs were predicted to be active against VEGFR2 by NB-c. In the following molecular docking analysis with LibDock, 95 out of the 120 hit drugs by NB-c were successfully docked into the active site of the VEGFR2. The detailed prediction results of the 95 drugs passing through the integrated VS pipeline are summarized in Fig. S1.† Besides, we visualized the predicted results of the 95 drugs and 8 antiangiogenic agents by plotting the EstPGood value to the LibDock Score. As shown in Fig. 6, some drugs possess high probabilities of VEGFR2 inhibitory activities, as indicated by relatively high EstPGood value and LibDock Score. Finally, the 9 top ranked drugs with EstPGood value ≥ 0.6 and LibDock Score ≥ 120 were selected for final biological validation (Table 6).
![]() | ||
| Fig. 7 Inhibitory curves and IC50 values for the reference compound axitinib (A) and three FDA-approved drugs flubendazole (B), rilpivirine (C), papaverine (D) against VEGFR2. | ||
Flubendazole was identified as the most potent inhibitor in this study, with IC50 value of 0.47 μM against VEGFR2. As a synthetic anthelmintic compound, flubendazole has been widely used in human and veterinary medicine against parasitic worms. In recent years, flubendazole has attracted great attention of scientists for its antiproliferative activities against various malignant types including leukemia, myeloma, neuroblastoma, breast cancer, colorectal cancer and melanoma.45–47 However, the antiangiogenic potential of flubendazole has been little studied. The validation of flubendazole as VEGFR2 inhibitor support its potential usage in anticancer therapy. Rilpivirine, a non-nucleoside reverse transcriptase inhibitor, has been approved for the treatment of HIV-1 infection in treatment-naïve adults in combination with other antiretroviral agents.48 By scaffold analyzing of the 3 hit drugs, the scaffold of rilpivirine turned out to be novel scaffold against VEGFR2, which meant the established VS pipeline was capable of identify new VEGFR2 inhibitor with novel scaffold.
Considering the inherent advantages and disadvantages of each single VS method, it is proposed to strengthen the robustness of VS by integrating different VS strategies either in a parallel or sequential manner. Zhang et al. developed an integrated two-layer workflow for VS of VEGFR2 inhibitors by using a 2D fingerprints similarity search followed by molecular docking and then a strict multi-complex pharmacophore screening. The effectiveness of the integrated VS method was then validated by a retrospective comparison, in which it outperformed 43 out of 45 methods.49 In the present study, a hit rate of 33% (3 out of 9) was achieved by the integrated VS pipeline combining ligand-based naïve Bayesian model and structure-based molecular docking. We expect to further apply the integrated VS pipeline to other commercial available compound libraries for potential VEGFR2 inhibitors.
![]() | ||
| Fig. 8 The receptor–ligand interactions of the axitinib (A, B), flubendazole (C, D), rilpivirine (E, F), papaverine (G, H) with the active site of VEGFR2. | ||
While the head group amide substituent of axitinib formed two conventional hydrogen bonds with amino acids Asp1046 and Glu885, its indazole scaffold formed another two hydrogen bonds with Glu917 and Cys919.32 Other interactions of axitinib included Pi–cation interaction with Lys868, Pi–Pi stacked interactions with Phe1047, and Pi–alkyl interactions with Cys1045, Leu1035, Val916, Ala866, Val848, Leu840. As for flubendazole, rilpivirine and papaverine, they all formed Pi–cation interaction with Lys868, and similar Pi–alkyl interactions as axitinib. Moreover, flubendazole formed three conventional hydrogen bonds with Asp1046, Gly922, Cys919 and one additional carbon hydrogen bond with Lys920. Rilpivirine formed one conventional hydrogen bond with Cys919 and one carbon hydrogen bond with Glu917. Papaverine was found to interact with Asp1046 via one conventional hydrogen bond, as well as Cys919, Glu917 and Glu885 via carbon hydrogen bonds.
Footnote |
| † Electronic supplementary information (ESI) available. See DOI: 10.1039/c7ra12259d |
| This journal is © The Royal Society of Chemistry 2018 |