Mengyi Shan‡
a,
Chen Jiang‡ab,
Jing Chenac,
Lu-Ping Qin*a,
Jiang-Jiang Qin*d and
Gang Cheng*a
aCollege of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China. E-mail: lpqin@zcmu.edu.cn; gangcheng@zcmu.edu.cn
bHangzhou Jingchun Trading Co., Ltd., China
cCollege of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
dThe Cancer Hospital of the University of Chinese Academy of Sciences, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310022, China. E-mail: jqin@ucas.ac.cn
First published on 26th January 2022
Compounds with human ether-à-go-go related gene (hERG) blockade activity may cause severe cardiotoxicity. Assessing the hERG liability in the early stages of the drug discovery process is important, and the in silico methods for predicting hERG channel blockers are actively pursued. In the present study, the directed message passing neural network (D-MPNN) was applied to construct classification models for identifying hERG blockers based on diverse datasets. Several descriptors and fingerprints were tested along with the D-MPNN model. Among all these combinations, D-MPNN with the moe206 descriptors generated from MOE (D-MPNN + moe206) showed significantly improved performances. The AUC-ROC values of the D-MPNN + moe206 model reached 0.956 ± 0.005 under random split and 0.922 ± 0.015 under scaffold split on Cai's hERG dataset, respectively. Moreover, the comparisons between our models and several recently reported machine learning models were made based on various datasets. Our results indicated that the D-MPNN + moe206 model is among the best classification models. Overall, the excellent performance of the DMPNN + moe206 model achieved in this study highlights its potential application in the discovery of novel and effective hERG blockers.
More recently, as the accumulation of data about hERG inhibition of small molecules or their binding affinity to hERG, several large-scale datasets consisting of thousands of hERG blockers and hERG non-blockers are now publicly available.14–16 Utilizing these datasets, many machine learning (ML) algorithms have been utilized for predicting hERG blocking activity. For instance, Hou et al. have developed pharmacophore-based models with Naive Bayes (NB) and SVM, and their best SVM model achieved the prediction accuracy of 84.7% for the training set and 82.1% for the external test set.17 Siramshetty et al. have employed three machine learning methods k-nearest neighbors (kNN), RF, and SVM with different molecular descriptors, and the area under the receiver operating characteristic curve (AUC-ROC) value of the best model reached 0.94.18 Ogura et al. have established an SVM classification model with the descriptor selected by non-dominated Sorting Genetic Algorithm-II (NSGA-II), and the accuracy reached out to 0.984 for their test dataset.19 With the development of the deep learning technique, Cai et al. have collected 7889 compounds with well-defined experimental data on hERG, built multi-task deep learning-based prediction models (deephERG) based on the DeepChem20 open-source platform, and obtained accuracy of 0.93 and AUC-ROC of 0.967 for the best model.21 In addition, multiple other deep learning-based models have been reported, including Capsule Networks and platforms such as deepHit,22 hERG-Att,23 hERG-ml,24 and CardioTox net.25 Despite the above substantial progress in hERG blockers prediction, the accuracy performance needs to be further improved in real drug discovery settings.
Recently, Yang et al. have proposed a Directed Message Passing Neural Network (D-MPNN),26 which showed excellent performances across 19 public datasets, including QM7, QM8, QM9, ESOL, FreeSolv, lipophilicity, BBBP, PDBbind-F, PCBA, BACE, Tox21, and ClinTox. Moreover, it is worth noting that Stokes et al. have successfully applied D-MPNN in identifying novel antibacterial compounds from the ZINC15 database, which strongly proved the superiority of D-MPNN.27 In this study, we applied the D-MPNN model on hERG datasets for predicting the hERG blockers. We tested different combinations of D-MPNN with several fingerprints or descriptors (ECFP4,28 ECFP6,28 FCFP4,28 MACCS,29 PubchemFP,30 RDkit 2D normalized,31 Mol2vec,32 MOE53,33 and moe20634 (206 2D descriptors generated by MOE)), and found that some of these descriptors can significantly boost the performance of D-MPNN. The best combination is D-MPNN incorporated with moe206 descriptors (D-MPNN + moe206), which obtained AUC-ROC values up to 0.956 ± 0.005 and 0.922 ± 0.015 in the five-fold cross-validation on random split and scaffold split, respectively. Various comparisons were also made between our models and those developed by other machine learning methods on their datasets. All the results confirmed that the D-MPNN + moe206 model is one of the best classification models for predicting hERG blockers.
In Chemprop, initially, each SMILES string was converted to their corresponding molecular graph (G) via the open-source package RDKit,35 where the atoms in the chemical structures were described as nodes (v), and the bonds between atoms were considered as edges. Next, node (atom) features xv and edge (bond) features evw were computed. Prior to the first step of message passing, edge hidden states h0vw were initialized with the following equation:
h0vw = τ(Wi cat(xv, evw)) |
hvwt+1 = τ(h0kv + Wmmvwt+1) |
In the readout phase of D-MPNN, the feature vector h for the whole molecule was obtained by summing the hidden states of all atoms.
Finally, the feature vector h is fed through a feed-forward neural network to generate property predictions.
Here, 9 types of molecular descriptors or fingerprints are used to represent the characteristics of the compounds, including mol2vec,32 RDkit 2d normalized,31 MACCS,29 PubChemFP,30 ECFP,28 ECFP6,28 FCFP4,28 MOE53,33 and moe206.34 The first 8 descriptors or fingerprints were computed by RDkit,31 Molmap,36 or mol2vec.32 The last one, moe206, was calculated by Molecular Operating Environment (MOE),34 representing the 206 2D descriptors for each molecule, mainly composed of physicochemical properties, kappa shape index, surface area, atomic and bond number, etc.
The list of descriptors is shown in Table S2.†
True positive (TP): the ratio correctly judged to be positive in all actually positive samples. True negative (TN): a ratio that is correctly predicted to be negative in all actually negative samples. False positive (FP): the rate of being wrongly judged positive in all actually negative samples. False negative (FN): the ratio of false predictions to negative in all actually positive samples.
![]() | ||
Fig. 1 Comparison of the AUC-ROC values of D-MPNN and deephERG model across different decoy thresholds on Cai's validation set. |
Inspired by the excellent performances of the original D-MPNN, we next evaluated the performance of the D-MPNN model coupled with 9 different molecular descriptors (including ECFP, ECFP4, ECFP6, FCFP4, MACCS, PubchemFP, RDkit 2D normalized, MOE53, moe206, mol2vec, MOE53 + mol2vec, and MOE53 + RDkit 2D normalized). At this time, we also used the same total of 7889 compounds from Cai's dataset but applied a five-fold cross-validation mode instead of using the pre-splitted data for training and testing. In the five-fold cross-validation mode, the whole 7889 compounds were divided into five subsets by random select or scaffold split. In each run, four of the five subsets were selected as the training data, and the remaining subset was used as the testing data for evaluating performance. This process was repeated 5 times until each of the five subsets has used as the testing data once. The average testing result from the five runs was used to calculate the final AUC-ROC.
The five-fold cross-validation results based on the random split are given in Fig. 2A. It is believed that k-fold cross-validation38 can effectively avoid overfitting and underfitting. As expected, D-MPNN using the five-fold cross-validation (set as the control in Fig. 2A) showed a significantly poor performance than the above D-MPNN using pre-split data (AUC-ROC: 0.947 ± 0.005 versus 0.996 ± 0.004). And with the incorporation of the molecular characterizations or their combinations, the D-MPNN performances have about 2% fluctuation (AUC-ROC from 0.935 to 0.956). The D-MPNN coupled with moe206 descriptor (D-MPNN + moe206) provides optimal performance with AUC-ROC reaching 0.956 ± 0.005, which is better than the original D-MPNN (AUC-ROC 0.947 ± 0.005). The combination of D-MPNN with FCFP4 fingerprint provided the worst results.
Similarly, all these models with molecular characterizations were tested using five-fold cross-validation under the scaffold split mode (Fig. 2B). In this circumstance, D-MPNN with MOE53, moe206, and RDkit2D obtained slightly higher performance than others, with AUC-ROC reaching from 0.922 to 0.925. Since AUC-ROC on random split has less fluctuation than that on scaffold split, so the five-fold cross-validation random split mode was chosen for further investigation. D-MPNN + moe206 was considered to be the best incorporated descriptor for D-MPNN.
Model | Training set | Test set | AUC-ROC | SE | SP | ACC | |
---|---|---|---|---|---|---|---|
a Doddareddy's dataset.b Siramshetty's dataset.c Not available, this value can NOT be found in the original literature.d Hou's training set I and test set I.e Hou's training set II and test set II.f Ogura's training and test dataset. For each D-MPNN model, the average of different folds (N = 5) and the corresponding standard deviation are listed. | |||||||
1a | SVM + FCFP | D3 training | D3 test | 0.93 | 0.81 | 0.89 | 0.86 |
D-MPNN + moe206 | 0.958 ± 0.005 | 0.900 ± 0.019 | 0.913 ± 0.016 | 0.907 ± 0.010 | |||
D-MPNN | 0.955 ± 0.005 | 0.881 ± 0.032 | 0.907 ± 0.027 | 0.896 ± 0.002 | |||
2b | Consensus model | Training | Validation | NAc | 0.74 | 0.86 | NAc |
D-MPNN + moe206 | 0.864 ± 0.021 | 0.808 ± 0.077 | 0.798 ± 0.039 | 0.798 ± 0.033 | |||
D-MPNN | 0.819 ± 0.012 | 0.638 ± 0.065 | 0.844 ± 0.037 | 0.831 ± 0.031 | |||
3b | Consensus model | Training | FDA-1 | 0.79 | 0.71 | 0.78 | NAc |
D-MPNN + moe206 | 0.882 ± 0.013 | 0.613 ± 0.110 | 0.856 ± 0.023 | 0.835 ± 0.018 | |||
D-MPNN | 0.813 ± 0.032 | 0.413 ± 0.099 | 0.884 ± 0.019 | 0.844 ± 0.013 | |||
4d | SVM | Training I | Test I | 0.842 | 0.907 | 0.652 | 0.821 |
D-MPNN + moe206 | 0.871 ± 0.010 | 0.916 ± 0.014 | 0.667 ± 0.049 | 0.832 ± 0.010 | |||
D-MPNN | 0.776 ± 0.026 | 0.907 ± 0.031 | 0.539 ± 0.065 | 0.783 ± 0.021 | |||
5e | SVM | Training II | Test II | 0.839 | 0.850 | 0.745 | 0.821 |
D-MPNN + moe206 | 0.876 ± 0.015 | 0.890 ± 0.025 | 0.676 ± 0.039 | 0.830 ± 0.010 | |||
D-MPNN | 0.806 ± 0.010 | 0.909 ± 0.037 | 0.553 ± 0.040 | 0.808 ± 0.018 | |||
6f | SVM + 72descriptors + ECFP4 | Training | Test | 0.962 | 0.670 | 0.995 | 0.984 |
D-MPNN + moe206 | 0.968 ± 0.001 | 0.656 ± 0.033 | 0.994 ± 0.001 | 0.983 ± 0.001 | |||
D-MPNN | 0.954 ± 0.001 | 0.627 ± 0.038 | 0.992 ± 0.001 | 0.979 ± 0.000 |
Table 1 entry 1 used the dataset from Doddareddy11 et al., since their representative model performed better on Dataset3 (D3), which contains training set (blockers:1004, non-blockers:1385) and test set (blockers:108, non-blockers: 147). The blockers are defined with IC50 values <10 μM, non-blockers are with IC50 values >30 μM, and the optimal model of Doddareddy et al. is the SVM model with Functional Class Fingerprints (FCFP6), which gave the AUC-ROC of 0.93 and ACC of 0.86 under five-fold cross-validation. Our D-MPNN + moe206 model showed improved prediction accuracy under the exact training and test dataset, achieving AUC-ROC of 0.958 ± 0.005 and ACC of 0.907 ± 0.010, respectively. While the original D-MPNN model gave a slightly poor performance with AUC-ROC of 0.955 ± 0.005 and ACC of 0.896 ± 0.002, confirming moe206 descriptor incorporation is important for boosting the prediction power of D-MPNN.
Table 1 entry 2 listed a result from a recent article by Siramshetty24 et al. They combined the best models of RF + RDKit, XGBoost + RDKit, DNN + MorganFP, and long short-term memory (LSTM) + ATN-SMILES with their own merits into a consensus model, providing superior performance. Siramshetty's dataset consists of the training set (blockers: 2164, non-blockers: 5990), prospective validation set (threshold = 30 μM, blockers: 53, non-blockers: 786), and FDA-approved drugs (thresholds = 1 μM, blockers: 15, non-blockers: 162; thresholds = 10 μM, blockers: 46, non-blockers: 131), where the training set comes from Public Domain hERG Bioactivity (ChEMBL) using a binary threshold (1 and 10 μM) and thallium flux assay (NCATS) using a threshold of 30 μM. We run the D-MPNN + moe206 model on the same training dataset as Siramshetty's consensus model and then evaluated performance on Siramshetty's validation dataset and FDA-1 dataset. As shown in Table 1 entry 2, on prediction of Siramshetty's Validation dataset D-MPNN + moe206 performs better in SE (0.808 ± 0.077 versus 0.77), while Siramshetty's consensus model has better SP (0.798 ± 0.039 versus 0.86). For prediction of the FDA-1 dataset (Table 1 entry 3), D-MPNN + moe206 performs much better in AUC-ROC (0.882 ± 0.013 versus 0.79) and SP (0.856 ± 0.023 versus 0.78), but not so good as the consensus model in SE (0.613 ± 0.110 versus 0.71).
The models of entries 4 and 5 in Table 1 are based on Hou's dataset,17 which contains training set I (blockers: 352, non-blockers: 40), training set II (blockers: 352, non-blockers: 40), test set I (blockers: 175, non-blockers: 20), and test set II (blockers: 175, non-blockers: 20), blockers and non-blockers are distinguished with a threshold of 40 μM. According to Hou's report,17 the performance of the optimal model SVM reaches the highest AUC-ROC of 0.842 and 0.839 for test I and test II, respectively. Nevertheless, our D-MPNN + moe206 model showed more competitive and superior performance, which gave an AUC-ROC of 0.871 ± 0.010, ACC of 0.832 ± 0.010 for test I, and an AUC-ROC of 0.876 ± 0.015, ACC of 0.830 ± 0.010 for test II.
Entry 6 of Table 1 used Ogura's datasets,19 which is the largest hERG database so far. It contains a training set (blockers: 6923, non-blockers: 196918) and a test set (blockers: 2966, non-blockers: 84
395), and as per the criteria of IC50 values ≤10 μM considered to be hERG blockers and IC50 values >10 μM considered to be hERG non-blockers. A prediction model based on the non-dominant sorting genetic algorithm (NSGA-II) was constructed to obtain the maximum prediction performance with 72 selected descriptors and ECFP4 structure fingerprint, reaching an ACC value of 0.984, respectively. D-MPNN + moe206 achieves comparable performance with AUC-ROC of 0.968 ± 0.001 and ACC value of 0.983 ± 0.001 based on the same training and test dataset as Ogura's model.
Very recently Karim et al. proposed a robust predictor based on deep learning meta-feature ensembles called CardioTox net, which showed excellent performance in various classification indexes for three external test sets.25
We tested the performance of D-MPNN + moe206 by comparison with CardioTox using Karim's25 datasets in Table 2, which contains training set (blockers: 6643, non-blockers: 5977), test set-I (blockers: 30, non-blockers: 14), test set-II (blockers: 11, non-blockers: 30), and test set-III (blockers: 30, non-blockers: 710). The blockers are defined with IC50 values <10 μM, and the non-blockers are with IC50 values ≥ 10 μM. As shown in Table 2 entries 1 and 3, D-MPNN + moe206 is comparable to CardioTox on all three dataset. Although D-MPNN + moe206 performance is slightly inferior to CardioTox on test set-I and test set-III which achieved MCC of (0.567 ± 0.061 versus 0.599, 0.214 ± 0.016 versus 0.220) and ACC of (0.800 ± 0.030 versus 0.810, 0.696 ± 0.030 versus 0.746), D-MPNN + moe206 showed improved performance for all metrics except ACC on test set-II, which reaches MCC of 0.470 ± 0.053, NPV of 0.950 ± 0.032, PPV of 0.467 ± 0.020, SP of 0.620 ± 0.030, SE of 0.909 ± 0.064, and B-ACC of 0.764 ± 0.030.
Model | Evaluation data | AUC-ROC | MCC | NPV | ACC | PPV | SP | SE | B-ACC | |
---|---|---|---|---|---|---|---|---|---|---|
a Karim's dataset.b Not available. For each D-MPNN model, the average of different folds (N = 5) and the corresponding standard deviation are listed. | ||||||||||
1a | CardioTox | Test set-I | NAb | 0.599 | 0.688 | 0.810 | 0.893 | 0.786 | 0.833 | 0.810 |
D-MPNN + moe206 | 0.849 ± 0.042 | 0.567 ± 0.061 | 0.656 ± 0.044 | 0.800 ± 0.030 | 0.890 ± 0.023 | 0.786 ± 0.051 | 0.807 ± 0.037 | 0.796 ± 0.031 | ||
2a | CardioTox | Test set-II | NAb | 0.452 | 0.947 | 0.755 | 0.455 | 0.600 | 0.909 | 0.754 |
D-MPNN + moe206 | 0.810 ± 0.055 | 0.470 ± 0.053 | 0.950 ± 0.032 | 0.698 ± 0.022 | 0.467 ± 0.020 | 0.620 ± 0.030 | 0.909 ± 0.064 | 0.764 ± 0.030 | ||
3a | CardioTox | Test set-III | NAb | 0.220 | 0.986 | 0.746 | 0.113 | 0.698 | 0.794 | 0.746 |
D-MPNN + moe206 | 0.830 ± 0.010 | 0.214 ± 0.016 | 0.986 ± 0.037 | 0.696 ± 0.030 | 0.110 ± 0.006 | 0.692 ± 0.033 | 0.788 ± 0.064 | 0.740 ± 0.020 |
According to these comparative analyses, the D-MPNN + moe206 model the D-MPNN + moe206 model showed excellent AUC-ROC, accuracy, MCC, sensitivity, and NPV across different datasets given here, providing one of the best classification tools for predicting potential hERG-induced cardiotoxicity of compounds.
We first evaluated the performance of XGBoost based on the moe206 descriptor and ECFP4 fingerprint on Cai's hERG dataset. The results showed that the AUC-ROC of XGBoost on moe206 descriptor is only slightly lower than that of D-MPNN + moe206 (0.950 ± 0.006 versus 0.956 ± 0.005) and significantly better than AUC-ROC of XGBoost on ECFP4 descriptor (0.950 ± 0.006 versus 0.933 ± 0.001) (Fig. 3A). This further confirmed the importance of the moe206 descriptor.
Next, we did SHAP analysis on the moe206 descriptor. As shown in Fig. 3B, h_pavgQ, logP(o/w), SlogP, vsa_pol, h_logD, and h_pkB are considered to be the six most important descriptors of the model. The higher h_pavgQ, logP(o/w), SlogP and h_logD, and the lower vsa_pol and h_pkB (The list of descriptors is shown in Table S2†), the more likely it is to be predicted as a blocker. According to Honma's research,45 positively charged atoms are the main distinguishing factor. Approximately 80% of hERG non-blockers do not have but more than half of the inhibitors contain at least one. The h_pavgQ at the top of the list not only reflects this feature but also reflects the superiority of the model in terms of feature selection. Similarly, the logP (o/w), SlogP and h_logD (ranked second, third and fifth) are related to Log octanol/water partition coefficient, representing those fat-soluble fragments or aromatic heterocycles in the blockers form a Π-stacking interaction with Phe-656 and Tyr-652,46 vsa_pol is also related to the hydrophobicity to some extent. Regarding pKb, on account of a number of hERG blockers have an aromatic ring at the end of the ligand and a basic amine, causing it to protonate readily at physiological pH and creating with cation–π interaction with Tyr652.47 In short, the top-ranked descriptors are consistent with the characteristics reported in previous studies that most of the hERG blockers contained positively charged atom, more lipophilic, and more alkaline.2,48
Compared to D-MPNN without any descriptors, D-MPNN carries some descriptor prediction performance is even worse, some descriptor values may stay the same, or some descriptors are meaningless to reflect molecular characteristics, etc. It may be due to the fact that some descriptors cannot be used to classify blockers and non-blockers significantly, leading to ineffective work of deep learning, which requires us to intervene in the selection of descriptors. Thus, the complexity of the model is reduced and the generalization ability of the model is improved, and the computational resources and time are saved.50
Footnotes |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d1ra07956e |
‡ Those two authors contribute equally. |
This journal is © The Royal Society of Chemistry 2022 |