In silico prediction of hERG potassium channel blockage by chemical category approaches
The human ether-a-go-go related gene (hERG) plays an important role in the cardiac action potential. It encodes the ion channel protein Kv11.1, blockage of which is associated with long QT syndrome and may lead to avoidable sudden cardiac death. It is therefore important to assess the hERG channel blockage of lead compounds early in the drug discovery process. In this study, we collected a large data set of 1163 diverse compounds with IC50 values determined by the patch clamp method on mammalian cell lines. The data set was divided into a training set (80%) and a test set (20%). Five machine learning methods were then applied to build a series of binary classification models that discriminate hERG blockers from nonblockers at four IC50 thresholds. The models were based on three kinds of input: 13 molecular descriptors, five fingerprints, and the combination of molecular descriptors with fingerprints. Models built on the combination of molecular descriptors and fingerprints were further validated on an external validation set of 407 compounds collected from the hERGCentral database. The results indicated that models combining molecular descriptors with fingerprints performed best and that each threshold had its own best-performing method, which suggests that hERG blockage assessment may depend on the chosen threshold value. Overall, the kNN and SVM methods performed better than the other methods for model building. Furthermore, six privileged substructures were identified using information gain and frequency analysis; these could be regarded as structural alerts for cardiac toxicity mediated by hERG channel blockage.
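As an illustration of two steps named above, the following minimal sketch labels compounds as blockers or nonblockers at several IC50 thresholds and computes the information gain of a single binary substructure flag with respect to those labels. It is not the authors' implementation. The toy IC50 values, the substructure flags, the threshold values, the "IC50 at or below the threshold counts as a blocker" convention, and the function names `label_blockers`, `entropy` and `information_gain` are all assumptions made for this example.

```python
import math
from collections import Counter


def label_blockers(ic50_values, threshold):
    """Return 1 (blocker) when IC50 <= threshold, else 0 (nonblocker).

    The "<=" convention is an assumption for this sketch.
    """
    return [1 if value <= threshold else 0 for value in ic50_values]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature, labels):
    """IG = H(labels) - sum over v of p(feature = v) * H(labels | feature = v)."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for f, y in zip(feature, labels) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical toy data: IC50 values in micromolar and a 0/1 flag saying
# whether each compound contains some substructure of interest.
ic50_um = [0.5, 2.0, 8.0, 20.0, 50.0, 100.0]
has_substructure = [1, 1, 1, 0, 0, 0]

# Hypothetical thresholds; the thresholds used in the study are not given here.
thresholds_um = [1.0, 10.0, 30.0]

gains = {}
for threshold in thresholds_um:
    labels = label_blockers(ic50_um, threshold)
    gains[threshold] = information_gain(has_substructure, labels)
    print(f"threshold={threshold} uM  labels={labels}  IG={gains[threshold]:.3f}")
```

In this toy data the substructure flag separates the classes perfectly only at the middle threshold, so its information gain changes with the threshold. This mirrors, in a very simplified way, the observation that the assessment of hERG blockage may depend on the threshold value.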