Open Access Article
Zemin Zhu (a), Ziaur Rahman (a), Muhammad Aamir (a), Syed Zahid Ali Shah (b), Sattar Hamid (c), Akhunzada Bilawal (d), Sihong Li (e) and Muhammad Ishfaq (*a)
(a) College of Computer Science, Huanggang Normal University, Huanggang 438000, China. E-mail: muhammad@hgnu.edu.cn; Tel: +86 15972855212
(b) Department of Pathology, Faculty of Veterinary and Animal Sciences, The Islamia University of Bahawalpur-Pakistan, Pakistan
(c) The University of Agriculture Peshawar, Khyber Pakhtunkhwa, 25130, Pakistan
(d) College of Food Science, Northeast Agricultural University, Harbin, China
(e) Key Laboratory of Applied Technology on Green-Eco-Healthy Animal Husbandry of Zhejiang Province, College of Animal Science and Technology, College of Veterinary Medicine, Zhejiang A&F University, Hangzhou 311300, China
First published on 11th January 2023
Mycoplasma pneumoniae (MP) is one of the most common pathogenic organisms causing upper and lower respiratory tract infections, lung injury, and even death in young children. Toll-like receptors (TLRs) play an important role in innate immunity by allowing the host to recognize pathogens invading the body. Previous studies demonstrated that TLR4 is a potential therapeutic target for the treatment of MP pneumonia. Therefore, the present study aimed to screen biologically active ingredients that target the TLR4 receptor pathway. We first used molecular docking to screen for active compounds that inhibit the TLR4 pathway, and then used regression and classification machine learning algorithms to establish quantitative structure–activity relationship (QSAR) models to predict the biological activity of the screened compounds. A total of 78 molecules, retrieved from the ChEMBL database, were used in QSAR modelling. The regression QSAR models achieved acceptable coefficients of determination (R2), ranging from 0.91 to 0.96 on the training dataset and from 0.76 to 0.93 on the testing dataset. The multiclass classification models showed accuracies ranging from 0.70 to 1.0 on the training data and from 0.63 to 0.96 on the testing data, with log loss ranging from 0.27 to 8.63. In addition, molecular descriptors and fingerprints were studied as structural elements associated with increased and decreased inhibitory activities. These results provide a quantitative analysis of QSAR and classification models applicable to high-throughput screening, as well as insights into the mechanisms of inhibition of TLR4 antagonists.
Recently, a growing number of active compounds in natural products have been found to inhibit the TLR4 pathway. Studies have revealed that natural products are rich in molecules with the potential to inhibit the TLR4 protein, and these have attracted the attention of researchers.12–15 Furthermore, TLR4 inhibition mediated by small molecules has led to an array of research focusing on the molecular mechanism of action of TLR4 inhibitors.16 However, further studies are needed to confirm these findings. In addition, the emergence of antibiotic resistance represents another challenge in the treatment of MP infections.17 It is therefore crucial to find alternatives to antibiotics to prevent the emergence of resistance against anti-mycoplasma drugs. Hence, active ingredients of natural products could be used to modulate the host immune inflammatory response. However, these active ingredients require more experimental data on their toxicology and pharmacology before they can be used in clinical trials, which is a lengthy process. To screen TLR4 inhibitors within a limited period of time, it is necessary to use fast, accurate, and reliable screening methods based on a detailed study of TLR4 inhibition and regulation. In this context, modern machine learning technologies make better use of information obtained from several sources to predict the bioactivities of drugs for several diseases, thus making the discovery of new drugs more efficient. In recent years, computer-aided drug screening has been advancing towards practicality and is emerging as a core technology for innovative drug research.
Several drug libraries have been screened in a very short time, leading to the discovery of many active compounds in traditional Chinese medicines and the successful repurposing of several approved drugs.18,19 Based on previous research, virtual screening has paved the way for the future development of improved chemical analogs for use in treating a wide variety of human and animal diseases through medicinal chemistry structure–activity relationships and drug screenings.20 Molecular docking and quantitative structure–activity relationship (QSAR) models provide structural information and insight into TLR4 inhibitors that can be used to guide more effective drug development, including screening and rational drug discovery of TLR4 inhibitors. Therefore, the objective of the present study was to identify lead compounds that can inhibit the TLR4 protein for the treatment of MP pneumonia. The regression and classification QSAR models were developed from a set of known chemical TLR4 inhibitors. These QSAR models will be used to predict and classify the bioactive compounds based on their predicted bioactivity (pIC50) values and provide theoretical foundations to enable the development of potent drugs from natural products for the prevention and treatment of MP pneumonia. The flow chart for the experimental process is shown in Fig. 1.
$\mathrm{pIC_{50}} = -\log_{10}(\mathrm{IC_{50}})$
Here, pIC50 is the negative logarithm of IC50. Because this study is primarily concerned with developing regression and classification models of biological activity, the IC50 values were divided into three classes (high (<1 μM), moderate (≤3 μM and >1 μM) and low (>3 μM)) to distinguish clearly between the potencies of these compounds. For all ligands, the geometry was optimized in MOPAC (Molecular Orbital Package) using the AM1 method, the 3D coordinates were preserved, and the energy was minimized with the Merck Molecular Force Field (MMFF94) before descriptors and molecular fingerprints were calculated. Molecular descriptors were calculated, and the data were curated based on the calculated descriptors. Model building relies on qualitative and/or quantitative chemical information that can be obtained from molecular fingerprints. The PaDEL molecular fingerprints and descriptors were calculated using the ChemDes web-based platform.31 To clean the initial data collected from the ChEMBL database, data preprocessing was performed as described previously.32–35 The data curation process included the following key steps: (i) structural cleaning and conversion, (ii) removal of duplicates, (iii) removal of mixtures and inorganics, (iv) normalization of specific chemotypes, and (v) manual verification of the data. Data for all inhibitory targets other than TLR4, including their pIC50 values, were removed. Molecules with missing pIC50 values or SMILES notation, as well as duplicate entries, were also removed. Feature selection, also known as variable selection or attribute selection, is the automatic selection of the attributes most relevant to the predictive modelling problem. As a first step, all descriptors and molecular fingerprints were checked manually for missing values, and all columns containing zero values were removed from the files using a Python script.
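The pIC50 conversion and the three potency classes described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes IC50 is supplied in micromolar and converted to mol/L before taking the logarithm, the function names are hypothetical, and the handling of exactly 1 μM (which the stated ranges leave open) is a labelled assumption.

```python
import math

def pic50_from_ic50_um(ic50_um: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); the input is assumed to be in micromolar."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)

def potency_class(ic50_um: float) -> str:
    """Assign the three potency classes using the thresholds quoted in the text."""
    if ic50_um < 1.0:
        return "high"
    if ic50_um <= 3.0:
        # Exactly 1 uM is not covered by the stated ranges; placing it here is an assumption.
        return "moderate"
    return "low"

# Hypothetical IC50 values in micromolar, for illustration only.
for ic50 in (0.25, 2.0, 10.0):
    print(f"IC50={ic50} uM  pIC50={pic50_from_ic50_um(ic50):.2f}  class={potency_class(ic50)}")
```

Under this convention a lower IC50 gives a higher pIC50, so the "high" potency class corresponds to pIC50 values above 6.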
The fingerprints were then combined into a single CSV (comma-separated values) file. A variation selection method was initially applied to remove redundant features from the combined feature list.
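The column filtering described here might look like the sketch below. It assumes that the zero-value columns are those containing only zeros and that the variation selection step is a simple variance threshold; neither detail is confirmed by the text, and the feature names and values are hypothetical.

```python
from statistics import pvariance

def drop_uninformative(columns: dict, min_variance: float = 0.0) -> dict:
    """Keep only feature columns that are not all-zero and whose variance exceeds the threshold."""
    kept = {}
    for name, values in columns.items():
        if all(v == 0 for v in values):
            continue  # all-zero column: carries no information
        if pvariance(values) <= min_variance:
            continue  # constant (or near-constant) column: assumed redundant
        kept[name] = values
    return kept

# Hypothetical descriptor/fingerprint columns for four molecules.
features = {
    "FP1": [0, 0, 0, 0],          # removed: all zeros
    "FP2": [1, 1, 1, 1],          # removed: zero variance
    "FP3": [0, 1, 0, 1],          # kept
    "ALogP": [2.1, 3.4, 5.0, 6.2] # kept
}
selected = drop_uninformative(features)
print(list(selected))
```

Raising `min_variance` above zero would also discard fingerprint bits that are set in only a handful of molecules, which is one common way to shrink a combined fingerprint table.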
| S. No. | Drug names | MW | Docking score | RFR model | ETR model | DTR model | ABR model | GBR model |
|---|---|---|---|---|---|---|---|---|
| 1 | (R)-2-(3-(3-Carbamoyl-5-methylphenylsulfonamido)tetrahydrofuran-3-yl)acetic acid | 342.37 | −5.228 | 5.430829277 | 6.314821411 | 6.958607315 | 4.813787229 | 5.136554754 |
| 2 | N-(2-Oxo-2-((6R)-4-oxo-3,9-diazabicyclo[4.2.1]nonan-9-yl)ethyl)-1H-indole-2-carboxamide | 340.38 | −5.127 | 4.815372966 | 6.16129079 | 4.978810701 | 4.644380954 | 4.864479656 |
| 3 | 4-(N-(2-Carbamoylphenyl)sulfamoyl)-3-fluorobenzoic acid | 338.31 | −5.014 | 5.131434015 | 5.942968037 | 4.779891912 | 4.644380954 | 4.848622926 |
| 4 | (R)-3-(5-Oxo-2,5-dihydro-1H-1,2,4-triazol-3-yl)-N-((3-(trifluoromethyl)-1H-1,2,4-triazol-5-yl)methyl)piperidine-1-carboxamide | 360.30 | −4.888 | 5.335803885 | 4.311301872 | 5 | 5.128070842 | 5.154233598 |
| 5 | 1-(((1S,2R)-2-Hydroxy-1,2,3,4-tetrahydronaphthalen-1-yl)carbamoyl)cyclopent-3-enecarboxylic acid | 301.34 | −4.857 | 5.162769848 | 5.498993506 | 5.229147988 | 5.853871964 | 5.136223419 |
| 6 | 3-(3-Amino-5-methylisoxazole-4-sulfonamido)-4-methoxybenzoic acid | 327.31 | −4.841 | 5.42557839 | 5.515008442 | 7.096910013 | 6.198970004 | 4.768100468 |
| 7 | 5-(2-((2-Aminoethyl)amino)thiazol-4-yl)-2-hydroxybenzamide | 278.33 | −4.788 | 5.062946566 | 4.451147788 | 6.795880017 | 4.676057904 | 4.715750754 |
| 8 | (R)-3-((4-Amino-6,7-dimethoxyquinazolin-2-yl)amino)-2-hydroxy-2-methylpropanoic acid | 322.32 | −4.765 | 5.384378737 | 4.723592771 | 5 | 5.551578992 | 5.318842099 |
| 9 | 4-((3-(1-(3-Methylbutanoyl)piperidin-4-yl)ureido)methyl)benzoic acid | 361.44 | −4.729 | 5.352768581 | 4.604814799 | 6.958607315 | 5.128070842 | 4.930009052 |
| 10 | (S)-3-(5-Fluoropyridine-3-sulfonamido)-2-hydroxypropanoic acid | 264.23 | −4.626 | 5.469392305 | 5.131277132 | 6.958607315 | 5.857242359 | 5.598004671 |
| 11 | 3,4-Difluoro-N-(2-((2R,4R)-4-hydroxy-2-(hydroxymethyl)pyrrolidin-1-yl)-2-oxoethyl)benzamide | 314.29 | −4.559 | 5.303713126 | 4.866537918 | 6.958607315 | 5.121293699 | 4.907629008 |
| 12 | (S)-3-(2,3-Dichlorophenylsulfonamido)-2-hydroxypropanoic acid | 314.14 | −4.530 | 5.448314838 | 4.792741186 | 6.958607315 | 5.121293699 | 4.849140284 |
| 13 | 2-(((3R,4R)-3-Methyltetrahydro-2H-pyran-4-yl)amino)-5-sulfamoylbenzoic acid | 314.36 | −4.524 | 5.737739993 | 4.597185429 | 6.795880017 | 4.644380954 | 5.211471563 |
| 14 | (R)-2-(3-Methyl-1H-1,2,4-triazol-5-yl)-N-(4-oxochroman-3-yl)acetamide | 286.29 | −4.508 | 5.198586102 | 4.835972975 | 6.795880017 | 5.121293699 | 4.841100257 |
| 15 | 4-(N-(2-Amino-2-oxoethyl)-N-benzylsulfamoyl)-1H-pyrrole-2-carboxylic acid | 337.35 | −4.507 | 4.93587296 | 5.078963707 | 4.256568635 | 4.644380954 | 5.278315448 |
| 16 | (R)-4-Isopropoxy-3-(1-(tetrahydrofuran-3-yl)-1H-pyrazole-4-sulfonamido)benzoic acid | 395.43 | −4.504 | 5.659485927 | 5.673346573 | 5 | 4.676057904 | 5.283553443 |
| 17 | 3-(N-(5-Cyano-2-(methylamino)phenyl)sulfamoyl)-5-fluorobenzoic acid | 349.34 | −4.489 | 5.2393042 | 6.270021811 | 4.586700236 | 4.676057904 | 5.202654077 |
| 18 | 1-((2-(5-Fluoro-1H-indol-3-yl)ethyl)carbamoyl)azetidine-3-carboxylic acid | 305.30 | −4.463 | 5.138107357 | 5.712863539 | 6.958607315 | 5.121293699 | 4.859899139 |
| 19 | 4-(3-Ethylphenylsulfonamido)-3-hydroxybenzoic acid | 321.35 | −4.456 | 5.326559803 | 5.193816275 | 5 | 5.266000713 | 5.426620204 |
| 20 | (R)-3-(3-(5,6-Dimethyl-4-oxo-1,4-dihydrothieno[2,3-d]pyrimidin-2-yl)propanamido)-2-hydroxy-2-methylpropanoic acid | 353.39 | −4.450 | 5.372020827 | 5.783184637 | 5 | 5.121293699 | 5.442711298 |
| 21 | (2R,3S,4R)-1-(Tert-butoxycarbonyl)-3,4-dihydroxypyrrolidine-2-carboxylic acid | 247.25 | −4.443 | 5.469071944 | 5.613893366 | 5 | 5.121293699 | 4.953810874 |
| 22 | 2,4-Difluoro-N-(2-((2R,4R)-4-hydroxy-2-(hydroxymethyl)pyrrolidin-1-yl)-2-oxoethyl)benzamide | 314.29 | −4.411 | 5.278958496 | 4.61805273 | 6.958607315 | 4.813787229 | 4.931341817 |
| 23 | (3R,5R)-1-(6-(((3-Cyclopropyl-1H-pyrazol-5-yl)methyl)amino)pyrimidin-4-yl)-5-((dimethylamino)methyl)pyrrolidin-3-ol | 357.45 | −4.399 | 5.284046701 | 4.564151855 | 6.602059991 | 5.121293699 | 5.125098228 |
| 24 | (R)-2-(1-Oxo-1,2-dihydroisoquinoline-3-carboxamido)-3-phenylpropanoic acid | 336.34 | −4.386 | 5.163010804 | 5.302824295 | 5.853871964 | 5.595633098 | 5.484515313 |
| 25 | (R)-4-(N-(2-Oxo-2-((tetrahydro-2H-pyran-3-yl)amino)ethyl)sulfamoyl)benzoic acid | 342.37 | −4.380 | 5.820985738 | 5.221450444 | 6.958607315 | 5.035830554 | 5.35223672 |
| 26 | (R)-2-(6,7-Dihydro-5H-pyrrolo[1,2-a]imidazole-3-sulfonamido)-2-(3-methoxyphenyl)acetic acid | 351.38 | −4.365 | 5.275202795 | 5.291271987 | 5 | 4.676057904 | 5.012575341 |
| 27 | (R)-N-(1-Amino-3-methoxy-1-oxopropan-2-yl)-7-methyl-1H-indole-2-carboxamide | 275.30 | −4.342 | 5.130831094 | 5.946522112 | 6.795880017 | 5.121293699 | 4.992894312 |
| 28 | N-((2R,3R)-4-Hydroxy-3-(methylthio)butan-2-yl)-2-oxo-2,3-dihydrobenzo[d]oxazole-6-sulfonamide | 332.40 | −4.339 | 5.36250487 | 5.963766686 | 6.958607315 | 4.813787229 | 4.866545583 |
| 29 | 1-(2-Morpholino-2-oxoethyl)-3-(pyridin-3-yl)urea | 264.28 | −4.278 | 5.323176653 | 4.938421827 | 6.795880017 | 5.121293699 | 4.845185655 |
| 30 | (3R,4R)-1-((3-Carbamoylphenethyl)carbamoyl)-4-methylpiperidine-3-carboxylic acid | 333.38 | −4.260 | 5.316126193 | 4.52083702 | 6.476253533 | 5.121293699 | 4.632121561 |
ALogP, number of hydrogen bond donors and number of hydrogen bond acceptors of a compound and its degree of potency and pIC50 values.44 To visualize the relative distribution of the bioactivity classes and the Ro5 properties, scatter and box plots were created, as shown in Fig. 3(A–G). The results showed that 60% of the compounds have a molecular weight of less than 500 Da (Fig. 3(G)), whereas the ALogP of the majority of compounds varies between 2 and 7. ALogP is a computational estimate of the logarithm of the partition coefficient between water and octanol and has been indispensable in determining molecular hydrophobicity: a high ALogP value indicates high lipophilicity, whereas a low value suggests low lipophilicity. The boxplots indicate the distribution and frequency of high-, moderate- and low-class compounds over the Ro5 properties (Fig. 3(A–D)). The Ro5 index may be used to distinguish compounds with respect to their pharmacological effects on the basis of their molecular properties, namely the octanol–water partition coefficient (logP ≤ 5), the molecular weight (≤500 Da), the number of hydrogen bond donors (≤5), and the number of hydrogen bond acceptors (≤10). The Ro5 was found to be of limited use in contributing to our understanding of the target–ligand relationship (i.e., ligand affinity towards the target), as it is based solely on general ligand properties. Oprea et al. demonstrated that the Ro5 criteria are not effective in discriminating between drugs and non-drugs, given that more than 90% of the chemical reagents listed in the Available Chemicals Directory meet the Ro5 criteria.45 The Ro5 criteria may nevertheless be used to narrow the pharmacokinetic space for therapeutically relevant compounds. In addition, Benet et al. have demonstrated that a QSAR model developed using the Ro5 criteria can effectively predict drug disposition characteristics for drugs that meet or do not meet the Ro5.46
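A Ro5 check of the kind discussed above can be expressed as a count of violated criteria. The sketch below uses the standard Lipinski limits (MW ≤ 500 Da, logP ≤ 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors); the `Compound` class and the example property values are hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    mw: float    # molecular weight in Da
    logp: float  # octanol-water partition coefficient (e.g. ALogP)
    hbd: int     # number of hydrogen bond donors
    hba: int     # number of hydrogen bond acceptors

def ro5_violations(c: Compound) -> int:
    """Count how many of the four Lipinski criteria the compound violates."""
    return sum([c.mw > 500, c.logp > 5, c.hbd > 5, c.hba > 10])

# Hypothetical compounds, for illustration only.
compounds = [
    Compound("small_polar", mw=342.4, logp=1.2, hbd=3, hba=6),
    Compound("large_lipophilic", mw=650.0, logp=6.5, hbd=2, hba=8),
]
for c in compounds:
    print(c.name, ro5_violations(c))
```

Counting violations rather than returning a single pass/fail flag keeps the information about how far a compound departs from the rule, which is useful when the rule is applied as a soft filter.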
Fig. 3 Panels (A–F) show exploratory TLR4 inhibitors data analysis and panel (G) shows chemical space analysis. The scatter plot showed the diversity of ALogP versus MW of TLR4 inhibitory compounds.
| Models | R2 (train) | RMSE (train) | MAE (train) | R2 (test) | RMSE (test) | MAE (test) | R2 (CV) | RMSE (CV) | MAE (CV) |
|---|---|---|---|---|---|---|---|---|---|
| RFR | 0.91 | 0.36 | 0.25 | 0.89 | 0.39 | 0.3 | 0.71 | 0.65 | 0.68 |
| ETR | 0.96 | 0.23 | 0.06 | 0.76 | 0.53 | 0.41 | 0.76 | 0.56 | 0.41 |
| DTR | 0.96 | 0.22 | 0.04 | 0.82 | 0.53 | 0.42 | 0.63 | 0.69 | 0.51 |
| ABR | 0.91 | 0.36 | 0.26 | 0.89 | 0.39 | 0.29 | 0.74 | 0.59 | 0.42 |
| GBR | 0.96 | 0.23 | 0.06 | 0.93 | 0.29 | 0.21 | 0.79 | 0.51 | 0.36 |
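For reference, the three regression metrics reported in the table above follow the usual definitions, sketched here in plain Python. The pIC50 values in the example are hypothetical and the function name is illustrative; the source does not state which library was used to compute the metrics.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R2, RMSE and MAE for paired observed and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)             # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }

# Hypothetical observed and predicted pIC50 values.
observed = [5.0, 6.0, 7.0, 4.5]
predicted = [5.2, 5.8, 6.9, 4.6]
metrics = regression_metrics(observed, predicted)
print({k: round(v, 3) for k, v in metrics.items()})
```

On any single set of predictions MAE cannot exceed RMSE, which is a convenient sanity check when reading tables of this kind.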
| Accuracy (train) | RMSE (train) | MAE (train) | Accuracy (test) | RMSE (test) | MAE (test) | Accuracy (CV) | RMSE (CV) | MAE (CV) | Log loss |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0.96 | 0.2 | 0.04 | 0.82 | 0.46 | 0.19 | 0.27 |
| | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Classification report of RF model on test data | | | | |
| 0 | 0.888889 | 1 | 0.941176 | 8 |
| 1 | 1 | 0.933333 | 0.965517 | 15 |
| 2 | 1 | 1 | 1 | 1 |
| Accuracy | 0.958333 | 0.958333 | 0.958333 | 0.958333 |
| Macro avg. | 0.962963 | 0.977778 | 0.968898 | 24 |
| Weighted avg. | 0.962963 | 0.958333 | 0.95884 | 24 |
| Classification report of RF model on cross validation data | | | | |
| 0 | 0.736842 | 0.777778 | 0.756757 | 18 |
| 1 | 0.844828 | 0.924528 | 0.882883 | 53 |
| 2 | 1 | 0.142857 | 0.25 | 7 |
| Accuracy | 0.820513 | 0.820513 | 0.820513 | 0.820513 |
| Macro avg. | 0.860557 | 0.615054 | 0.62988 | 78 |
| Weighted avg. | 0.833834 | 0.820513 | 0.79698 | 78 |
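The per-class, macro-averaged and weighted-averaged figures in a report like the one above can be derived from true and predicted labels as sketched below. The label vectors are a reconstruction that is merely consistent with the reported test-set supports (8, 15 and 1) and a single class-1 sample predicted as class 0; the actual predictions are not given in the text, so this is an assumption.

```python
from collections import Counter

def per_class_report(y_true, y_pred):
    """Compute precision, recall and F1 per class, plus macro and weighted averages."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    predicted = Counter(y_pred)
    rows = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / support[label] if support[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows[label] = {"precision": precision, "recall": recall, "f1": f1,
                       "support": support[label]}
    n = len(y_true)
    keys = ("precision", "recall", "f1")
    # Macro average: every class counts equally, regardless of its size.
    macro = {k: sum(rows[l][k] for l in labels) / len(labels) for k in keys}
    # Weighted average: each class is weighted by its support.
    weighted = {k: sum(rows[l][k] * rows[l]["support"] for l in labels) / n for k in keys}
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return rows, macro, weighted, accuracy

# Reconstructed labels (assumption): one class-1 sample misclassified as class 0.
y_true = [0] * 8 + [1] * 15 + [2]
y_pred = [0] * 8 + [0] + [1] * 14 + [2]
rows, macro, weighted, accuracy = per_class_report(y_true, y_pred)
print(rows[0], macro, weighted, round(accuracy, 6))
```

The gap between macro and weighted averages grows when a small class is predicted poorly, as in the cross-validation report, where class 2 has a recall of 0.142857 on a support of 7.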
| Models | Accuracy (train) | RMSE (train) | MAE (train) | Accuracy (test) | RMSE (test) | MAE (test) | Log loss |
|---|---|---|---|---|---|---|---|
| RF model | 1 | 0 | 0 | 0.96 | 0.2 | 0.04 | 0.27 |
| KNeighbors classifier | 0.76 | 0.64 | 0.3 | 0.75 | 0.5 | 0.25 | 5.95 |
| SVC | 0.7 | 0.54 | 0.29 | 0.63 | 0.61 | 0.37 | 0.87 |
| Decision-tree classifier | 1 | 0 | 0 | 0.88 | 0.35 | 0.13 | 4.31 |
| AdaBoost classifier | 0.98 | 0.14 | 0.02 | 0.92 | 0.46 | 0.13 | 0.54 |
| Gradient boosting classifier | 1 | 0 | 0 | 0.88 | 0.35 | 0.13 | 0.97 |
| Linear discriminant analysis | 0.98 | 0.27 | 0.04 | 0.71 | 0.54 | 0.29 | 5.79 |
| Quadratic discriminant analysis | 1 | 0 | 0 | 0.75 | 0.5 | 0.25 | 8.63 |
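The log loss column in the table above penalizes confident wrong predictions much more heavily than accuracy does. The sketch below computes multiclass log loss with probabilities clipped at 1e-15, a common convention that is assumed here because the source does not say how log loss was computed; the probability tables are hypothetical.

```python
import math

def multiclass_log_loss(y_true, probas, eps=1e-15):
    """Mean negative log-probability assigned to the true class, with clipping."""
    total = 0.0
    for label, row in zip(y_true, probas):
        p = min(max(row[label], eps), 1.0 - eps)  # avoid log(0)
        total += -math.log(p)
    return total / len(y_true)

labels = [0, 1, 1, 2]

# Hard (0/1) probabilities with one wrong prediction out of four.
hard = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
# Softer probabilities with the same predicted classes.
soft = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.5, 0.4, 0.1], [0.1, 0.1, 0.8]]

hard_loss = multiclass_log_loss(labels, hard)
soft_loss = multiclass_log_loss(labels, soft)
print(round(hard_loss, 3), round(soft_loss, 3))
```

Both probability tables give the same accuracy (0.75), yet the hard-probability case yields a far larger log loss, which is one plausible reason why classifiers with similar accuracy can differ widely in this column.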
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2ra06178c
This journal is © The Royal Society of Chemistry 2023