Issue 46, 2023

Machine learning method aided discovery of the fourth-generation EGFR inhibitors

Abstract

Epidermal growth factor receptor (EGFR) mutations have been identified as driver mutations in non-small cell lung cancer (NSCLC), but drug resistance remains the key obstacle to treatment. Third-generation EGFR inhibitors have now been in clinical use for a long time, so there is a pressing need for potent EGFR inhibitors that overcome this resistance, and fourth-generation EGFR inhibitors are very promising in this respect. In this work, classification and regression models were constructed to assist in the discovery of fourth-generation EGFR inhibitors. By combining eight machine-learning (ML) approaches with three strategies, 24 classification models were constructed to distinguish EGFR inhibitors from non-inhibitors. Among these, the support vector machine (SVM) model performed best, with accuracy (ACC), area under the receiver operating characteristic curve (AUC) and Matthews correlation coefficient (MCC) values of 95.5%, 92.4% and 84.7%, respectively, on the external validation set. In addition, recursive feature elimination (RFE), an efficient feature-filtering approach, was used to screen the large set of high-dimensional molecular descriptors, after which 10 regression models (5 single models and 5 combined models) were built to estimate inhibitory potency. The combined model RF-RFE-SVM showed the best predictive capacity, with a test-set R² of 0.93. To analyze the contribution of each feature to the models, the SHapley Additive exPlanations (SHAP) method was also used to interpret them. Subsequently, compounds selected on the basis of feature importance were used to construct pharmacophore models and to perform molecular docking, in order to study, respectively, the key pharmacophore features (a hydrogen-bond acceptor on an sp2-hybridized oxygen atom and an alkyl-type hydrophobic group) and the interactions (hydrogen bonding and hydrophobic interactions) between the inhibitors and the EGFR protein.
Collectively, these findings support the discovery of lead compounds for fourth-generation EGFR inhibitors and highlight the strong potential of machine learning in drug discovery.
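
The abstract reports standard figures of merit: ACC, AUC and MCC for the classification models, and test-set R² for the regression models. As a minimal sketch, and not the authors' code, their textbook definitions can be written in plain Python. The labels, scores, potencies and the 0.5 decision threshold below are hypothetical toy values, not data from this study. Note that the abstract quotes MCC as a percentage, i.e. the value computed below multiplied by 100.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels: 1 = inhibitor, 0 = non-inhibitor."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient, in [-1, 1]; 0.0 if any marginal is empty."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: true classes and model scores for eight compounds.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print(accuracy(y_true, y_pred))  # 0.75
print(mcc(y_true, y_pred))       # 0.5
print(roc_auc(y_true, scores))   # 0.9375

# Hypothetical measured vs. predicted potencies for a small regression test set.
print(round(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]), 4))  # 0.98
```

Here the AUC is computed as the Mann-Whitney probability that a randomly chosen inhibitor is scored above a randomly chosen non-inhibitor, which equals the area under the ROC curve. This pairwise loop is quadratic in the number of compounds; for large sets, library routines such as scikit-learn's `roc_auc_score` and `matthews_corrcoef` are more practical.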

Article information

Article type: Paper
Submitted: 09 Jul 2023
Accepted: 04 Nov 2023
First published: 06 Nov 2023

New J. Chem., 2023, 47, 21513-21525

Y. Zhang and Y. Li, New J. Chem., 2023, 47, 21513 DOI: 10.1039/D3NJ03204C
