Identification of multi-transcriptomic prognostic biomarkers to explore natural therapeutics for lung cancer integrating machine learning

Abstract

Lung cancer remains the leading cause of cancer-related mortality worldwide, underscoring the urgent need for novel therapeutic strategies. Cyclin-dependent kinase 1 (CDK1), a central cell-cycle regulator, has emerged as an oncogenic driver and potential target in lung adenocarcinoma. This study aimed to integrate transcriptomics, machine learning (ML), and advanced in silico approaches to identify natural product-derived candidate inhibitors of CDK1. To identify robust differentially expressed genes, we first analyzed four datasets (GSE19804, GSE10072, GSE18842, and GSE10799). Protein–protein interaction network and topological analyses highlighted CDK1 as the primary key hub gene (pKHG), enriched in cell-cycle and p53 pathways. Target validation confirmed CDK1 overexpression, prognostic significance, links to immune infiltration, and mutation associations. A library of 9667 natural phytocompounds was then narrowed by ML-based bioactivity (pIC50) prediction against the pKHG to identify potential lead molecules. The top-ranked leads were evaluated further by molecular docking, molecular dynamics (MD) simulations, ADMET analysis, and binding free-energy calculations (MM-GBSA). Among the selected phytochemicals, CID_14218027 (−7.69 kcal mol⁻¹), CID_487089 (−6.80 kcal mol⁻¹), and CID_174880 (−6.70 kcal mol⁻¹) showed the strongest predicted binding affinity (GLIDE_XP score) and stable molecular interactions. MD simulations further confirmed the conformational stability of the ligand–protein complexes, supporting their potential as CDK1 inhibitors. This integrated omics-to-in silico pipeline identifies CDK1 as a robust therapeutic target and highlights natural product-derived inhibitors with favorable pharmacological and physicochemical properties. These findings present a viable framework for accelerating precision drug discovery, with experimental validation underway.
However, these findings are based solely on computational analyses and require further experimental validation to confirm CDK1 inhibitory activity, anticancer efficacy, and safety.
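
The early stages of the pipeline described above (consensus differentially expressed genes across datasets, hub-gene ranking in a protein–protein interaction network, and the pIC50 scale used for bioactivity prediction) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the gene labels other than CDK1, the dataset keys, and the edge list are hypothetical placeholders, and node degree is used here as a simple stand-in for whichever topological measures the study applied. The only fixed fact is the definition pIC50 = −log10(IC50 in mol/L).

```python
import math
from collections import Counter


def consensus_degs(deg_sets):
    """Return genes called differentially expressed in every dataset."""
    sets = [set(genes) for genes in deg_sets.values()]
    return set.intersection(*sets) if sets else set()


def rank_hubs(edges, genes):
    """Rank genes by degree in the subnetwork induced by `genes`.

    Degree is an assumed, simple hub measure; ties are broken alphabetically.
    """
    degree = Counter()
    for a, b in edges:
        if a in genes and b in genes and a != b:
            degree[a] += 1
            degree[b] += 1
    return sorted(genes, key=lambda g: (-degree[g], g))


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


# Toy placeholder inputs (hypothetical, not taken from the article).
toy_deg_sets = {
    "dataset_1": {"CDK1", "GENE_B", "GENE_C", "GENE_D"},
    "dataset_2": {"CDK1", "GENE_B", "GENE_C"},
    "dataset_3": {"CDK1", "GENE_B", "GENE_C", "GENE_E"},
}
toy_edges = [("CDK1", "GENE_B"), ("CDK1", "GENE_C"), ("GENE_B", "GENE_D")]

shared = consensus_degs(toy_deg_sets)
hubs = rank_hubs(toy_edges, shared)
print(sorted(shared))
print(hubs[0])
print(pic50_from_ic50_nm(100.0))
```

In this toy setup the top-ranked gene is simply the one with the most interaction partners among the shared genes; a real analysis would start from statistically tested expression data and a curated interaction database.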

Article information

Article type: Paper
Submitted: 27 Jan 2026
Accepted: 24 Apr 2026
First published: 29 Apr 2026
This article is Open Access
Creative Commons BY license

Digital Discovery, 2026, Advance Article

M. A. Ali, H. Sarker, M. Kamrun, H. Sheikh, B. A. Shifa, S. Ahmed, T. Islam, S. Banik and N. Kumar, Digital Discovery, 2026, Advance Article, DOI: 10.1039/D6DD00045B

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
