Identification of Multi-Transcriptomic Prognostic Biomarkers to Explore Natural Therapeutics for Lung Cancer integrating Machine Learning

Abstract

Lung cancer remains the leading cause of cancer-related mortality worldwide, underscoring the urgent need for novel therapeutic strategies. Cyclin-dependent kinase 1 (CDK1), a central cell-cycle regulator, has emerged as an oncogenic driver and potential target in lung adenocarcinoma. This study integrated transcriptomics, machine learning (ML), and advanced in silico approaches to identify natural product-derived potential inhibitors of CDK1. To identify robust differentially expressed genes, we first analyzed four microarray datasets (GSE19804, GSE10072, GSE18842, GSE10799). Protein-protein interaction network and topological analysis highlighted CDK1 as a primary key hub gene (pKHG) enriched in cell-cycle and p53 pathways. Target validation confirmed CDK1 overexpression, prognostic significance, links to immune infiltration, and mutation associations. A collected library of 9,667 naturally sourced phytochemicals was then reduced through ML and cheminformatics-based bioactivity (pIC50) prediction to discover potential lead molecules against CDK1. The top-ranked lead molecules were further evaluated by molecular docking, molecular dynamics (MD) simulations, ADMET analysis, and binding free-energy calculations (MM-GBSA). Among the selected phytochemicals, licoflavanone (-8.25 kcal/mol), 3-hydroxyglabrol (-8.22 kcal/mol), and wighteone (-7.34 kcal/mol) showed the highest binding affinity scores (GLIDE_XP score) and stable molecular interactions. Furthermore, MD simulations confirmed the conformational stability of the ligand-protein complexes, supporting their potential as CDK1 inhibitors. This omics-to-in-silico pipeline identifies CDK1 as a robust therapeutic target and highlights natural product-derived inhibitors with favorable pharmacological and physicochemical properties. These findings therefore present a viable framework for accelerating precision drug discovery, with experimental validation underway.
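Two steps of the pipeline summarized above lend themselves to a small illustration: keeping only genes that are differentially expressed across all datasets, and expressing bioactivity as pIC50, which is conventionally defined as the negative base-10 logarithm of the IC50 in mol/L. The sketch below is a minimal example under stated assumptions, not the authors' implementation: taking the intersection across datasets is one plausible reading of "robust", the function names `shared_degs` and `pic50_from_ic50_nm` are hypothetical, and the gene sets are toy placeholders rather than results from the study.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


def shared_degs(deg_sets: dict) -> set:
    """Return genes flagged as differentially expressed in every dataset.

    Assumption: 'robust' is taken to mean the intersection across datasets;
    the abstract does not state the exact criterion used.
    """
    sets = [set(genes) for genes in deg_sets.values()]
    return set.intersection(*sets) if sets else set()


# Toy placeholder gene sets keyed by the accession numbers named in the
# abstract. The contents are illustrative only, not data from the study.
toy_degs = {
    "GSE19804": {"CDK1", "GENE_A", "GENE_B"},
    "GSE10072": {"CDK1", "GENE_A", "GENE_C"},
    "GSE18842": {"CDK1", "GENE_B", "GENE_C"},
    "GSE10799": {"CDK1", "GENE_D"},
}

if __name__ == "__main__":
    print(sorted(shared_degs(toy_degs)))
    print(pic50_from_ic50_nm(1000.0))
```

Because pIC50 is a logarithmic scale, a one-unit increase corresponds to a tenfold lower IC50, which is why it is a convenient regression target for ranking a large compound library.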

Article information

Article type: Paper
Submitted: 27 Jan 2026
Accepted: 24 Apr 2026
First published: 29 Apr 2026
This article is Open Access (Creative Commons BY license)

Digital Discovery, 2025, Accepted Manuscript

M. A. Ali, H. Sarker, M. Kamrun, H. Sheikh, B. Shifa, S. Ahmed, T. Islam, S. Banik and N. Kumar, Digital Discovery, 2025, Accepted Manuscript, DOI: 10.1039/D6DD00045B

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
