Machine learning-enhanced direct mass spectrometry analysis of non-volatile breath metabolites for rapid and accurate lung cancer screening

Abstract

Breath analysis by direct mass spectrometry faces significant challenges in sample collection, low analyte concentrations, and accurate compound identification. Current breath analysis for disease research focuses primarily on volatile organic compounds (VOCs), while non-volatile organic compounds (nVOCs) remain largely unexplored despite their diagnostic potential. Here, we present a novel nVOC-based breath analysis method for lung cancer diagnosis that integrates non-invasive breath sampling with machine learning algorithms for comprehensive characterization of 98 clinical breath samples. The study uses a machine learning-driven database docking methodology to overcome the bottleneck of conventional metabolite identification in direct mass spectrometry. This approach enables rapid and precise screening of non-volatile differential metabolites while effectively excluding exogenous confounders (e.g., pharmacological or environmental interference), thereby enhancing nVOC detection in breath. It identified 29 statistically significant nVOC biomarkers, including fatty acids and amino acids, and achieved a prediction accuracy of 0.9878 for lung cancer detection. For distinguishing non-small cell lung cancer (NSCLC) from small cell lung cancer (SCLC), the area under the curve (AUC) reaches 0.9, and the random forest out-of-bag error is 0.00402. Notably, specific nVOCs, including fatty acids and amino acids, show diagnostic potential on their own, with individual metabolites reaching an AUC of up to 0.67 for differentiating SCLC from NSCLC. Finally, significantly altered metabolic pathways were explored by metabolite pathway and transcriptome analysis, which indicated that fatty acid metabolism is a potentially regulatable pathway. Our approach facilitates rapid, non-invasive discrimination of NSCLC and SCLC by metabolic analysis, showing promise as an efficient, low-cost clinical test.
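As an illustration of what a per-metabolite AUC measures, the sketch below scores a single feature by the probability that a randomly chosen sample from one class has a higher intensity than a randomly chosen sample from the other (the Mann-Whitney formulation of AUC). This is not the authors' pipeline: the function name `single_feature_auc`, the toy intensities, and the label coding (1 = SCLC, 0 = NSCLC) are all assumptions made for the example.

```python
def single_feature_auc(intensities, labels):
    """AUC of one feature used directly as a classifier score.

    Computed as the fraction of (positive, negative) sample pairs in which
    the positive sample has the higher intensity; ties count as half a win.
    """
    pos = [x for x, y in zip(intensities, labels) if y == 1]
    neg = [x for x, y in zip(intensities, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample from each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical, made-up intensities for one metabolite (not data from the study).
toy_intensity = [2.1, 3.4, 1.8, 2.9, 1.2, 2.5, 1.9, 3.0]
toy_label = [1, 1, 1, 1, 0, 0, 0, 0]  # assumed coding: 1 = SCLC, 0 = NSCLC

toy_auc = single_feature_auc(toy_intensity, toy_label)
print(toy_auc)  # 0.625 (10 of 16 pairs ranked correctly)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a single-metabolite value near 0.67 indicates a modest signal that a multivariate model can combine with other features.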

Article information

Article type: Paper
Submitted: 07 Aug 2025
Accepted: 01 Oct 2025
First published: 02 Oct 2025

Anal. Methods, 2025, Advance Article

R. Zheng, J. Dong, Y. Zhang, B. Pang, C. Hu and R. Su, Anal. Methods, 2025, Advance Article, DOI: 10.1039/D5AY01304F
