Biomarker identification by reversing the learning mechanism of an autoencoder and recursive feature elimination

Fuad Al Abir; S. M. Shovan; Md. Al Mehedi Hasan; Abu Sayeed; Jungpil Shin

doi:10.1039/D1MO00467K

Fuad Al Abir,*^a S. M. Shovan,^ab Md. Al Mehedi Hasan,^ac Abu Sayeed^a and Jungpil Shin^c

Author affiliations

* Corresponding authors

^a Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh
E-mail: alabir.fuad@gmail.com, 1603021@student.ruet.ac.bd

^b Department of Computer Science, Missouri University of Science & Technology, MO, USA

^c School of Computer Science and Engineering, University of Aizu, Aizuwakamatsu, Japan

Abstract

RNA-Seq has made significant contributions to various fields, particularly cancer research. Recent studies on differential gene expression analysis and the discovery of novel cancer biomarkers have used RNA-Seq data extensively. Identifying new biomarkers is essential for moving cancer research forward, and early cancer diagnosis improves patients' chances of recovery and increases life expectancy. Both biomarker identification and early diagnosis urgently need improvement, and both leave room for it. In this paper, we develop an autoencoder-based biomarker identification method that reverses the learning mechanism of the trained encoders. We devise an explainable post hoc methodology for identifying influential genes with a high likelihood of becoming biomarkers. We then apply recursive feature elimination to shorten the list further and present 17 potential biomarkers that, used with a support vector machine, identify cancer types with 99.93% accuracy on the UCI gene expression cancer RNA-Seq dataset, which comprises five cancerous tumor types. Our methodology outperforms all of the state-of-the-art methods, confirming the potential of the newly identified biomarkers as well as the efficacy of the biomarker identification procedure. Moreover, we evaluated the methodology on six independent RNA-Seq gene expression datasets for several tasks, namely classifying tumors versus non-tumors, detecting the origin of circulating tumor cells (CTCs), and predicting whether metastasis occurs. Our methodology achieved promising results on these tasks as well. The source code of this project is available at https://github.com/fuad021/biomarker-identification.
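The recursive feature elimination step mentioned in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and a synthetic five-class matrix standing in for gene expression data; it is not the authors' pipeline (see the linked repository for that) and it omits the autoencoder stage entirely. The linear kernel, the sample and feature counts, and all variable names are illustrative assumptions; only the target of 17 features and the use of a support vector machine come from the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (samples x genes).
# This is NOT the UCI RNA-Seq dataset; the sizes are arbitrary assumptions.
X, y = make_classification(
    n_samples=200,
    n_features=100,
    n_informative=20,
    n_redundant=0,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=0,
)

# Recursive feature elimination: repeatedly fit a linear SVM and drop the
# feature with the smallest weight magnitude until 17 features remain.
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=17, step=1)
selector.fit(X, y)
selected = np.flatnonzero(selector.support_)  # column indices of kept features

# Score an SVM restricted to the kept features with 5-fold cross-validation.
# Caveat: selecting on the full data and then cross-validating is optimistic;
# a rigorous evaluation would repeat the selection inside each fold.
scores = cross_val_score(SVC(kernel="linear"), X[:, selected], y, cv=5)
print(f"kept {len(selected)} features, mean CV accuracy {scores.mean():.3f}")
```

On real expression data, `selected` would map back to gene identifiers through the column names of the expression matrix.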

Molecular Omics
