LRSK: a low-rank self-representation K-means method for clustering single-cell RNA-sequencing data

Ye-Sen Sun; Le Ou-Yang; Dao-Qing Dai

doi:10.1039/D0MO00034E


Ye-Sen Sun,^a Le Ou-Yang^b and Dao-Qing Dai*^a

Author affiliations

* Corresponding authors

^a Intelligent Data Center, School of Mathematics, Sun Yat-sen University, Guangzhou, China
E-mail: sunys@mail2.sysu.edu.cn, stsddq@mail.sysu.edu.cn

^b Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
E-mail: leouyang@szu.edu.cn

Abstract

The development of single-cell RNA-sequencing (scRNA-seq) technologies brings tremendous opportunities for quantitative research and analysis at the cellular level. In particular, single-cell clustering, a crucial task in scRNA-seq analysis, reveals natural groupings of cells and gives new insights into biological mechanisms and disease studies. However, identifying cell clusters effectively and accurately from large mixtures of cells remains a challenge. In this paper, we propose a novel adaptive joint clustering framework, named the low-rank self-representation K-means method (LRSK), to learn the data representation matrix and the cluster indicator matrix jointly from scRNA-seq data. Specifically, instead of calculating the similarities among cells from the original data, we seek a low-rank representation of the original data to better reflect the underlying relationships among cells. Moreover, an augmented Lagrangian multiplier (ALM) based optimization algorithm is adopted to solve this problem. Experimental results on various scRNA-seq datasets and case studies demonstrate that our method performs better than other state-of-the-art single-cell clustering algorithms. The analysis of a large unlabeled single-cell liver cancer sequencing dataset further shows that our prediction results are more reasonable and interpretable.
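The idea of clustering cells from a low-rank self-representation, rather than from similarities in the raw data, can be illustrated with a minimal sketch. This is not the LRSK algorithm itself: the abstract describes a joint formulation solved by ALM, whereas the sketch below is a simplified two-step stand-in. It assumes noise-free data, for which the problem min ||Z||_* subject to X = XZ has the known closed-form solution Z = V V^T (V being the right singular vectors of X), and then runs a plain K-means on the rows of Z. The function names, the synthetic data, and the choice of rank equal to the number of cell types are all illustrative assumptions.

```python
import numpy as np


def low_rank_self_representation(X, rank):
    """Return Z = V_r V_r^T, the minimum nuclear-norm Z with X = XZ
    when X (genes x cells) is noise-free and has the given rank."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v_r = vt[:rank].T  # (n_cells, rank) leading right singular vectors
    return v_r @ v_r.T


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's K-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        # squared distance of every point to its nearest chosen center
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave the center of an empty cluster unchanged
                centers[j] = points[labels == j].mean(axis=0)
    return labels


# Synthetic, hypothetical data: each cell is a positively scaled copy of
# its cell type's expression profile (no noise, no dropout).
rng = np.random.default_rng(0)
n_genes, n_per_type, k = 30, 20, 3
profiles = rng.random((n_genes, k))
true_labels = np.repeat(np.arange(k), n_per_type)
scales = rng.uniform(0.5, 2.0, size=k * n_per_type)
X = profiles[:, true_labels] * scales  # genes x cells

Z = low_rank_self_representation(X, rank=k)
features = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # row-normalise Z
labels = kmeans(features, k)
```

On this idealised input Z is block diagonal, with one block per cell type, so the row-normalised representation separates the types cleanly. Real scRNA-seq data are noisy and sparse, which is where a regularised formulation such as the one the abstract describes would be needed.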

Molecular Omics
