Issue 5, 2020

LRSK: a low-rank self-representation K-means method for clustering single-cell RNA-sequencing data

Abstract

The development of single-cell RNA-sequencing (scRNA-seq) technologies brings tremendous opportunities for quantitative research and analysis at the cellular level. In particular, single-cell clustering, a crucial task in scRNA-seq analysis, reveals natural groupings of cells and offers new insights into biological mechanisms and disease. However, identifying cell clusters effectively and accurately from large mixtures of cells remains a challenge. In this paper, we propose a novel adaptive joint clustering framework, the low-rank self-representation K-means method (LRSK), which learns the data representation matrix and the cluster indicator matrix jointly from scRNA-seq data. Specifically, instead of calculating similarities among cells from the original data, we seek a low-rank representation of the original data that better reflects the underlying relationships among cells. An Augmented Lagrangian Multiplier (ALM) based optimization algorithm is adopted to solve the resulting problem. Experimental results on various scRNA-seq datasets and case studies demonstrate that our method performs better than other state-of-the-art single-cell clustering algorithms. Analysis of a large unlabeled single-cell liver cancer sequencing dataset further shows that our predictions are more reasonable and interpretable.
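The idea of clustering cells from a low-rank self-representation rather than from raw similarities can be illustrated with a small sketch. This is not the LRSK algorithm: the abstract does not state the joint objective or the ALM updates, so the code below assumes a simplified two-stage stand-in. It builds a self-representation Z from a truncated SVD (for noiseless data of the given rank, Z = V V^T satisfies X = XZ), forms a spectral embedding from |Z|, and runs a plain K-means on that embedding. All function names, the toy data, and the choice of rank are hypothetical and serve illustration only.

```python
import numpy as np


def low_rank_self_representation(X, rank):
    """Z = V_r V_r^T from the truncated SVD of X (genes x cells).

    Assumption: a closed-form stand-in for a low-rank representation,
    not the jointly optimized matrix described in the abstract.
    """
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v = vt[:rank].T                      # cells x rank
    return v @ v.T                       # cells x cells


def spectral_embedding(Z, n_clusters):
    """Row-normalized top eigenvectors of the normalized affinity |Z|."""
    W = 0.5 * (np.abs(Z) + np.abs(Z.T))  # symmetric, non-negative affinity
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    emb = vecs[:, -n_clusters:]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(norms, 1e-12)


def kmeans(points, n_clusters, n_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.full(len(points), -1)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                        # assignments stopped changing
        labels = new_labels
        for k in range(n_clusters):
            if np.any(labels == k):      # keep the old center if a cluster empties
                centers[k] = points[labels == k].mean(axis=0)
    return labels


def make_toy_data(seed=0, n_genes=50, cells_per_group=30, dim=2):
    """Hypothetical noiseless data: two cell groups, each in its own subspace."""
    rng = np.random.default_rng(seed)
    blocks = [
        rng.standard_normal((n_genes, dim)) @ rng.standard_normal((dim, cells_per_group))
        for _ in range(2)
    ]
    return np.hstack(blocks), np.repeat([0, 1], cells_per_group)


X, truth = make_toy_data()
Z = low_rank_self_representation(X, rank=4)   # rank = 2 groups x 2 dimensions
labels = kmeans(spectral_embedding(Z, n_clusters=2), n_clusters=2)
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print("agreement with ground truth:", agreement)
```

On real scRNA-seq data the rank and the number of clusters are unknown and the data are noisy, which is where a jointly optimized formulation such as the one the abstract describes would differ from this two-stage sketch.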

Article information

Article type: Research Article
Submitted: 21 Mar 2020
Accepted: 01 Jun 2020
First published: 09 Jun 2020

Mol. Omics, 2020, 16, 465-473
