Issue 11, 2016

Importance of proximity measures in clustering of cancer and miRNA datasets: proposal of an automated framework

Abstract

Distance plays an important role in the clustering process for allocating data points to different clusters. Several distance or proximity measures have been developed and reported in the literature to determine dissimilarities between two given points. The appropriate choice of distance measure depends on the domain and can differ even between data sets from the same domain. It is therefore important to automatically determine the distance measure that works best for a particular data set. In this study we have developed an automatic clustering technique using the search capability of multiobjective optimization which can automatically determine the relevant distance measure and the corresponding partitioning from a given data set. Our proposed automated framework is generic in nature, i.e., any number of different distance measures can be incorporated into it. In this work, four widely used distance measures, namely Euclidean, line symmetry, point symmetry and city block distance, are explored for each data set. In order to measure the quality of a partitioning obtained using a particular distance, four cluster validity indices are used: the Silhouette index, the Davies-Bouldin (DB) index, the adjusted Rand index and classification accuracy. A new encoding strategy, which encodes both the set of cluster centers and the particular distance function, is used to represent the problem. The appropriate distance function and the corresponding partitioning are determined using the search capability of a multiobjective optimization based technique. The efficiency of the proposed technique is shown by clustering three microRNA and three microarray gene expression data sets of varying complexity. The results show the usefulness of the proposed automated approach.
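The encoding idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate layout (a dictionary with `centers` and `distance` keys), the function names and the toy data are all assumptions made for illustration. Only two of the four distance measures are included, the multiobjective search itself is omitted, and a single validity index (Silhouette) stands in for the full set of objectives.

```python
import math

# Two of the four distance measures named in the abstract. The symmetry-based
# measures are left out of this sketch.
def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def city_block(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

DISTANCES = {"euclidean": euclidean, "city_block": city_block}

def assign(points, candidate):
    """Decode a candidate: label each point with its nearest encoded center."""
    dist = DISTANCES[candidate["distance"]]
    centers = candidate["centers"]
    return [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points]

def silhouette(points, labels, dist):
    """Mean silhouette width under the given distance (higher is better)."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; 0.0 used as a convention
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist(p, q) for q, l in zip(points, labels) if l == k)
            / labels.count(k)
            for k in clusters if k != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]

# Each candidate encodes cluster centers together with a distance measure.
candidates = [
    {"centers": [(0.3, 0.3), (5.3, 5.3)], "distance": "euclidean"},
    {"centers": [(0.3, 0.3), (5.3, 5.3)], "distance": "city_block"},
]

results = {}
for cand in candidates:
    labels = assign(points, cand)
    results[cand["distance"]] = (
        labels, silhouette(points, labels, DISTANCES[cand["distance"]]))

for name, (labels, score) in results.items():
    print(name, labels, round(score, 3))
```

In a full search, many such candidates would be evolved and compared on several validity indices at once; here the two candidates are simply scored side by side.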

Article information

Article type: Paper
Submitted: 23 Aug 2016
Accepted: 07 Sep 2016
First published: 09 Sep 2016

Mol. BioSyst., 2016, 12, 3478-3501

S. Acharya and S. Saha, Mol. BioSyst., 2016, 12, 3478 DOI: 10.1039/C6MB00609D
