Importance of proximity measures in clustering of cancer and miRNA datasets: proposal of an automated framework

Sudipta Acharya; Sriparna Saha

doi:10.1039/C6MB00609D

Sudipta Acharya †*^a and Sriparna Saha †*^a

* Corresponding authors

^a Department of Computer Science and Engineering, Indian Institute of Technology Patna, India
E-mail: sudiptaacharya.2012@gmail.com, sriparna.saha@gmail.com

Abstract

Distance plays an important role in clustering, since it determines how data points are allocated to clusters. Several distance or proximity measures have been reported in the literature for quantifying the dissimilarity between two given points. The appropriate choice of measure depends on the domain and can differ even between data sets of the same domain, so it is important to determine automatically which distance measure works best for a particular data set. In this study we have developed an automatic clustering technique that uses the search capability of multiobjective optimization to determine both the relevant distance measure and the corresponding partitioning for a given data set. The proposed framework is generic in nature, i.e., any number of different distance measures can be incorporated into it. In this work, four widely used distance measures are explored for each data set: Euclidean, line symmetry, point symmetry and city block distance. To measure the quality of a partitioning obtained with a particular distance, four cluster validity indices are used: the Silhouette index, the DB index, the adjusted Rand index and classification accuracy. A new encoding strategy, which encodes the set of cluster centers together with the particular distance function, is used to represent the problem. The appropriate distance function and the corresponding partitioning are then determined by a multiobjective optimization based search. The efficiency of the proposed technique is shown by clustering three microRNA and three microarray gene expression data sets of varying complexity. The results show the usefulness of the proposed automated approach.
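The encoding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pairing of cluster centers with a distance identifier, the names DISTANCES, assign and silhouette, and the toy data are all assumptions made for illustration. Only the Euclidean and city block distances are included (the point symmetry and line symmetry distances are omitted), and the multiobjective search itself is not shown. The sketch decodes one candidate solution into a partitioning and scores it with the Silhouette index.

```python
import math
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical registry: the integer stored in a candidate solution
# selects the distance function used to decode and score it.
DISTANCES: Dict[int, Callable[[Point, Point], float]] = {
    0: euclidean,
    1: city_block,
}


def assign(points: List[Point], centers: List[Point], dist_id: int) -> List[int]:
    """Decode a candidate (centers, dist_id): label each point by its nearest center."""
    dist = DISTANCES[dist_id]
    return [min(range(len(centers)), key=lambda k: dist(p, centers[k])) for p in points]


def silhouette(points: List[Point], labels: List[int], dist_id: int) -> float:
    """Mean Silhouette value of a partitioning under the chosen distance."""
    dist = DISTANCES[dist_id]
    scores = []
    for i, p in enumerate(points):
        by_cluster: Dict[int, List[float]] = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            # Singleton cluster, or only one cluster overall: treated as 0 here.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                 # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two well-separated groups (illustrative values only).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = [(0.0, 0.5), (10.0, 10.5)]
for dist_id in DISTANCES:
    labels = assign(points, centers, dist_id)
    print(dist_id, labels, round(silhouette(points, labels, dist_id), 3))
```

In a search of the kind the abstract describes, many such candidates would be generated and compared on several validity indices at once; here a single candidate is scored under each distance to show how the distance identifier changes the evaluation.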

Article information

DOI: https://doi.org/10.1039/C6MB00609D
Article type: Paper
Submitted: 23 Aug 2016
Accepted: 07 Sep 2016
First published: 09 Sep 2016
Citation: Mol. BioSyst., 2016, 12, 3478-3501


Molecular BioSystems
