Issue 6, 2014

The prediction of methylation states in human DNA sequences based on hexanucleotide composition and feature selection

Abstract

DNA methylation is an important epigenetic modification that plays a crucial role in the regulation of gene expression and the occurrence of cancer. Although various experimental methods have been used to detect DNA methylation states, they are time-consuming and laborious. With the rapid accumulation of DNA sequence data, the gap between the number of known sequences and the number of known methylation annotations is widening rapidly. It is therefore important to develop a computational method for predicting methylation states. In this study, hexanucleotide composition is used to characterize the DNA sequences. Maximum relevance minimum redundancy (mRMR) is adopted to preselect a feature subset carrying discriminative information, and an improved genetic algorithm is then employed to obtain both the optimal feature subset from the preselected features and the parameters of the support vector machine (SVM). Finally, a model based on the optimal feature subset and parameters is constructed and used to predict methylation states. Under 5-fold cross-validation, the proposed method achieves an accuracy of 92.42%, a Matthews correlation coefficient of 0.8484 and an area under the receiver operating characteristic curve of 0.9326. The predictive performance of the hexanucleotide composition is evaluated by comparison with trinucleotide and nonanucleotide compositions. The results indicate that the method has high potential to become a useful tool in research on predicting DNA methylation states. The Matlab source code is freely available on request from the authors.
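As an illustration of the sequence representation named in the abstract, the following is a minimal sketch of a hexanucleotide composition feature vector. It is not the authors' Matlab code. The overlapping sliding window, the normalization of counts to frequencies, the skipping of windows that contain ambiguous bases, and the name `hexanucleotide_composition` are all assumptions made for this example, since the abstract does not specify them.

```python
from itertools import product

ALPHABET = "ACGT"
K = 6  # hexanucleotide length

# All 4**6 = 4096 possible hexanucleotides in a fixed lexicographic order.
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def hexanucleotide_composition(seq):
    """Return a 4096-dimensional vector of hexanucleotide frequencies.

    Assumptions (not stated in the abstract): windows overlap with a
    step of one base, counts are divided by the number of valid windows,
    and windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = [0] * len(KMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        idx = INDEX.get(seq[i:i + K])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short or no valid window: return an all-zero vector.
        return [0.0] * len(KMERS)
    return [c / total for c in counts]


features = hexanucleotide_composition("ACGTACGTAC")
```

A vector of this kind could then be passed to a feature selection step and a classifier. Using trinucleotides or nonanucleotides instead would change `K` to 3 or 9, giving 64 or 262,144 dimensions respectively.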


Article information

Article type: Paper
Submitted: 06 Nov 2013
Accepted: 17 Dec 2013
First published: 18 Dec 2013

Anal. Methods, 2014, 6, 1897-1904


Z. Li, L. Chen, Y. Lai, Z. Dai and X. Zou, Anal. Methods, 2014, 6, 1897 DOI: 10.1039/C3AY41962B
