The prediction of methylation states in human DNA sequences based on hexanucleotide composition and feature selection

doi:10.1039/C3AY41962B

Zhanchao Li,^b Lili Chen,^a Yanhua Lai,^a Zong Dai^a and Xiaoyong Zou*^a

* Corresponding authors

^a School of Chemistry and Chemical Engineering, Sun Yat-Sen University, Guangzhou 510275, P.R. China
E-mail: ceszxy@mail.sysu.edu.cn
Fax: +86 20 84112245
Tel: +86 20 84114919

^b School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, P.R. China

Abstract

DNA methylation is an important epigenetic modification that plays a crucial role in the regulation of gene expression and in the occurrence of cancer. Although various experimental methods have been used to detect DNA methylation states, they are time-consuming and laborious. With the rapid accumulation of DNA sequence data, the gap between the number of known sequences and the number of known methylation annotations is widening rapidly. It is therefore necessary to develop a computational method for predicting methylation states. In this study, hexanucleotide composition is used to characterize the DNA sequences. Maximum relevance minimum redundancy is adopted to preselect a feature subset carrying discriminative information, and an improved genetic algorithm is then employed to obtain both the optimal feature subset (from the preselected features) and the parameters of the support vector machine. Finally, a model built on the optimal feature subset and parameters is used to predict methylation states. In 5-fold cross-validation, the proposed method achieves an accuracy of 92.42%, a Matthews correlation coefficient of 0.8484 and an area under the receiver operating characteristic curve of 0.9326. The predictive performance of the hexanucleotide composition is evaluated by comparing it with the trinucleotide and nonanucleotide compositions. The results indicate that the current method has high potential to become a useful tool in research on predicting DNA methylation states. The Matlab source code is freely available on request from the authors.
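
As an illustration of the sequence encoding named in the abstract, the following is a minimal Python sketch of a hexanucleotide (6-mer) composition vector. It is not the authors' Matlab code. The function name `kmer_composition`, the overlapping sliding window, the normalization by the number of valid windows, and the skipping of windows that contain ambiguous bases are all assumptions made for this example; the paper may define the composition differently.

```python
from itertools import product


def kmer_composition(seq, k=6):
    """Return the relative frequency of every possible k-mer in seq.

    The output has 4**k entries (4096 for k=6), ordered lexicographically
    over the alphabet A, C, G, T. Hypothetical helper for illustration.
    """
    alphabet = "ACGT"
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    total = 0
    seq = seq.upper()

    # Slide an overlapping window of width k along the sequence.
    for start in range(len(seq) - k + 1):
        position = index.get(seq[start:start + k])
        if position is not None:  # assumption: skip windows with N or other symbols
            counts[position] += 1
            total += 1

    if total == 0:
        return [0.0] * len(kmers)
    # Assumption: normalize counts so the vector sums to 1.
    return [count / total for count in counts]


features = kmer_composition("ACGTACGTAC")
print(len(features))
```

Changing `k` to 3 or 9 gives the trinucleotide (64 features) and nonanucleotide (262,144 features) compositions that the abstract uses for comparison. A feature selection step such as the one described above would then operate on these vectors.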

Article information

DOI: https://doi.org/10.1039/C3AY41962B
Article type: Paper
Submitted: 06 Nov 2013
Accepted: 17 Dec 2013
First published: 18 Dec 2013

Anal. Methods, 2014, 6, 1897-1904

Z. Li, L. Chen, Y. Lai, Z. Dai and X. Zou, Anal. Methods, 2014, 6, 1897 DOI: 10.1039/C3AY41962B
