Combining pseudo dinucleotide composition with the Z curve method to improve the accuracy of predicting DNA elements: a case study in recombination spots

Chuan Dong; Ya-Zhou Yuan; Fa-Zhan Zhang; Hong-Li Hua; Yuan-Nong Ye; Abraham Alemayehu Labena; Hao Lin; Wei Chen; Feng-Biao Guo

doi:10.1039/C6MB00374E

Chuan Dong,‡^abc Ya-Zhou Yuan,‡^abc Fa-Zhan Zhang,^abc Hong-Li Hua,^abc Yuan-Nong Ye,^d Abraham Alemayehu Labena,^abc Hao Lin,^abc Wei Chen^e and Feng-Biao Guo*^abc

Author affiliations

* Corresponding authors

^a Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
E-mail: fbguo@uestc.edu.cn
Tel: +86-28-83202351

^b Center of Information in Biomedicine, University of Electronic Science and Technology of China, Chengdu, China

^c Key Laboratory for Neuro-information of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China

^d School of Biology and Engineering, Guizhou Medical University, Guiyang, China

^e Department of Physics, School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China

Abstract

Pseudo dinucleotide composition (PseDNC) and the Z curve method have both shown excellent performance in classifying nucleotide sequences in bioinformatics. Inspired by the principle of Z curve theory, we improved PseDNC to give the phase-specific PseDNC (psPseDNC). In this study, we used the prediction of recombination spots as a case study to illustrate the capability of psPseDNC, and of PseDNC fused with Z curve theory, based on a novel machine learning method named the large margin distribution machine (LDM). We verified that combining the two widely used approaches gives better performance than using PseDNC alone with a support vector machine-based (SVM-based) model. The best Matthews correlation coefficient (MCC) achieved by our LDM-based model was 0.7037 in the rigorous jackknife test, an improvement of ∼6.6%, ∼3.2%, and ∼2.4% over three previous studies. Similarly, in an independent data test, the accuracy improved by 3.2% compared with our previous iRSpot-PseDNC web server. These results demonstrate that the joint use of PseDNC and the Z curve enhances performance and can extract more information from a biological sequence. To facilitate research in this area, we constructed a user-friendly web server for predicting hot/cold spots, HcsPredictor, which can be freely accessed at http://cefg.cn/HcsPredictor. In summary, we provide a unified algorithm that integrates the Z curve with PseDNC. We hope this unified algorithm can be extended to other classification problems involving DNA elements.
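To make the "phase-specific" idea concrete, the following is a minimal sketch of the standard phase-specific Z curve transform, in which base frequencies are counted separately at each position modulo 3 and mapped to three components per phase. It is an illustration only and not the feature set used in the paper: the function name `phase_specific_z_curve`, the handling of ambiguous bases, and the choice of three phases as the default are assumptions made here, and the psPseDNC construction itself is not reproduced.

```python
from collections import Counter


def phase_specific_z_curve(seq: str, phases: int = 3) -> list[float]:
    """Return 3 * phases Z curve components (x, y, z for each phase).

    Hypothetical helper for illustration; not taken from the paper.
    Bases other than A, C, G, T are ignored (an assumption).
    """
    seq = seq.upper()
    features: list[float] = []
    for phase in range(phases):
        # Bases at positions phase, phase + phases, phase + 2 * phases, ...
        counts = Counter(b for b in seq[phase::phases] if b in "ACGT")
        total = sum(counts.values())
        if total == 0:
            # No usable bases in this phase: emit neutral components.
            features.extend([0.0, 0.0, 0.0])
            continue
        a, c, g, t = (counts[b] / total for b in "ACGT")
        features.extend([
            (a + g) - (c + t),  # x: purine vs pyrimidine
            (a + c) - (g + t),  # y: amino vs keto
            (a + t) - (g + c),  # z: weak vs strong hydrogen bonding
        ])
    return features
```

Calling `phase_specific_z_curve(sequence)` on a DNA string yields nine values with the default three phases; such a vector could be concatenated with other sequence descriptors before being passed to a classifier.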

Article information

DOI: https://doi.org/10.1039/C6MB00374E
Article type: Paper
Submitted: 13 May 2016
Accepted: 01 Jul 2016
First published: 01 Jul 2016

Mol. BioSyst., 2016, 12, 2893-2900

C. Dong, Y. Yuan, F. Zhang, H. Hua, Y. Ye, A. A. Labena, H. Lin, W. Chen and F. Guo, Mol. BioSyst., 2016, 12, 2893 DOI: 10.1039/C6MB00374E

Molecular BioSystems
