Issue 3, 2019

Prediction of zinc-binding sites using multiple sequence profiles and machine learning methods

Abstract

The zinc (Zn2+) cofactor is involved in numerous biological mechanisms, and zinc binding is recognized as one of the most important post-translational modifications in proteins. Accurate knowledge of where zinc ions bind in protein structures can therefore provide clues for elucidating protein folding and function. However, determining zinc-binding residues experimentally is usually labor-intensive and costly. Computational tools for identifying zinc-binding sites are thus highly desirable, especially in the post-genomic era. In this work, we developed a novel zinc-binding site prediction method by combining several intensively trained machine learning models. To establish an accurate and generalizable method, we downloaded all zinc-binding proteins from the Protein Data Bank and prepared a non-redundant dataset; a curated dataset prepared by other groups was also used. Effective and complementary features were then extracted from the sequences and three-dimensional structures of these proteins, and several machine learning models were trained on them. To assess performance, the resulting predictors were stringently benchmarked on diverse zinc-binding sites, and several state-of-the-art in silico methods developed specifically for zinc-binding sites were evaluated for comparison. The results show that our method is competitive in real-world applications and could complement wet-lab experiments. To facilitate research in the community, a web server and a stand-alone program implementing our method are publicly available at http://bioinformatics.fzu.edu.cn/znMachine.html. The downloadable program can be readily used for high-throughput screening of potential zinc-binding sites across proteomes.

Article information

Article type: Research Article
Submitted: 12 Mar 2019
Accepted: 15 Apr 2019
First published: 16 Apr 2019

Mol. Omics, 2019, 15, 205-215

R. Yan, X. Wang, Y. Tian, J. Xu, X. Xu and J. Lin, Mol. Omics, 2019, 15, 205 DOI: 10.1039/C9MO00043G
