An integrative computational model for large-scale identification of metalloproteins in microbial genomes: a focus on iron–sulfur cluster proteins

Johan Estellon; Sandrine Ollagnier de Choudens; Myriam Smadja; Marc Fontecave; Yves Vandenbrouck

doi:10.1039/C4MT00156G

Johan Estellon,‡^abc Sandrine Ollagnier de Choudens,‡^def Myriam Smadja,^ghi Marc Fontecave^ghi and Yves Vandenbrouck*^abc

Author affiliations

* Corresponding authors

^a Univ. Grenoble Alpes, iRTSV-BGE, F-38000 Grenoble, France
E-mail: johan.estellon@gmail.com

^b CEA, iRTSV-BGE, F-38000 Grenoble, France
E-mail: yves.vandenbrouck@cea.fr
Fax: +33 4 38 78 50 32
Tel: +33 4 38 78 26 74

^c INSERM, BGE, F-38000 Grenoble, France

^d Univ. Grenoble Alpes, iRTSV-LCBM, F-38000 Grenoble, France
E-mail: sandrine.ollagnier@cea.fr

^e CNRS, iRTSV-LCBM, F-38000 Grenoble, France

^f CEA, iRTSV-LCBM, F-38000 Grenoble, France

^g CNRS, UMR8229, 75231 Paris Cedex 05, France
E-mail: marc.fontecave@cea.fr

^h Collège de France, 11 place Marcelin Berthelot, 75231 Paris Cedex 05, France
E-mail: myriam.smadja@college-de-france.fr

^i Université Pierre et Marie Curie, Paris, France

Abstract

Metalloproteins are a ubiquitous group of molecules that are crucial to the survival of all living organisms. Although several metal-binding motifs have been defined, it remains challenging to confidently identify metalloproteins from primary protein sequences using computational approaches alone. Here, we describe a comprehensive machine learning strategy for designing and assessing a penalized generalized linear model, and we use it to detect members of the iron–sulfur cluster protein family. A new category of descriptors, based on profile hidden Markov models and encoding structural information, was combined with public descriptors into a linear model. The model was trained and tested on distinct datasets composed of well-characterized iron–sulfur protein sequences; it provided higher sensitivity than a motif-based approach while maintaining a good level of specificity. Analysis of this linear model allows us to detect and quantify the contribution of each descriptor, providing a better understanding of this complex protein family along with valuable indications for further experimental characterization. Two newly identified proteins, YhcC and YdiJ, were functionally validated as genuine iron–sulfur proteins, confirming the prediction. The computational model was then applied to over 550 prokaryotic genomes to screen for iron–sulfur proteomes; the results are publicly available at http://biodev.extra.cea.fr/isph. This study represents a proof of concept for the application of a penalized linear model to identify metalloprotein superfamilies on a large scale. The application employed here, screening for iron–sulfur proteomes, provides new candidates for further biochemical and structural analysis as well as new resources for an extensive exploration of iron-sulfuromes in the microbial world.
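
The idea of a penalized linear model whose coefficients quantify each descriptor's contribution can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an L1 (lasso) penalty on a logistic model fitted by proximal gradient descent, and the descriptor names (`hmm_profile_score`, `cys_motif_hit`, `uninformative`), the synthetic data, and the values of `lam`, `step` and `n_iter` are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical descriptor names, chosen for illustration only.
FEATURES = ["hmm_profile_score", "cys_motif_hit", "uninformative"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a coefficient toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def predict_proba(weights, bias, x):
    """Predicted probability that a sequence with descriptors x is positive."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def fit_l1_logistic(X, y, lam=0.02, step=0.5, n_iter=800):
    """Fit an L1-penalized logistic model by proximal gradient descent."""
    n, p = len(X), len(X[0])
    weights, bias = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        bias -= step * grad_b  # the intercept is not penalized
        weights = [
            soft_threshold(w - step * g, step * lam)
            for w, g in zip(weights, grad_w)
        ]
    return weights, bias


def make_toy_data(n=200, seed=0):
    """Synthetic descriptors: two informative, one pure noise (assumed)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        hmm_score = rng.gauss(1.0 if label else -1.0, 1.0)
        motif_hit = 1.0 if rng.random() < (0.8 if label else 0.1) else 0.0
        noise = rng.gauss(0.0, 1.0)
        X.append([hmm_score, motif_hit, noise])
        y.append(label)
    return X, y


X, y = make_toy_data()
weights, bias = fit_l1_logistic(X, y)
predictions = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)

# Coefficient magnitudes indicate how much each descriptor contributes.
for name, w in zip(FEATURES, weights):
    print(f"{name:>18}: {w:+.3f}")
print(f"training accuracy: {accuracy:.2f}")
```

In a sketch like this, the penalty shrinks the coefficients of weak descriptors toward zero, so the fitted weights can be read as a rough ranking of descriptor importance. A real application would use held-out test data and a tuned penalty strength rather than the fixed values assumed here.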

Metallomics
