Ensemble learning prediction of protein–protein interactions using proteins functional annotations

Indrajit Saha; Julian Zubek; Tomas Klingström; Simon Forsberg; Johan Wikander; Marcin Kierczak; Ujjwal Maulik; Dariusz Plewczynski

doi:10.1039/C3MB70486F

Indrajit Saha,‡*^ab Julian Zubek,‡^c Tomas Klingström,^d Simon Forsberg,^e Johan Wikander,^f Marcin Kierczak,^e Ujjwal Maulik^b and Dariusz Plewczynski*^agh

Author affiliations

* Corresponding authors

^a Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland
E-mail: indra@icm.edu.pl, darman@icm.edu.pl

^b Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India

^c Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland

^d Department of Animal Breeding and Genetics, SLU Global Bioinformatics Centre, Swedish University of Agricultural Sciences, Uppsala, Sweden

^e Department of Clinical Sciences, Computational Genetics Section, Swedish University of Agricultural Sciences, Uppsala, Sweden

^f Bioinformatics Program, Faculty of Technology and Natural Sciences, Uppsala University, Sweden

^g The Jackson Laboratory for Genomic Medicine, c/o University of Connecticut Health Center, Administrative Services Building – Call Box 901, 263 Farmington Avenue, Farmington, USA

^h Yale University, New Haven, CT, USA

Abstract

Protein–protein interactions (PPIs) are important for the majority of biological processes. A significant number of computational methods have been developed to predict protein–protein interactions using protein sequence, structural and genomic data. A vast amount of experimental data is publicly available on the Internet, but it is scattered across numerous databases. This motivated us to create and evaluate new high-throughput datasets of interacting proteins. We extracted interaction data from the DIP, MINT, BioGRID and IntAct databases, then constructed descriptive features for machine learning from Gene Ontology and DOMINE data. Four well-established machine learning methods (Support Vector Machine, Random Forest, Decision Tree and Naïve Bayes) were then applied to these datasets and combined into an Ensemble Learning method based on majority voting. In a cross-validation experiment, sensitivity exceeded 80% and classification accuracy reached 90% for the Ensemble Learning method. We then extended the experiment to a larger and more realistic dataset, on which sensitivity remained above 70%. These results confirm that our datasets are suitable for PPI prediction and that the Ensemble Learning method is well suited to this task. Both the processed PPI datasets and the software are available at http://sysbio.icm.edu.pl/indra/EL-PPI/home.html.
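The majority-voting step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the toy labels and the tie-breaking rule are assumptions made for illustration only, since the abstract does not say how a 2-2 split among four classifiers is resolved. Each inner list stands in for the binary output (1 = interacting, 0 = non-interacting) of one base classifier on the same protein pairs.

```python
from typing import Dict, List, Sequence


def majority_vote(votes: Sequence[Sequence[int]], tie_label: int = 0) -> List[int]:
    """Combine binary labels from several classifiers by simple majority.

    votes[i][j] is the label classifier i assigns to sample j.
    tie_label is an assumption: the abstract gives no tie-breaking rule.
    """
    n_classifiers = len(votes)
    combined = []
    for sample_votes in zip(*votes):
        positives = sum(sample_votes)
        negatives = n_classifiers - positives
        if positives > negatives:
            combined.append(1)
        elif negatives > positives:
            combined.append(0)
        else:
            combined.append(tie_label)
    return combined


def sensitivity_and_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity = TP / (TP + FN); accuracy = correct / total."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive samples")
    return {"sensitivity": tp / (tp + fn), "accuracy": correct / len(y_true)}


# Toy labels (hypothetical, not taken from the paper's datasets).
y_true = [1, 1, 1, 0, 0, 0]
svm_pred = [1, 1, 0, 0, 0, 1]
rf_pred = [1, 1, 1, 0, 0, 0]
dt_pred = [1, 0, 0, 0, 1, 1]
nb_pred = [1, 1, 0, 0, 0, 0]

ensemble_pred = majority_vote([svm_pred, rf_pred, dt_pred, nb_pred])
metrics = sensitivity_and_accuracy(y_true, ensemble_pred)
print(ensemble_pred, metrics)
```

With an even number of voters, ties are possible, so the choice of `tie_label` changes the trade-off: resolving ties to 0 favours specificity, while resolving them to 1 favours sensitivity.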

Article information

DOI: https://doi.org/10.1039/C3MB70486F
Article type: Paper
Submitted: 01 Nov 2013
Accepted: 13 Jan 2014
First published: 13 Jan 2014

Mol. BioSyst., 2014, 10, 820-830

I. Saha, J. Zubek, T. Klingström, S. Forsberg, J. Wikander, M. Kierczak, U. Maulik and D. Plewczynski, Mol. BioSyst., 2014, 10, 820 DOI: 10.1039/C3MB70486F
