Issue 8, 2016

MSLVP: prediction of multiple subcellular localization of viral proteins using a support vector machine

Abstract

Knowledge of the subcellular location (SCL) of viral proteins in the host cell is important for understanding their function in depth. We have therefore developed “MSLVP”, a two-tier algorithm for predicting multiple SCLs of viral proteins. For this study, comprehensive data sets of viral proteins with experimentally validated SCL annotation were collected from UniProt. A non-redundant data set (90% sequence identity) of 3480 viral proteins, belonging to single (2715), double (391) and multiple (374) sites, was employed. Additionally, 1687 viral proteins (30% sequence identity) were categorised into single (1366), double (167) and multiple (154) sites. Single, double and multiple locations further comprised eight, four and six categories, respectively. Viral protein locations include the nucleus, cytoplasm, endoplasmic reticulum, extracellular, single-pass membrane, multi-pass membrane, capsid, remaining others and combinations thereof. Support vector machine based models were developed using sequence features such as amino acid composition, dipeptide composition, physicochemical properties and their hybrids. We employed both “one-versus-one” and “one-versus-other” strategies for multiclass classification. The “one-versus-one” approach performed better than the “one-versus-other” approach during 10-fold cross-validation. For the 90% data set, we achieved an accuracy, a Matthews correlation coefficient (MCC) and a receiver operating characteristic (ROC) value of 99.99%, 1.00, 1.00; 100.00%, 1.00, 1.00; and 99.90%, 1.00, 1.00 for single, double and multiple locations, respectively. Similar results were achieved for the 30% sequence identity data set. Predictive models for each SCL performed equally well on the independent data set. The MSLVP web server (http://bioinfo.imtech.res.in/manojk/mslvpred/) can predict subcellular locations, i.e. single (8; including single-pass and multi-pass membrane), double (4) and multiple (6). This should help in elucidating the functional annotation of viral proteins and in identifying potential drug targets.
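As an illustration of the sequence features named in the abstract, the following is a minimal sketch of amino acid composition (20 values) and dipeptide composition (400 values), concatenated into a hybrid vector. It is not the authors' implementation: the function names, the percentage normalisation and the handling of non-standard residues are assumptions made here, and physicochemical properties are omitted.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in the same fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Percentage of each standard residue in seq (20 values).

    Assumption: non-standard symbols (e.g. X, B, Z) are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * residues.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Percentage of each adjacent residue pair in seq (400 values).

    Assumption: pairs containing a non-standard symbol are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


def hybrid_features(seq):
    """Concatenate both compositions into one 420-value vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

Vectors of this kind could then be passed to any support vector machine implementation; the kernel and parameters used by MSLVP are not stated in the abstract.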

Article information

Article type: Paper
Submitted: 31 Mar 2016
Accepted: 23 May 2016
First published: 23 May 2016

Mol. BioSyst., 2016, 12, 2572-2586

A. Thakur, A. Rajput and M. Kumar, Mol. BioSyst., 2016, 12, 2572, DOI: 10.1039/C6MB00241B
