Issue 1, 2018

Identification and analysis of the cleavage site in a signal peptide using SMOTE, dagging, and feature selection methods

Abstract

The cleavage site of a signal peptide, located in its C-region, is recognized by signal peptidase in both eukaryotic and prokaryotic cells, and signal peptides are typically cleaved off during or after translocation of the target protein. Identifying cleavage sites remains challenging because signal peptides vary in length and the motif recognized by signal peptidase is only weakly conserved. In this study, we applied a fast and accurate computational method to identify cleavage sites in signal peptides from protein sequences. We collected 2683 protein sequences with experimentally validated N-terminal signal peptides from the newly released version of the UniProt database. A 20-amino-acid peptide segment flanking the cleavage site was extracted from each protein, and four types of features were used to encode each segment. We applied the synthetic minority oversampling technique (SMOTE), maximum relevance minimum redundancy (mRMR), and incremental feature selection (IFS), together with the dagging and random forest algorithms, to identify the feature subset that yields the best identification of cleavage sites. The optimal dagging and random forest classifiers built on these optimal features yielded Youden's indexes of 0.871 and 0.736, respectively. The sensitivity, specificity, and accuracy of the optimal dagging classifier all exceeded 0.9, demonstrating its strong predictive ability. A literature review indicated that the optimal features selected with the dagging algorithm, predominantly the position-specific scoring matrix and amino acid factor features, play crucial roles in identifying cleavage sites. The prediction method proposed in this study was thus confirmed to be a powerful tool for recognizing cleavage sites from protein sequences.
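
The abstract reports performance as sensitivity (Sn), specificity (Sp), accuracy (ACC) and Youden's index. Assuming the standard definitions (the abstract does not spell them out), Youden's index is J = Sn + Sp - 1, so Sn and Sp both above 0.9 imply J above 0.8, which is consistent with the reported 0.871. The minimal Python sketch below computes these measures from confusion-matrix counts; the function name `evaluate` and the example counts are hypothetical.

```python
def evaluate(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy and Youden's index.

    tp/fn: cleavage-site windows predicted correctly/incorrectly.
    tn/fp: non-cleavage windows predicted correctly/incorrectly.
    """
    sn = tp / (tp + fn)                    # sensitivity (true positive rate)
    sp = tn / (tn + fp)                    # specificity (true negative rate)
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    j = sn + sp - 1                        # Youden's index
    return sn, sp, acc, j
```

For example, with hypothetical counts `evaluate(tp=95, fn=5, tn=940, fp=60)` gives Sn = 0.95, Sp = 0.94 and J of about 0.89. Youden's index is a useful single summary here because, unlike accuracy, it is not dominated by the larger (non-cleavage) class.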
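
SMOTE is typically used when one class is much rarer than the other. Since each protein contributes only one true cleavage site, positive windows are likely the minority class here (an assumption, because the abstract does not describe how negative samples were built). SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The sketch below assumes numeric feature vectors, Euclidean distance and k = 5 by default; it is a generic illustration rather than the authors' implementation, and the function name `smote` is hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated from the minority class.

    minority: list of numeric feature vectors (at least two) from the rare class.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples (Euclidean)
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        neighbour = rng.choice(others[:k])
        # place the new point at a random position on the segment x -> neighbour
        gap = rng.random()
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)])
    return synthetic
```

For example, `smote([[0, 0], [1, 0], [0, 1], [1, 1]], n_new=3, k=2)` returns three new 2-D points, each lying on a segment between two of the original corner points, so every coordinate stays within [0, 1].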
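
mRMR ranks features by their relevance to the class minus their redundancy with features already chosen; IFS then evaluates the top-1, top-2, ... subsets of that ranking with a classifier and keeps the best-scoring subset. The sketch below is one common variant under stated assumptions: features are already discretised, mutual information is measured in nats, and the mRMR "difference" criterion (relevance minus mean redundancy) is used. `evaluate` is a hypothetical caller-supplied function that trains and scores a classifier (for example by Youden's index) on a given list of feature indices.

```python
import math
from collections import Counter


def mutual_info(a, b):
    """Mutual information (nats) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr_rank(features, labels):
    """Rank feature indices; features is a list of columns (one list per feature)."""
    relevance = [mutual_info(f, labels) for f in features]
    remaining, selected = list(range(len(features))), []
    while remaining:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(mutual_info(features[j], features[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def incremental_feature_selection(ranking, evaluate):
    """Score the top-1, top-2, ... prefixes of the ranking; return the best prefix."""
    best_subset, best_score = [], float("-inf")
    for i in range(1, len(ranking) + 1):
        score = evaluate(ranking[:i])
        if score > best_score:
            best_subset, best_score = ranking[:i], score
    return best_subset, best_score
```

In the test data below, feature 0 matches the labels exactly, feature 1 is independent of them and feature 2 is partly informative, so feature 0 is ranked first.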
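
Dagging (disjoint aggregating), as implemented for example in Weka's Dagging meta-classifier, splits the training data into disjoint, stratified chunks, trains one copy of a base classifier on each chunk, and combines their predictions by majority vote. The sketch below follows that scheme. The abstract does not name the base classifier, so `NearestCentroid` is a deliberately simple stand-in chosen for brevity; it and the other names here are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


class NearestCentroid:
    """Stand-in base classifier: predicts the class whose mean vector is closest."""

    def fit(self, X, y):
        counts, sums = Counter(y), {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))


def dagging_fit(X, y, n_folds=10, base=NearestCentroid, seed=0):
    """Train one base model per disjoint, stratified chunk of the training data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # deal each class round-robin so every chunk roughly keeps the class ratio
        for k, i in enumerate(indices):
            folds[k % n_folds].append(i)
    return [base().fit([X[i] for i in f], [y[i] for i in f]) for f in folds if f]


def dagging_predict(models, x):
    """Combine the chunk-level models by majority vote."""
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]
```

Because each base model sees only a fraction of the data, dagging is attractive for base learners whose training cost grows faster than linearly with the number of samples.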

Article information

Article type: Research Article
Submitted: 19 Sep 2017
Accepted: 04 Dec 2017
First published: 20 Dec 2017

Mol. Omics, 2018, 14, 64-73

S. Wang, D. Wang, J. Li, T. Huang and Y. Cai, Mol. Omics, 2018, 14, 64 DOI: 10.1039/C7MO00030H
