Identification and analysis of the cleavage site in a signal peptide using SMOTE, dagging, and feature selection methods

Abstract

The cleavage site of a signal peptide, located in the C-region, is recognized by signal peptidase in both eukaryotic and prokaryotic cells, and signal peptides are typically cleaved off during or after translocation of the target protein. Identifying cleavage sites remains challenging because signal peptides vary widely in length and the motif recognized by signal peptidase is only weakly conserved. In this study, we applied a fast and accurate computational method to identify cleavage sites in signal peptides from protein sequences. We collected 2683 protein sequences with experimentally validated N-terminal signal peptides from the newly released UniProt database. A 20-amino-acid peptide segment flanking the cleavage site was extracted from each protein, and four types of features were used to encode the segment. We applied the synthetic minority oversampling technique (SMOTE), maximum relevance minimum redundancy, and incremental feature selection, together with the dagging and random forest algorithms, to find the features that give the best identification of cleavage sites. The optimal dagging and random forest classifiers built on these features yielded Youden's indices of 0.871 and 0.736, respectively. The sensitivity, specificity, and accuracy of the optimal dagging classifier all exceeded 0.9, demonstrating its high predictive ability. A literature review indicated that the optimal features obtained with the dagging algorithm, predominantly position-specific scoring matrix and amino acid factor features, play crucial roles in identifying cleavage sites. These results confirm that the proposed prediction method is a powerful tool for recognizing cleavage sites from protein sequences.
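The performance measures quoted in the abstract can be related to one another with a short sketch. Youden's index is conventionally defined as sensitivity + specificity - 1, so a classifier whose sensitivity and specificity both exceed 0.9 will have an index above 0.8. The function name `classification_metrics` and the confusion-matrix counts below are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, accuracy and Youden's index
    from the four cells of a binary confusion matrix.

    tp / fn: cleavage sites predicted correctly / missed
    tn / fp: non-cleavage sites predicted correctly / wrongly flagged
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    youden_index = sensitivity + specificity - 1
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "youden_index": youden_index,
    }


# Hypothetical counts (NOT the study's data), used only to show the arithmetic.
metrics = classification_metrics(tp=90, fn=10, tn=950, fp=50)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the index combines sensitivity and specificity with equal weight, it is less affected than accuracy by a strong imbalance between cleavage-site and non-cleavage-site examples, which is the kind of imbalance SMOTE is meant to address.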

Publication details

The article was received on 19 Sep 2017, accepted on 04 Dec 2017 and first published on 20 Dec 2017


Article type: Research Article
DOI: 10.1039/C7MO00030H
Citation: Mol. Omics, 2018, Advance Article

    Identification and analysis of the cleavage site in a signal peptide using SMOTE, dagging, and feature selection methods

    S. Wang, D. Wang, J. Li, T. Huang and Y. Cai, Mol. Omics, 2018, Advance Article, DOI: 10.1039/C7MO00030H
