Transmembrane region prediction by using sequence-derived features and machine learning methods†
Abstract
Membrane proteins carry out a wide range of important biological functions. Accurate knowledge of transmembrane (TM) regions facilitates ab initio folding and functional annotation of membrane proteins. Large-scale mapping of TM regions by wet-lab experiments is therefore needed, but it is hampered by practical difficulties, which makes in silico methods for TM prediction highly desirable. Here, we present a TM region prediction method based on machine learning algorithms and sequence evolutionary profiles. Hydrophobic properties were also assessed, and a combined method using both sequence evolutionary profiles and hydrophobicity measures was tested. The models were trained on large datasets using neural network and random forest learning algorithms. The proposed method can be applied directly to identify membrane proteins in proteome-wide sequence sets. Benchmark results suggest that our method is an attractive alternative for membrane protein prediction in real-world applications. The web server and stand-alone program of the proposed method are publicly available at http://genomics.fzu.edu.cn/nnme/index.html.
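To illustrate what a sequence-derived hydrophobicity measure can look like, the following minimal sketch computes a per-residue hydropathy profile averaged over a sliding window. It is not the implementation described in this paper: the Kyte-Doolittle scale, the window length of 19, the function name `window_hydrophobicity`, and the treatment of unknown residues are all assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy index. This is a standard published scale;
# whether the method above uses this particular scale is an assumption.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydrophobicity(seq, window=19):
    """Return the mean hydropathy of a window centred on each residue.

    Windows are truncated at the sequence ends. Residues missing from the
    scale (for example 'X') are scored as 0.0, an illustrative choice.
    """
    half = window // 2
    scores = []
    for i in range(len(seq)):
        lo = max(0, i - half)
        hi = min(len(seq), i + half + 1)
        values = [KD.get(aa, 0.0) for aa in seq[lo:hi]]
        scores.append(sum(values) / len(values))
    return scores


if __name__ == "__main__":
    # Hypothetical toy sequence: a hydrophobic stretch flanked by charged residues.
    toy = "KKRDE" + "LLIVALLIVALLIVALLIV" + "EDRKK"
    profile = window_hydrophobicity(toy)
    print([round(score, 2) for score in profile])
```

In a window-based classifier, values such as these could be concatenated with the evolutionary profile columns for the same window to form the input vector for each residue; how the features are actually combined in the proposed method is not specified in this abstract.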