Computational characterization of parallel dimeric and trimeric coiled-coils using effective amino acid indices†
The coiled-coil, which consists of two or more α-helices winding around each other, is ubiquitous and is the most frequently observed protein–protein interaction motif in nature. It is known for its straightforward heptad repeat pattern and can be readily recognized from the protein primary sequence, yet it exhibits a variety of oligomeric states and topologies. Because of the stable interactions formed between their α-helices, coiled-coils have been closely scrutinized for the design of novel protein structures with potential applications in materials science, synthetic biology and medicine. However, their broader application requires an in-depth and systematic analysis of the sequence-to-structure relationship that governs coiled-coil folding and oligomer formation. In this article, we propose a new oligomerization state predictor, termed RFCoil, which combines the most useful and non-redundant amino acid indices with the random forest (RF) machine learning algorithm to predict the oligomeric states of coiled-coil regions. Benchmarking experiments show that RFCoil achieves an AUC (area under the ROC curve) of 0.849 in 10-fold cross-validation on the training dataset and 0.855 in the independent test on the validation dataset. Performance comparisons indicate that RFCoil outperforms four existing predictors: LOGICOIL, PrOCoil, SCORER 2.0 and Multicoil2. Furthermore, we extract from the trained RF model a number of predominant rules that underlie oligomer formation, and we present two case studies to illustrate how the extracted rules apply to the prediction of coiled-coil oligomerization state. The RFCoil web server, source code and datasets are freely available for academic users at http://protein.cau.edu.cn/RFCoil/.
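For readers unfamiliar with the AUC figures quoted above, the following minimal sketch shows one standard way to compute the area under the ROC curve: as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half. It is not taken from the RFCoil source code. The function name `roc_auc`, the toy labels and scores, and the choice of trimer as the positive class are all assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """Return the AUC as the probability that a randomly chosen positive
    example is scored higher than a randomly chosen negative one.
    Ties between a positive and a negative score count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the paper): 1 = trimer, 0 = dimer,
# and the scores stand in for made-up classifier outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes. The pairwise loop is quadratic in the number of examples, which is acceptable for a small illustration; a rank-based formulation would scale better on larger datasets.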