Unravelling cyclic peptide membrane permeability prediction: a study on data augmentation, architecture choices, and representation schemes
Abstract
Cyclic peptides have emerged as promising candidates for drug development due to their unique structural properties and potential therapeutic benefits. However, clinical applications are limited by their low membrane permeability, which is difficult to predict. This study explores the impact of data augmentation and the inclusion of cyclic structure information in machine learning (ML) modeling to enhance the prediction of the membrane permeability of cyclic peptides from their amino acid sequence. To address the limited availability of experimental data, various peptide representation strategies were investigated in combination with data augmentation techniques based on amino acid mutations and cyclic permutations. Moreover, cyclic convolutional layers were explored to explicitly model the cyclic nature of the peptides. The results indicated that representations combining sequence information with peptide properties achieved superior performance across multiple metrics. Model performance was highly sensitive to the number of amino acids mutated and to the degree of similarity between the amino acids involved in the mutations. Cyclic permutations improved model performance, particularly on a larger and more diverse dataset, and standard architectures captured most of the relevant cyclic information. These results highlight the complexity of peptide-membrane interactions, lay a foundation for future improvements in computational methods for the design of cyclic peptide drugs, and offer practical guidelines for researchers in this field. The best-performing model was integrated into a user-friendly web-based tool, CYCLOPS: CYCLOpeptide Permeability Simulator (available at http://cyclopep.com/cyclops), to facilitate wider accessibility and application in the drug discovery community. This tool allows rapid prediction of the membrane permeability of cyclic peptides, with a classification accuracy of 0.824 and a regression mean absolute error of 0.477.
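As an illustration of the cyclic-permutation augmentation mentioned above, the following is a minimal sketch, not the implementation used in this study. It assumes a head-to-tail cyclic peptide written as a list of residue tokens, so that every rotation of the list describes the same molecule and can reuse the same permeability label. The function names `cyclic_permutations` and `augment_with_rotations`, the example residues, and the label value are hypothetical.

```python
from typing import List, Sequence, Tuple


def cyclic_permutations(residues: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every distinct rotation of a cyclic peptide's residue list.

    Assumption: the peptide is head-to-tail cyclic, so the choice of
    starting residue is arbitrary and all rotations are equivalent.
    """
    seen = set()
    rotations = []
    for shift in range(len(residues)):
        rotated = tuple(residues[shift:]) + tuple(residues[:shift])
        if rotated not in seen:  # skip duplicates from repeating sequences
            seen.add(rotated)
            rotations.append(rotated)
    return rotations


def augment_with_rotations(
    dataset: Sequence[Tuple[Sequence[str], float]],
) -> List[Tuple[Tuple[str, ...], float]]:
    """Expand (residues, label) pairs so each rotation carries the same label."""
    augmented = []
    for residues, label in dataset:
        for rotated in cyclic_permutations(residues):
            augmented.append((rotated, label))
    return augmented


if __name__ == "__main__":
    # Placeholder label for illustration only, not a measured permeability.
    toy_dataset = [(["Ala", "Leu", "Pro", "Phe"], -6.0)]
    for rotated, label in augment_with_rotations(toy_dataset):
        print(rotated, label)
```

In a sketch like this, the rotations should be generated after the train/test split, so that rotated copies of a test peptide do not end up in the training set.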