Machine learning models for predicting configuration of modified knuckle epitope peptides of BMP-2 protein using mesoscale simulation data
Abstract
High doses of bone morphogenetic proteins (BMPs) cause undesired side effects in skeletal tissue regeneration. An alternative approach is to engineer skeletal tissues with the bioactive knuckle epitope domain of BMP-2 (BMP2-KEP), which adopts an open-arm structure as part of the native protein. However, the osteogenic activity of the free peptide is orders of magnitude lower than that of the native protein, a loss attributed to the closed-arm structure the peptide adopts in the free state. The objective of this work was to develop quantitative structure–activity relationships (QSARs), using several machine learning (ML) models, that correlate the 20-mer sequences of modified BMP2-KEP with their configurational properties. Because existing structure–property data for osteogenic peptides are insufficient for training ML models, the SIMFIM mesoscale simulation model was used to compute structural properties of modified BMP2-KEP sequences, such as the radius of gyration (Rg) and end-to-end distance (EtE), and thereby build a database. For ML modeling, the residues of each 20-mer sequence, which form the input features of the database, were represented using several amino acid descriptor (AAD) scales. The performance of all models was compared using the coefficient of determination (R²). Permutation importance and SHAP interaction analyses were performed to determine which residue positions and properties contributed most to the structural properties of the sequences. These studies yielded trained and tested QSARs for predicting the structural properties of any modified BMP2-KEP sequence, with the goal of discovering novel 20-mer sequences with open-arm structures.
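The workflow summarized above (encode each residue position with an AAD scale, predict a structural property, score predictions with R², and rank positions by permutation importance) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Kyte-Doolittle hydropathy index stands in for the AAD scales used in the study, `toy_model` is a hypothetical linear stand-in for a trained ML model predicting a property such as Rg, and the sequences are randomly generated rather than drawn from the SIMFIM database.

```python
import random

# Kyte-Doolittle hydropathy index: an illustrative one-dimensional AAD scale.
# (Assumption: the study's actual descriptor scales are not reproduced here.)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, scale=KYTE_DOOLITTLE, length=20):
    """Map each residue to its descriptor value: one feature per position."""
    if len(sequence) != length:
        raise ValueError(f"expected a {length}-mer, got {len(sequence)} residues")
    return [scale[residue] for residue in sequence]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when one feature column (residue position) is shuffled."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between position j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2_score(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(features):
    """Hypothetical stand-in for a trained regressor; it uses only positions 1 and 6."""
    return 10.0 + 0.5 * features[0] - 0.3 * features[5]


# Randomly generated 20-mers (not simulation data), scored by the toy model.
rng = random.Random(1)
alphabet = sorted(KYTE_DOOLITTLE)
sequences = ["".join(rng.choice(alphabet) for _ in range(20)) for _ in range(50)]
X = [encode(s) for s in sequences]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
ranked = sorted(range(20), key=lambda j: importances[j], reverse=True)
print("most important positions (1-based):", [j + 1 for j in ranked[:2]])
```

Because `toy_model` ignores every position except 1 and 6, shuffling any other column leaves its predictions unchanged, so those positions receive an importance of exactly zero. With a real trained model, importances would normally be computed on held-out test data rather than on the training set.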