Issue 48, 2023

Integration of machine learning in 3D-QSAR CoMSIA models for the identification of lipid antioxidant peptides

Abstract

The comparative molecular similarity indices analysis (CoMSIA) method is a widely used three-dimensional quantitative structure–activity relationship (3D-QSAR) approach in medicinal chemistry and drug design. However, relying solely on the partial least squares (PLS) algorithm to build models from a large number of CoMSIA indices has, in some cases, produced statistically underperforming models. This problem also affected the 3D-CoMSIA models constructed for the ferric thiocyanate (FTC) dataset of antioxidant activity measured in linoleic acid. In this study, a new modeling routine was developed that incorporates various machine learning (ML) techniques to explore options for feature selection, model fitting, and hyperparameter tuning, with the goal of obtaining optimal 3D-CoMSIA models with high predictivity for FTC activity. Recursive Feature Elimination (RFE) and SelectFromModel were applied for feature selection, substantially improving the fit and predictivity (R², R²CV, and R²test) of 24 estimators. However, these selection methods did not fully resolve overfitting and, in some instances, even worsened it. Hyperparameter tuning, in turn, yielded differing levels of generalization across four tree-based models. GB-RFE coupled with a gradient boosting regressor (GBR; learning_rate = 0.01, max_depth = 2, n_estimators = 500, subsample = 0.5) was the only combination that effectively mitigated overfitting, and it outperformed (R²CV = 0.690, R²test = 0.759, R² = 0.872) the best linear model, PLS (R²CV = 0.653, R²test = 0.575, R² = 0.755). This model was therefore used to screen Tryptophyllin L tripeptide fragments for potential antioxidants, leading to the synthesis and testing of three peptides: F-P-5Htp, F-P-W, and P-5Htp-L. These peptides showed promising activity, with FTC values of 4.2 ± 0.12, 4.4 ± 0.11, and 1.72 ± 0.15, respectively.
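The GB-RFE plus GBR combination described in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' routine: the descriptor matrix is random synthetic data standing in for CoMSIA field columns, and the 75/25 train/test split, 5-fold cross-validation, 20 retained features and RFE step size are hypothetical choices that the abstract does not specify. Only the four GBR hyperparameters come from the abstract. The cross-validated R² (R²CV) is computed here from pooled out-of-fold predictions, with feature selection refitted inside each fold.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict, train_test_split
from sklearn.pipeline import Pipeline

# ASSUMPTION: synthetic stand-in for a CoMSIA descriptor matrix
# (60 compounds x 300 field columns); only 5 columns carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 300))
y = X[:, :5] @ np.array([1.0, -0.8, 0.6, 0.5, -0.4]) + rng.normal(scale=0.3, size=60)

# ASSUMPTION: a simple random 75/25 split provides the external test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# GBR hyperparameters as reported in the abstract (random_state added for reproducibility).
gbr_params = dict(learning_rate=0.01, max_depth=2, n_estimators=500,
                  subsample=0.5, random_state=0)

model = Pipeline([
    # Recursive feature elimination ranked by gradient-boosting feature importances;
    # n_features_to_select and step are hypothetical values.
    ("rfe", RFE(GradientBoostingRegressor(random_state=0),
                n_features_to_select=20, step=0.1)),
    # Final gradient boosting regressor fitted on the retained columns.
    ("gbr", GradientBoostingRegressor(**gbr_params)),
])

# Cross-validated R2: RFE is re-run inside every fold because it sits in the Pipeline.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
y_cv = cross_val_predict(model, X_train, y_train, cv=cv)
r2_cv = r2_score(y_train, y_cv)

# Fit on the full training set, then score training fit and external test set.
model.fit(X_train, y_train)
r2_fit = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))

print(f"R2 = {r2_fit:.3f}  R2_CV = {r2_cv:.3f}  R2_test = {r2_test:.3f}")
```

Keeping RFE inside the Pipeline is the main design choice: selecting features on the whole training set before cross-validation would let the held-out folds influence the selection and inflate R²CV. Replacing the `rfe` step with scikit-learn's `SelectFromModel` would give the other selection route named in the abstract.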

Article information

Article type: Paper
Submitted: 02 Oct 2023
Accepted: 31 Oct 2023
First published: 17 Nov 2023
Open Access: Creative Commons BY-NC license

RSC Adv., 2023, 13, 33707–33720

T. T. Nha Tran, T. D. Thuan Tran and T. T. Thuy Bui, RSC Adv., 2023, 13, 33707 DOI: 10.1039/D3RA06690H

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.
