A machine learning-based strategy for screening bioactive compounds in natural products: a case study on Hypericum perforatum L.
Abstract
As a medicinal plant, Hypericum perforatum L. (HPL) contains a rich array of chemical constituents, with multiple components jointly exerting its biological activity. Screening suitable quality markers on the basis of its specific biological activities is therefore crucial for its quality control. This study explores the dose–effect relationship between multiple components of HPL and antioxidant activity and integrates it with machine learning algorithms to construct a virtual screening model for natural antioxidants. High-resolution mass spectrometry was used to collect high-precision semi-quantitative data on HPL samples, and the in vitro antioxidant activity of each sample was determined. With the semi-quantitative data as the input (X) and the in vitro antioxidant activity data as the output (Y), nine individual machine learning models and two ensemble learning models were established. Key antioxidant components were identified from the feature importance scores across all models, combined with dose–effect analysis. Molecular docking and molecular dynamics simulations of endogenous antioxidant mechanisms were then conducted for the highly ranked components. The machine learning models established in this study can accurately predict the antioxidant activity of HPL samples. The multilayer perceptron regression (MLPR) model combined with the Bagging ensemble learning strategy performed best, with a training-set coefficient of determination (R²) of 0.9688 and a prediction-set R² of 0.8761. The root mean square error (RMSEp) and mean absolute error (MAEp) of the prediction set were 4.27% and 3.47%, respectively, against an average DPPH scavenging activity of 55.59% for the prediction set. Subsequent molecular docking results confirmed that the 26 screened compounds have good in vitro antioxidant activity.
For the 26 screened potential bioactive substances, the Keap1/Nrf2/ARE pathway was selected to validate their potential endogenous antioxidant mechanism. Molecular docking and molecular dynamics simulations showed that hyperoside, isohyperoside, kaempferol-3-O-rutinoside, ligustroside, and rutin bind strongly to the Keap1 protein. This study developed an HPL natural antioxidant screening strategy based on machine learning and non-targeted metabolomics, which can provide new insights for the discovery of key natural products in medicinal plants and for drug development.
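The abstract gives no implementation details for the regression and screening workflow, so the following is only a dependency-free sketch (standard library only) of the ideas it names. It shows how the reported metrics (R², RMSE, MAE) are computed, how Bagging averages base learners fitted on bootstrap resamples, and one common way to score feature importance (permutation importance, an assumption, since the abstract does not say which scoring method was used). A k-nearest-neighbour base learner stands in for the MLP regressor purely to keep the sketch short; the function names and the values of `k`, `n_estimators` and `seed` are hypothetical and not the authors' method.

```python
import math
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the units of y (e.g. % DPPH scavenging)."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of y."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def knn_predict(train_X, train_y, x, k=3):
    """Hypothetical stand-in base learner (NOT the paper's MLP):
    mean y of the k nearest training samples by Euclidean distance."""
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    return mean(yi for _, yi in nearest[:k])


def bagging_predict(X_train, y_train, X_test, n_estimators=25, k=3, seed=0):
    """Bootstrap aggregation: fit each base learner on a resample drawn
    with replacement, then average the base learners' predictions."""
    rng = random.Random(seed)
    n = len(X_train)
    totals = [0.0] * len(X_test)
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap indices
        bx = [X_train[i] for i in idx]
        by = [y_train[i] for i in idx]
        for j, x in enumerate(X_test):
            totals[j] += knn_predict(bx, by, x, k)
    return [t / n_estimators for t in totals]


def permutation_importance(X_train, y_train, X_test, y_test, seed=0):
    """Assumed scoring rule: increase in test RMSE when one feature column
    (e.g. one MS feature's semi-quantitative intensity) is randomly shuffled."""
    rng = random.Random(seed)
    base = rmse(y_test, bagging_predict(X_train, y_train, X_test))
    scores = []
    for f in range(len(X_test[0])):
        column = [row[f] for row in X_test]
        rng.shuffle(column)
        X_perm = []
        for row, v in zip(X_test, column):
            new_row = list(row)
            new_row[f] = v  # replace only feature f with its shuffled value
            X_perm.append(new_row)
        scores.append(rmse(y_test, bagging_predict(X_train, y_train, X_perm)) - base)
    return scores
```

Applied to the reported figures, the prediction-set errors correspond to roughly 4.27 / 55.59 ≈ 7.7% (RMSEp) and 3.47 / 55.59 ≈ 6.2% (MAEp) of the mean DPPH scavenging activity. A real pipeline would more likely wrap a library MLP in a bagging ensemble (for example scikit-learn's `BaggingRegressor` around an `MLPRegressor`); the bootstrap-and-average logic is the same.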