Issue 53, 2025, Issue in Progress

Three-segment dynamic threshold joint optimization strategy-based mRMR-PCA-LGBM model for origin identification of Cornus officinalis via mid-infrared spectroscopy

Abstract

The origin of Chinese medicinal materials directly determines their efficacy and safety. To address the need for rapid traceability of Cornus officinalis, this study proposes a three-segment dynamic threshold joint optimization framework. Based on 658 samples of Cornus officinalis from 11 different origins, the framework uses the minimum redundancy maximum relevance (mRMR) algorithm to rank the 3448-dimensional mid-infrared spectra, which are then divided into three segments: retention, dimensionality reduction, and deletion. Bayesian optimization jointly determines the retention of 34 key spectral bands, the deletion of 345 bands, and the hyperparameters of the LightGBM model. The dimensionality reduction segment is compressed to 38 dimensions using principal component analysis (PCA), resulting in a final input of 72 features for the mRMR-PCA-LightGBM model. On the independent test set, the model achieves an accuracy of 90.9%, an F1-score of 0.91, a Cohen's kappa of 0.90, and a Matthews correlation coefficient of 0.90. The area under the receiver operating characteristic curve (ROC-AUC) is greater than 0.95 for all 11 origins. These results are markedly better than those of five control models. By strategically capturing origin-specific information while eliminating irrelevant noise, this framework demonstrates that highly accurate and robust origin identification is achievable with minimal spectral features, providing a practical and efficient technical pathway for the authentication and market supervision of Chinese medicinal materials.
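The three-segment split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, a random permutation stands in for the mRMR ranking, the data are synthetic, and the Bayesian optimization and LightGBM stages are omitted. The thresholds (34 retained, 345 deleted, 38 principal components) are taken from the abstract, where they are the outcome of the joint optimization rather than fixed inputs.

```python
import numpy as np

def three_segment_split(ranked_idx, n_retain, n_delete):
    """Split a relevance-ranked index array (best first) into
    retention, dimensionality-reduction, and deletion segments."""
    ranked_idx = np.asarray(ranked_idx)
    n = ranked_idx.size
    if n_retain < 0 or n_delete < 0 or n_retain + n_delete > n:
        raise ValueError("thresholds must be non-negative and fit the feature count")
    retain = ranked_idx[:n_retain]                # kept as raw spectral bands
    reduce_ = ranked_idx[n_retain:n - n_delete]   # compressed with PCA
    delete = ranked_idx[n - n_delete:]            # discarded
    return retain, reduce_, delete

def pca_fit(X, n_components):
    """Fit PCA via SVD of the centered data; returns (mean, components)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def build_features(X, retain, reduce_, mean, components):
    """Concatenate retained bands with PCA scores of the middle segment."""
    scores = (X[:, reduce_] - mean) @ components.T
    return np.hstack([X[:, retain], scores])

# Synthetic stand-ins (assumptions): 60 random "spectra" with 3448 variables,
# and a random permutation in place of a real mRMR ranking.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3448))
ranked = rng.permutation(3448)

retain, reduce_, delete = three_segment_split(ranked, n_retain=34, n_delete=345)
mean, components = pca_fit(X[:, reduce_], n_components=38)
Z = build_features(X, retain, reduce_, mean, components)

print(retain.size, reduce_.size, delete.size)  # segment sizes
print(Z.shape)                                 # final model input
```

With these thresholds the middle segment holds 3448 - 34 - 345 = 3069 variables, and the final input has 34 + 38 = 72 features, matching the figure reported in the abstract. In a real workflow the PCA mean and components would be fitted on the training split only and then applied to the test split.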

Article information

Article type: Paper
Submitted: 10 Aug 2025
Accepted: 16 Nov 2025
First published: 20 Nov 2025
This article is Open Access (Creative Commons BY-NC license)

RSC Adv., 2025, 15, 45500-45513

B. Liu, H. Yi, C. Li, W. Yu and S. Yang, RSC Adv., 2025, 15, 45500 DOI: 10.1039/D5RA05862G

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.
