Bagging partial least squares for accurate and stable wheat protein content detection using near-infrared spectroscopy

Abstract

Near-infrared (NIR) spectroscopy combined with machine learning algorithms has been widely adopted for rapid assessment of grain quality attributes. However, conventional calibration models often suffer from overfitting and instability when applied to high-dimensional spectral data with limited sample sizes. In this study, we developed a novel bagging partial least squares (BA-PLS) algorithm for accurate and stable prediction of wheat protein content. A total of 394 wheat samples were collected, and their NIR spectra were acquired from 950 to 1650 nm. The BA-PLS algorithm generates multiple bootstrap subsamples, trains a PLS model on each subsample, and aggregates the resulting predictions by averaging, which reduces prediction variance while preserving the low-bias property of PLS. The performance of BA-PLS was compared with that of standard PLS, support vector regression (SVR), and extreme gradient boosting (XGBoost). BA-PLS achieved the best predictive performance, with a coefficient of determination for prediction (R²P) of 0.9600 and a root mean square error of prediction (RMSEP) of 0.3058%. Notably, while SVR and XGBoost exhibited severe overfitting, with training-to-test R² gaps exceeding 0.4045, BA-PLS generalized well, with an R² gap of only 0.0261. Furthermore, BA-PLS provided prediction uncertainty estimates through the standard deviation of the ensemble predictions. The proposed BA-PLS algorithm offers a practical and stable solution for rapid wheat protein quantification, with potential applicability to other cereal quality assessment tasks.
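
The procedure described in the abstract (bootstrap resampling, one PLS model per resample, averaging, and the ensemble standard deviation as an uncertainty estimate) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the single-response NIPALS-style PLS fit, the bootstrap size (equal to the training set size), the number of components, the number of ensemble members, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def fit_pls1(X, y, n_components):
    """Fit single-response PLS (PLS1, NIPALS-style); return (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # centred working copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                          # weight: direction of covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                       # no covariance left to model
            break
        w = w / norm
        t = Xr @ w                             # scores
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        c = (yr @ t) / tt                      # y loading
        Xr = Xr - np.outer(t, p)               # deflate X and y
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)     # regression vector for centred X
    return x_mean, y_mean, coef


def fit_bagged_pls(X, y, n_components=5, n_models=50, seed=0):
    """Train one PLS model per bootstrap resample (assumed: n rows, with replacement)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)       # bootstrap row indices
        models.append(fit_pls1(X[idx], y[idx], n_components))
    return models


def predict_bagged_pls(models, X):
    """Return the ensemble mean and the per-sample standard deviation across members."""
    preds = np.array([ym + (X - xm) @ coef for xm, ym, coef in models])
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in for spectra (NOT the paper's data):
# three latent factors drive 50 hypothetical "wavelength" channels.
rng = np.random.default_rng(0)
scores = rng.normal(size=(120, 3))
X = scores @ rng.normal(size=(3, 50)) + 0.05 * rng.normal(size=(120, 50))
y = scores @ np.array([1.0, -0.5, 0.25]) + 0.05 * rng.normal(size=120)
X_train, X_test, y_train, y_test = X[:90], X[90:], y[:90], y[90:]

models = fit_bagged_pls(X_train, y_train)
train_pred, _ = predict_bagged_pls(models, X_train)
test_pred, test_std = predict_bagged_pls(models, X_test)

# Training-to-test R² gap, assumed here to mean R²(train) minus R²(test).
r2_gap = r2(y_train, train_pred) - r2(y_test, test_pred)
```

Averaging over resampled fits leaves each member's bias roughly unchanged while damping the fit-to-fit variation, which is the variance-reduction argument the abstract makes. In practice a library PLS implementation would likely replace `fit_pls1`; it is written out here only to keep the sketch self-contained.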

Article information

Article type: Paper
Submitted: 20 Apr 2026
Accepted: 22 May 2026
First published: 22 May 2026

Anal. Methods, 2026, Accepted Manuscript

Q. Ma, L. Zhang, Y. Ju, G. Xie and F. Li, Anal. Methods, 2026, Accepted Manuscript, DOI: 10.1039/D6AY00743K
