A weighted ensemble method based on wavelength selection for near-infrared spectroscopic calibration
Abstract
Near-infrared spectra are known to contain many wavelength variables and to suffer from severe peak overlap, both of which degrade predictive results. To select wavelength variables effectively and improve predictive accuracy, a weighted clustering and pruning of wavelength variables-partial least squares (WCPV-PLS) method was proposed. First, the wavelength variables were clustered, and a predicted value was obtained from each cluster by partial least squares (PLS). The maximum absolute errors and variances of the predicted values relative to the experimental values were then calculated. These two quantities were combined to determine which clusters to prune, which effectively reduced the number of wavelength variables. A weight coefficient was constructed from the maximum absolute errors and variances of the remaining clusters. Finally, the training subsets extracted from the remaining clusters were corrected by the weight coefficients, and the predictive results of each sub-model were obtained by PLS. Integrating the results of multiple sub-models constituted the WCPV-PLS method. Compared with four other methods, WCPV-PLS performed well in most cases for protein prediction in maize: the correlation coefficients between the experimental and predicted values were large, the root-mean-square error (RMSE) was small, and the method was robust. Further experiments on moisture and starch in maize showed that WCPV-PLS could effectively prune redundant wavelength variables and that the results were stable and reliable over multiple iterations. Specifically, the RMSE for starch content in maize was below 0.17 and that for moisture was below 0.16.
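The pruning and weighting steps described above can be illustrated with a minimal sketch. The abstract does not give the exact rules, so everything below is an assumption made for illustration: the function names are hypothetical, the "variance" is taken to be the variance of the prediction errors, the pruning score is the plain sum of maximum absolute error and variance, and the weights are inversely proportional to that score. The per-cluster PLS fits are not reproduced here; the per-cluster predictions are taken as given inputs, and the weights are applied directly to those predictions rather than to the training subsets, which is a simplification of the procedure described.

```python
import math


def max_abs_error(pred, obs):
    """Largest absolute deviation between predicted and experimental values."""
    return max(abs(p - o) for p, o in zip(pred, obs))


def error_variance(pred, obs):
    """Population variance of the prediction errors (ASSUMED reading of 'variance')."""
    errs = [p - o for p, o in zip(pred, obs)]
    mean = sum(errs) / len(errs)
    return sum((e - mean) ** 2 for e in errs) / len(errs)


def rmse(pred, obs):
    """Root-mean-square error between predicted and experimental values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def prune_and_weight(cluster_preds, obs, n_keep):
    """Keep the n_keep best clusters and return normalized weights for them.

    ASSUMPTION: score = max abs error + error variance (lower is better);
    the source only says the two quantities are 'combined'.
    """
    scores = {
        name: max_abs_error(pred, obs) + error_variance(pred, obs)
        for name, pred in cluster_preds.items()
    }
    kept = sorted(scores, key=scores.get)[:n_keep]
    # ASSUMPTION: weight inversely proportional to the score, normalized to sum to 1.
    eps = 1e-12  # guards against division by zero for a perfect cluster
    inverse = {name: 1.0 / (scores[name] + eps) for name in kept}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def ensemble_predict(cluster_preds, weights):
    """Weighted combination of the predictions of the retained clusters."""
    n_samples = len(next(iter(cluster_preds.values())))
    return [
        sum(weights[name] * cluster_preds[name][i] for name in weights)
        for i in range(n_samples)
    ]


# Toy usage with made-up numbers (not data from the study).
obs = [1.0, 2.0, 3.0]
cluster_preds = {
    "a": [1.1, 2.1, 3.1],
    "b": [1.0, 2.5, 3.0],
    "c": [3.0, 0.0, 5.0],
}
weights = prune_and_weight(cluster_preds, obs, n_keep=2)
combined = ensemble_predict(cluster_preds, weights)
print(weights, rmse(combined, obs))
```

In this toy case the cluster with the largest errors is pruned, and the remaining cluster with the smaller score receives the larger weight. A different combination rule or weighting scheme would slot into `prune_and_weight` without changing the rest of the sketch.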