Non-destructive origin identification of millet based on the combination of NIRS and improved WOA-based feature wavelength selection
Abstract
To achieve non-destructive identification of millet origin, near-infrared spectroscopy (NIRS) was used to collect raw spectral data from millet samples. To address the high-dimensional redundancy and overlapping spectral peaks in near-infrared spectra, feature wavelengths were selected with the Competitive Adaptive Reweighted Sampling (CARS) algorithm, the Uninformative Variable Elimination (UVE) algorithm, and the Whale Optimization Algorithm (WOA), which retained 26, 158, and 123 feature wavelengths, respectively. To further improve feature extraction, strategies such as chaotic mapping were integrated into WOA to form an improved Whale Optimization Algorithm (IWOA), which reduced the selected feature wavelengths from 123 to 27. In addition, to improve model accuracy, the Crested Porcupine Optimizer (CPO) was combined with the Least Squares Support Vector Machine (LSSVM) to construct a CPO-LSSVM model for millet origin identification. Experimental results showed that, after wavelength selection, both the LSSVM and CPO-LSSVM models identified origin better than the full-spectrum models. The best performance was obtained by combining IWOA feature wavelength selection with CPO-LSSVM, which achieved an accuracy of 99.03%, with precision, recall, and F1 score all reaching 99.20%. Compared with the full-spectrum LSSVM model, accuracy, precision, recall, and F1 score improved by 21.67%, 19.86%, 21.88%, and 20.87%, respectively. The effectiveness of the proposed IWOA feature wavelength selection method and the CPO-LSSVM model was also validated on public datasets. These results demonstrate that IWOA selects a compact set of effective wavelengths while also improving model performance.
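The wavelength-selection idea can be illustrated with a minimal Python/NumPy sketch of a chaos-initialized binary WOA. It assumes a logistic map (mu = 4) as the chaotic initialization, a threshold of 0.5 to turn each whale's continuous position into a wavelength mask, and a fitness that trades holdout error against the fraction of wavelengths kept (weight `alpha`). The other IWOA improvement strategies are not specified in the abstract and are not reproduced, and the nearest-centroid classifier inside the fitness is a lightweight stand-in, not the LSSVM used in the study. All function names, parameter values, and the synthetic data are hypothetical.

```python
import numpy as np


def logistic_chaos_init(n_agents, dim, mu=4.0, seed=0):
    """Chaotic initialisation (assumed logistic map): x <- mu * x * (1 - x)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.01, 0.99, size=dim)
    pop = np.empty((n_agents, dim))
    for i in range(n_agents):
        x = mu * x * (1.0 - x)  # each agent takes the next state of the map
        pop[i] = x
    return pop


def nearest_centroid_error(X_tr, y_tr, X_te, y_te):
    """Holdout error of a nearest-centroid classifier (stand-in for LSSVM)."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dist = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[dist.argmin(axis=1)] != y_te))


def iwoa_select(X, y, n_agents=20, n_iter=50, alpha=0.95, b=1.0, seed=0):
    """Return (selected wavelength indices, best fitness) of a binary WOA search."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    idx = rng.permutation(n)
    split = int(0.7 * n)  # hypothetical 70/30 holdout split
    X_tr, y_tr = X[idx[:split]], y[idx[:split]]
    X_te, y_te = X[idx[split:]], y[idx[split:]]

    def fitness(pos):
        mask = pos > 0.5  # continuous position -> binary wavelength mask
        if not mask.any():
            return 2.0  # worse than any real fitness, which lies in [0, 1]
        err = nearest_centroid_error(X_tr[:, mask], y_tr, X_te[:, mask], y_te)
        return alpha * err + (1.0 - alpha) * mask.sum() / dim

    pop = logistic_chaos_init(n_agents, dim, seed=seed)
    fits = np.array([fitness(p) for p in pop])
    best, best_fit = pop[fits.argmin()].copy(), float(fits.min())

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 to 0
        for i in range(n_agents):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # encircle the best whale (|A| < 1) or explore around a random one
                ref = best if abs(A) < 1 else pop[rng.integers(n_agents)]
                pop[i] = ref - A * np.abs(C * ref - pop[i])
            else:
                # spiral (bubble-net) update around the best whale
                ell = rng.uniform(-1.0, 1.0)
                spiral = np.exp(b * ell) * np.cos(2.0 * np.pi * ell)
                pop[i] = np.abs(best - pop[i]) * spiral + best
            pop[i] = np.clip(pop[i], 0.0, 1.0)
            f = fitness(pop[i])
            if f < best_fit:
                best, best_fit = pop[i].copy(), float(f)
    return np.flatnonzero(best > 0.5), best_fit


if __name__ == "__main__":
    # Synthetic stand-in for NIR spectra: 3 "origins", 4 informative bands of 40
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1, 2], 30)
    X = rng.standard_normal((90, 40))
    X[:, :4] += 3.0 * y[:, None]
    selected, fit = iwoa_select(X, y, n_agents=10, n_iter=20)
    print("selected bands:", selected, "fitness:", round(fit, 4))
```

Raising `alpha` favors accuracy over compactness; lowering it pushes the search toward fewer wavelengths, which is the trade-off behind reducing 123 WOA wavelengths to a smaller subset.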
The CPO-LSSVM model can rapidly and accurately identify the origin of millet, enabling precise traceability of millet provenance, and it provides a new reference for origin identification of other agricultural products.
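The classification stage can be sketched in the same way. Below is a minimal one-vs-rest LSSVM with an RBF kernel that solves the standard LS-SVM classification linear system for each class, together with accuracy and macro-averaged precision, recall, and F1. It assumes one-vs-rest decoding and macro averaging, neither of which is stated in the abstract, and it uses fixed, hypothetical values for the regularization parameter `gamma` and kernel width `sigma`. In CPO-LSSVM these hyperparameters would presumably be searched by CPO; that optimizer is not implemented here.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))


class OneVsRestLSSVM:
    """LS-SVM classifier, one binary model per class (assumed decoding)."""

    def __init__(self, gamma=10.0, sigma=1.0):  # hypothetical fixed values
        self.gamma, self.sigma = gamma, sigma

    def fit(self, X, y):
        self.X_, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        n = len(self.X_)
        K = rbf_kernel(self.X_, self.X_, self.sigma)
        coefs, biases = [], []
        for c in self.classes_:
            t = np.where(y == c, 1.0, -1.0)
            # [0  t^T               ] [b    ]   [0]
            # [t  Omega + I / gamma ] [alpha] = [1],  Omega_kl = t_k t_l K_kl
            M = np.zeros((n + 1, n + 1))
            M[0, 1:], M[1:, 0] = t, t
            M[1:, 1:] = np.outer(t, t) * K + np.eye(n) / self.gamma
            sol = np.linalg.solve(M, np.concatenate(([0.0], np.ones(n))))
            biases.append(sol[0])
            coefs.append(sol[1:] * t)  # alpha_k * t_k, used in the decision function
        self.coef_, self.bias_ = np.array(coefs), np.array(biases)
        return self

    def decision_function(self, X):
        K = rbf_kernel(np.asarray(X, dtype=float), self.X_, self.sigma)
        return K @ self.coef_.T + self.bias_  # one column per class

    def predict(self, X):
        return self.classes_[self.decision_function(X).argmax(axis=1)]


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and per-class-mean F1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    p, r, f = [], [], []
    for c in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        pc = tp / (tp + fp) if tp + fp else 0.0
        rc = tp / (tp + fn) if tp + fn else 0.0
        p.append(pc)
        r.append(rc)
        f.append(2 * pc * rc / (pc + rc) if pc + rc else 0.0)
    acc = float(np.mean(y_true == y_pred))
    return acc, float(np.mean(p)), float(np.mean(r)), float(np.mean(f))
```

A CPO (or any other) search would wrap `fit` and `macro_scores` on a validation split and return the `(gamma, sigma)` pair with the best score; solving one dense (n+1)×(n+1) system per class keeps each evaluation simple for the sample sizes typical of NIR studies.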