Molecular spectroscopic wavelength selection using combined interval partial least squares and correlation coefficient optimization
Wavelength selection plays a vital role in applying near-infrared spectroscopy to sample analysis. Existing wavelength selection algorithms each have drawbacks that can be mitigated by combining them. In this study, we used a combination of algorithms to quantitatively analyze the oil content of corn from near-infrared spectroscopy data. We combined Savitzky-Golay (SG) preprocessing, the correlation coefficient (CC) method, and the synergy interval partial least squares (SiPLS) algorithm to propose the CC-SiPLS and CC-SG-SiPLS methods. We compared the results of applying full-spectrum partial least squares (PLS), correlation coefficient partial least squares (CC-PLS), SiPLS, CC-SiPLS, and CC-SG-SiPLS to near-infrared wavelength selection. The models built after wavelength selection with CC, SiPLS, CC-SiPLS, and CC-SG-SiPLS were simpler than the full-spectrum model, using 33.6% (CC), 14.3% (SiPLS), 11.1% (CC-SiPLS), and 6.3% (CC-SG-SiPLS) of the full-spectrum wavelengths. Prediction accuracy for the oil content of corn also improved relative to full-spectrum PLS. CC-SG-SiPLS, the wavelength selection algorithm combined with a preprocessing method, reduced the number of wavelengths from 700 to 44 and yielded the simplest model. Its root mean square error of prediction (RMSEP) and relative percent deviation (RPD) were 0.0552 and 2.5706, respectively, demonstrating adequate prediction accuracy. These results indicate that a combination strategy provides an effective way to select multiple wavebands, and that CC-SG-SiPLS is an effective and robust wavelength selection strategy that achieves high analysis accuracy using molecular absorption bands composed of several wavelength intervals.
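To make the CC step and the two reported metrics concrete, the following is a minimal sketch and not the authors' implementation. It assumes that CC selection keeps wavelengths whose absolute Pearson correlation with the reference values meets a threshold, and that RPD is computed as the standard deviation of the reference values divided by RMSEP (a common chemometric convention; the abstract does not state the formula). The function names, the threshold of 0.6, and the synthetic data are all hypothetical, and the SG smoothing, SiPLS, and PLS regression steps are not shown.

```python
import numpy as np


def cc_select(X, y, threshold):
    """Return (indices, r): wavelengths with |Pearson r| >= threshold, and all r values.

    X is (n_samples, n_wavelengths); y is (n_samples,).
    Constant columns get r = 0 so they are never selected.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = np.divide(num, denom, out=np.zeros_like(num, dtype=float), where=denom > 0)
    return np.flatnonzero(np.abs(r) >= threshold), r


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rpd(y_true, y_pred):
    """Assumed definition: sample standard deviation of y_true divided by RMSEP."""
    return float(np.std(np.asarray(y_true, dtype=float), ddof=1) / rmsep(y_true, y_pred))


# Synthetic stand-in for a spectral matrix (illustrative only, not the corn data):
# 80 samples, 700 wavelengths, with columns 100..109 made informative.
rng = np.random.default_rng(0)
y = rng.normal(size=80)
X = rng.normal(size=(80, 700))
X[:, 100:110] = y[:, None] + 0.1 * rng.normal(size=(80, 10))

selected, r = cc_select(X, y, threshold=0.6)  # threshold is a hypothetical choice
print(f"kept {selected.size} of {X.shape[1]} wavelengths "
      f"({100 * selected.size / X.shape[1]:.1f}%)")
```

In a full pipeline under these assumptions, the columns in `selected` would be passed on to interval selection and PLS modelling, and `rmsep` and `rpd` would be evaluated on a held-out prediction set.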