Construction of prediction models for phenolic compounds in Cabernet Sauvignon grapes based on visible/near-infrared spectroscopy
Abstract
This study focuses on phenolic compounds, specifically tannins and anthocyanins, in Cabernet Sauvignon wine grapes. It aims to construct prediction models for their concentrations and to identify the key characteristic wavelengths associated with them. Diffuse reflectance spectra of the grapes were collected using a portable fiber-optic spectrometer. Principal component analysis (PCA) was employed to eliminate outliers, and the SPXY algorithm was applied to divide the dataset into a calibration set (n = 145) and a prediction set (m = 49). Several preprocessing methods and their combinations, including Savitzky–Golay convolution smoothing (SG), multiplicative scatter correction (MSC), standard normal variate (SNV), and standardization (SS), were compared to determine the optimal preprocessing strategy for each phenolic compound. Characteristic wavelengths were then extracted using competitive adaptive reweighted sampling (CARS), the successive projections algorithm (SPA), and uninformative variable elimination (UVE). Comparative analysis identified the most effective wavelength selection method and the optimal number of characteristic wavelengths, and partial least squares regression (PLSR) models for tannin and anthocyanin contents were established. For tannins, the model combining SG–SNV–SS preprocessing with the CARS algorithm achieved the best performance (Rc² = 0.9964, Rp² = 0.9939, RPD = 3.7653), with seven optimal characteristic wavelengths identified at 422.64 nm, 828.86 nm, 948.92 nm, 993.22 nm, 1003.17 nm, 1122.10 nm, and 1122.94 nm.
For anthocyanins, the model based on raw spectral data combined with the CARS algorithm yielded the best results (Rc² = 0.9899, Rp² = 0.9768, RPD = 6.5591), with six optimal characteristic wavelengths identified at 440.35 nm, 580.76 nm, 632.38 nm, 777.21 nm, 898.61 nm, and 1013.96 nm. The constructed models screened the key characteristic wavelengths associated with tannin and anthocyanin contents and enabled accurate prediction of these phenolic compounds in wine grapes. These results provide a theoretical basis and technical support for the development of portable instruments and the selection of light sources.
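As an illustration of two of the steps named above, the following minimal sketch shows how SNV and standardization (SS) might be applied to a spectra matrix, and how the coefficient of determination and RPD might be computed on a prediction set. It is not the authors' implementation. The function names are hypothetical, SG smoothing and the PLSR fit are omitted, and RPD is assumed here to be the sample standard deviation of the reference values divided by the root mean square error of prediction, which is one common convention that the abstract does not confirm.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def standardize(calibration, prediction):
    """Column-wise standardization (SS) using calibration-set statistics only."""
    calibration = np.asarray(calibration, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    mu = calibration.mean(axis=0)
    sigma = calibration.std(axis=0, ddof=1)
    sigma = np.where(sigma == 0, 1.0, sigma)  # avoid division by zero on flat bands
    return (calibration - mu) / sigma, (prediction - mu) / sigma


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def rpd(y_true, y_pred):
    """Assumed definition: SD of reference values (ddof=1) divided by RMSEP."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmsep = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return y_true.std(ddof=1) / rmsep


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    toy_spectra = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
    print(snv(toy_spectra))
    y_ref = [1.0, 2.0, 3.0, 4.0]
    y_hat = [1.1, 1.9, 3.2, 3.8]
    print(r_squared(y_ref, y_hat), rpd(y_ref, y_hat))
```

Computing the standardization statistics from the calibration set alone keeps information from the prediction set out of the preprocessing, which is the usual reason for splitting the data before scaling.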
