Haoyu Jin ab, Xiaojian Hao *ab, Nan Li ab, Ying Han c, Biming Mo ab and Shuyi Zhang c
a Science and Technology on Electronic Test and Measurement Laboratory, North University of China, Taiyuan, Shanxi, China. E-mail: haoxiaojian@nuc.edu.cn
b State Key Laboratory of Dynamic Measurement Technology, North University of China, Taiyuan, Shanxi, China
c Shanxi Xinghuacun Fen Wine Factory Co., Ltd., Shanxi, China
First published on 20th May 2024
To address the low classification accuracy, difficult data processing, long analysis time, and subjectivity of traditional Baijiu (Chinese liquor) classification, this paper reports a method that combines laser-induced breakdown spectroscopy with a high-frequency ultrasonic atomization system (AS-LIBS) and optimized grid search cross-validation (OptGSCV) for high-precision, rapid classification of Baijiu along several dimensions. Multi-dimensional high-precision classification was performed on 23 Baijiu samples of different series, alcohol contents, and production dates from Shanxi Xinghuacun Fen Wine Factory Co., Ltd. By studying the growth and evolution mechanism of laser-induced plasma, a new method of Baijiu spectral signal acquisition was realized by applying LIBS to Baijiu detection. To overcome the high breakdown threshold and poor signal stability caused by direct breakdown of Baijiu with high ethanol content, a high-frequency atomization system based on ultrasonic technology is proposed for efficient extraction of Baijiu plasma spectral signals. To select the number of hidden-layer nodes, the weights, and the thresholds of the back-propagation (BP) network, an optimization model combining the particle swarm optimization (PSO) algorithm with the genetic algorithm (GA) is proposed to construct the optimal BP network architecture. A secondary optimization network model based on OptGSCV resolves the mismatch between optimal parameter selection and optimal network construction, yielding a new network that integrates an intelligent qualitative analysis algorithm with a traditional classification algorithm.
The experimental results show that the proposed OptGSCV quadratic optimization network combined with the AS-LIBS multidimensional Baijiu high-precision classification model not only introduces an innovative and high-efficiency detection method for the Baijiu industry, but also provides important technical support and new ideas for the key areas of food safety production and Baijiu quality control.
Researchers now have many methods for testing and analyzing the quality of alcoholic beverages. However, ethanol has a low specific heat capacity, so little heat is needed to turn its liquid molecules into gas molecules; its molecular weight is small and its intermolecular spacing is large, so molecules detach from the liquid surface relatively easily. These properties make ethanol volatile. Because the ethanol content of Baijiu is higher than that of other alcoholic beverages (wine, whisky, etc.), Baijiu analysis is more complex.7–9 At present, Baijiu quality testing methods mainly include the sensory evaluation rating method,10 inductively coupled plasma mass spectrometry (ICP-MS),11,12 phase liquid chromatography (PLC),13 gas-phase ion mobility spectrometry (GIMS),14 and bionic identification methods based on the electronic nose and electronic tongue.15 The sensory evaluation rating method is one of the more common ways of determining the authenticity of Baijiu in daily production and life. It is highly flexible, directly and quickly reflects the overall quality of Baijiu, including color, smell, and taste, and can capture subtle differences between Baijiu. However, its results depend mainly on the experience of the taster, are subjective, and are easily affected by the external environment and the taster's personal preferences. It is also difficult to describe the different grades of the original Baijiu quantitatively, and the accuracy fluctuates greatly.16 ICP-MS can accurately analyze the trace element content of Baijiu with high analytical precision, and it also offers fast detection and the determination of multiple elements. 
However, the detection process requires adding a certain amount of fluorescent substances to the Baijiu, which contaminates the sample. The method also requires professional equipment and technical support, and the detection cost is high.17 PLC provides fast and accurate analysis of organic substances such as esters, aldehydes, ketones and acids in Baijiu. However, its sensitivity is low, and its ability to analyze complex samples is limited.18 GIMS is fast and sensitive and is suitable for the detection of volatile organic compounds. However, for Baijiu, which has a large sugar content and is not easy to volatilize, its detection ability is limited.19 Bionic identification methods based on the electronic nose and electronic tongue are sensory simulation technologies that can quickly identify the varieties and origins of Baijiu. However, the sensors are non-specific, so mixtures with a high degree of similarity are difficult to distinguish correctly, and the detection cost is high. The above detection methods therefore all have limitations in practical applications.
Laser induced breakdown spectroscopy (LIBS)20,21 is a plasma atomic emission spectroscopy technique based on high power pulses, which utilizes an intense pulsed laser focused on the surface of the target sample to generate a plasma, realizing the quantitative and qualitative analysis of the sample composition. As an emerging means of chemical analysis, LIBS not only has the advantages of rapid, multi-element detection, but it also allows non-contact in situ detection,22 which opens up new possibilities for real-time on-line detection, and it has been widely used in many fields, such as food safety,23,24 substance detection,25 geological exploration,26 and Mars exploration.27,28 In recent years, most of the chemometric methods for LIBS spectral data analysis have combined advanced machine learning models based on multivariate statistical techniques and various intelligent optimization algorithms.29 Xinmeng Luo et al.30 used a particle swarm optimized (PSO) back-propagation (BP) neural network combined with LIBS data to achieve rapid detection of heavy metal content in Pinus sylvestris. Jie Ren et al.31 optimized the BP network using genetic algorithm (GA) and combined it with PSO model to achieve the detection of soil Cd content under double-pulse LIBS.
The above examples show that advanced machine learning modeling of LIBS experimental data can reduce and correct the spectral intensity fluctuations caused by emission source noise and matrix effects, thus improving the measurement accuracy.32 Accordingly, this paper designs a multi-dimensional, high-precision Baijiu characterization system that combines an optimized grid search and cross-validation (OptGSCV) quadratic optimization network with LIBS based on a high-frequency ultrasonic atomization system (AS-LIBS). The system achieves high-precision, real-time, on-line inspection of the wavelength range of the Baijiu plasma, its fluid properties, and its composition.
• By studying the growth and evolution mechanism of laser-induced plasma, the advantages of LIBS (fast analysis, on-line in situ detection, no sample pretreatment, and micro-destructive testing) are combined with traditional Baijiu detection technology to create a new method for detecting and analyzing trace elements in Baijiu.
• A high-frequency atomization system based on ultrasonic technology was independently designed and developed. It overcomes the distortion of plasma spectral signal acquisition caused by the high ethanol content of Baijiu and enables efficient acquisition of characteristic Baijiu plasma spectral signals.
• An intelligent optimization qualitative analysis algorithm with hyperplane segmentation constraints and a quadratic optimization network model based on OptGSCV are proposed. Together they optimize each qualitative analysis index individually as well as the model as a whole, and they solve the technical problem of the mismatch between the selection of optimal parameters and the construction of an optimal network.
• A new method for multi-dimensional, high-precision, rapid detection and qualitative analysis of Baijiu is proposed, using a machine learning optimization model combined with AS-LIBS. It realizes high-precision on-line in situ analysis of Baijiu in which multiple dimensions are identified at one time.
Fig. 1 Diagram of a multi-dimensional Baijiu authentication system based on a high-frequency ultrasonic atomization system.
Fenjiu, a typical representative of clear-flavored Baijiu, was chosen as the research object based on the characteristics of the local economy and people's health.34,35 The 23 Fenjiu samples used in the experiment were provided by Shanxi Xinghuacun Fenjiu Factory Co., Ltd.,36 which guaranteed the authority of the samples and the authenticity of the experimental data. The samples cover all the best-selling Fenjiu products, including 10 different series, 14 different brands and 7 different alcohol contents. To investigate how the production date affects the trace element content of Fenjiu, red-capped glass Fenjiu from 10 different years between 2013 and 2023 were selected as representatives. Table 1 lists the series, brand name, alcohol content, and production date of each sample.
Series | Brand name | Alcohol content | Production date | Sample |
---|---|---|---|---|
Glass Fenjiu series | Glass Fenjiu with a red lid | 42° | October 20, 2013 | Sample 1 |
Glass Fenjiu series | Glass Fenjiu with a red lid | 42° | November 10, 2014 | Sample 2 |
Glass Fenjiu series | Glass Fenjiu with a red lid | 42° | June 15, 2015 | Sample 3 |
Glass Fenjiu series | Glass Fenjiu with a red lid | 42° | August 8, 2016 | Sample 4 |
Glass Fenjiu series | Glass Fenjiu with a red lid | 42° | June 16, 2017 | Sample 5 |
Glass Fenjiu series | Glass Fenjiu with a red lid | 42° | July 7, 2018 | Sample 6 |
Glass Fenjiu series | Glass Fenjiu with a red lid | 42° | June 30, 2020 | Sample 7 |
Glass Fenjiu series | Glass Fenjiu with a red lid | 42° | August 28, 2021 | Sample 8 |
Glass Fenjiu series | Glass Fenjiu with a red lid | 42° | May 19, 2022 | Sample 9 |
Glass Fenjiu series | Glass Fenjiu with a red lid | 42° | March 18, 2023 | Sample 10 |
Glass Fenjiu series | Yellow cap glass Fenjiu | 53° | September 28, 2020 | Sample 11 |
Fenyang King series | Fenyang King 10 | 45° | June 19, 2021 | Sample 12 |
Fenyang King series | Fenyang King 25 | 50° | October 24, 2022 | Sample 13 |
Export Fenjiu series | Opalescent glass Fenjiu | 48° | March 4, 2019 | Sample 14 |
Export Fenjiu series | Export of ceramic Fenjiu | 53° | September 14, 2018 | Sample 15 |
Blue and white porcelain Fenjiu series | Blue and white porcelain Fenjiu 20 | 53° | March 17, 2022 | Sample 16 |
Blue and white porcelain Fenjiu series | Blue and white porcelain Fenjiu 30 | 53° | April 18, 2021 | Sample 17 |
Bamboo leaf green Fenjiu series | Bamboo leaf green Fenjiu | 38° | May 11, 2020 | Sample 18 |
Rose Fenjiu series | Rose Fenjiu | 40° | August 21, 2019 | Sample 19 |
White jade Fenjiu series | White jade Fenjiu | 40° | December 2, 2019 | Sample 20 |
Panama Fenjiu series | Panama 1915 Black tan Fenjiu | 42° | December 2, 2022 | Sample 21 |
Old white Fenjiu series | Old white Fenjiu 10 | 53° | July 7, 2022 | Sample 22 |
Base liquor | Fenjiu base Baijiu | 65° | January 20, 2020 | Sample 23 |
Experimental samples were taken directly from unopened bottles to avoid the influence of airborne microbial populations on the experimental data. Each LIBS spectrum was obtained from a single test. Each of the 23 Fenjiu was sampled six times, and 150 ml of each sample was placed in the ultrasonic atomization system. To obtain stable spectral signals, the LIBS detection system was run after a stable gas flow column had been present at the outlet of the spherical carrier gas device for 2 min. The laser pulse energy was set to 50 mJ and the repetition frequency to 10 Hz. The laser was focused through a 300 mm focusing lens onto the outlet of the nebulizer to break down the aerosol of Fenjiu droplets, and the signals were coupled to the spectrometer through the laser probe and an optical fiber (core diameter of 1000 μm and numerical aperture of 0.22). To suppress the effect of bremsstrahlung radiation, a digital delay generator (Stanford model DG535) was used to trigger the detector with a delay of 1 μs between the laser pulse and the collection of plasma radiation. For each experiment 100 spectra were collected, giving a total of 13800 (23 × 6 × 100) 7846-dimensional Fenjiu LIBS spectra for the 23 samples. To ensure the credibility of the experimental results, 70% of the Fenjiu spectral data (9660) were randomly selected as the training set, 15% (2070) as the validation set, and 15% (2070) as the test set. The spectral datasets are mutually independent, and each consists of randomly drawn spectral signals from the different kinds of Fenjiu.
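The set sizes quoted above follow directly from the acquisition counts. The sketch below reproduces them with a shuffled index split; the function name `split_indices` and the fixed seed are illustrative assumptions and are not taken from the paper.

```python
import random

def split_indices(n_total, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle indices 0..n_total-1 and cut them into train/validation/test parts."""
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)          # hypothetical seed, for repeatability only
    n_train = round(n_total * train_frac)
    n_val = round(n_total * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

n_spectra = 23 * 6 * 100                      # samples x repeats x spectra per experiment
train_idx, val_idx, test_idx = split_indices(n_spectra)
print(n_spectra, len(train_idx), len(val_idx), len(test_idx))  # 13800 9660 2070 2070
```

Because the three index lists are slices of one shuffled list, no spectrum can appear in more than one subset.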
Assuming a particle swarm consisting of N particles in a D-dimensional target search space, where each particle is a D-dimensional vector, the spatial location of the ith particle can be expressed as eqn (1).
$$X_i = (x_{i1}, x_{i2}, \ldots, x_{iD}), \quad i = 1, 2, 3, \ldots, N \tag{1}$$
The spatial position of the particle is a solution to the objective optimization problem, which can be calculated by substituting it into the fitness function, and the merit of the particle is measured according to the size of the fitness value. The flight velocity of the ith particle can be expressed as eqn (2).
$$V_i = (v_{i1}, v_{i2}, \ldots, v_{iD}), \quad i = 1, 2, 3, \ldots, N \tag{2}$$
The initial position and velocity of each particle are randomly generated within a given range. The update of the position and velocity of the ith particle as it evolves from generation t to generation t + 1 can be represented by eqn (3) and (4).
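$$x_{ij}(t + 1) = x_{ij}(t) + v_{ij}(t + 1) \tag{3}$$
$$v_{ij}(t + 1) = \omega v_{ij}(t) + C_1 r_1 [p_{ij}(t) - x_{ij}(t)] + C_2 r_2 [g_j(t) - x_{ij}(t)] \tag{4}$$
Here ω is the inertia weight, C1 and C2 are the acceleration coefficients, r1 and r2 are random numbers in [0, 1], pij(t) is the jth component of the best position found so far by the ith particle, and gj(t) is the jth component of the best position found so far by the whole swarm.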
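The update rule in eqn (3) and (4) can be sketched for a single particle as follows. The default values of `omega`, `c1` and `c2` are placeholder assumptions; the text does not state the values used.

```python
import random

def pso_step(x, v, p_best, g_best, omega=0.7, c1=1.5, c2=1.5, rng=random):
    """One velocity and position update for a single particle, eqn (3) and (4).

    x, v    : current position and velocity (lists of length D)
    p_best  : best position found so far by this particle
    g_best  : best position found so far by the whole swarm
    """
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()   # fresh random factors per dimension
        vj = (omega * v[j]
              + c1 * r1 * (p_best[j] - x[j])  # pull towards the particle's own best
              + c2 * r2 * (g_best[j] - x[j])) # pull towards the swarm's best
        new_v.append(vj)
        new_x.append(x[j] + vj)               # eqn (3): position uses the new velocity
    return new_x, new_v
```

When a particle already sits at both its personal best and the global best, only the inertia term remains, so the new velocity is simply `omega` times the old one.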
A randomly generated initial population is usually unevenly distributed, which limits the convergence of the algorithm, as shown in Fig. 2(a). This paper therefore introduces a circle chaotic map to initialize the population, as shown in Fig. 2(b), which improves the convergence speed of the algorithm while giving a more homogeneous and diverse initial population. The sample selection can be represented by eqn (5).
(5) |
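As an illustration of chaotic initialization, the sketch below uses one commonly quoted form of the circle map, x(k+1) = mod(x(k) + b − (a/2π) sin(2π x(k)), 1) with a = 0.5 and b = 0.2. This form, its constants, the starting value `x0`, and the function names are all assumptions; they are not taken from eqn (5), which may differ. The bounds in the usage lines are the search ranges for β, σ and λ quoted later in the text.

```python
import math

def circle_map_sequence(n, x0=0.3, a=0.5, b=0.2):
    """Generate n values in [0, 1) from an assumed form of the circle chaotic map."""
    seq, x = [], x0
    for _ in range(n):
        x = (x + b - (a / (2 * math.pi)) * math.sin(2 * math.pi * x)) % 1.0
        seq.append(x)
    return seq

def init_population(n_particles, bounds, x0=0.3):
    """Scale chaotic values onto each dimension's (low, high) search range."""
    dims = len(bounds)
    chaos = circle_map_sequence(n_particles * dims, x0)
    return [
        [lo + chaos[i * dims + d] * (hi - lo) for d, (lo, hi) in enumerate(bounds)]
        for i in range(n_particles)
    ]

# Search ranges for beta, sigma and lambda as quoted in the text
bounds = [(1, 20), (1e-8, 1e-2), (1e-7, 1e-2)]
population = init_population(30, bounds)
```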
Because a grid search model can find the highest-accuracy parameters within a specified parameter range, an optimized grid search method (OptGSCV) is proposed. It traverses every possible parameter combination, taking the model's qualitative analysis indices as the objective and the parameter range and search step size as constraints. By repeatedly shrinking the extent and step size of the grid, the approximate location of the optimal parameter combination is quickly found and then approached step by step. The method improves the efficiency of parameter optimization while avoiding interference from local optima, and it realizes the quadratic (secondary) parameter optimization of the network. In this paper, OptGSCV performs secondary optimization of the model hyperparameters, which ensures that the optimal network structure is matched with the optimal parameters and thus further improves the model performance. The specific steps of OptGSCV are as follows:
• Spectral data are input into the module. The ranges of the two parameters to be searched, σ and λ of the BP training function trainscg, are each set to $[2^{-M}, 2^{M}]$, and the step size N of the initial search grid is set. This gives a coarse search grid whose nodes are all the parameter combinations available within the given range.
• The value of k for k-fold cross-validation is set, each parameter combination is assigned to the model in turn, and the discriminative performance of the model is evaluated by k-fold cross-validation to find the parameter combination $(\sigma_i, \lambda_j)$ with the smallest mean squared error. The k-fold cross-validation scheme is shown in Fig. 4.
• The region bounded by the four nodes $(\sigma_{i+N}, \lambda_{j+N})$, $(\sigma_{i+N}, \lambda_{j-N})$, $(\sigma_{i-N}, \lambda_{j-N})$ and $(\sigma_{i-N}, \lambda_{j+N})$ around the parameter combination $(\sigma_i, \lambda_j)$ is selected as the new search range, and the search step size is set to N/2 to build a finer grid. Cross-validation is then repeated to find a new parameter combination $(\sigma_m, \lambda_n)$ with the smallest mean squared error.
• If the analytical metrics obtained with $(\sigma_m, \lambda_n)$ in the k-fold cross-validation meet the experimental requirements, $(\sigma_m, \lambda_n)$ is stored in the PSO-GA-BP model and the metrics are output as a table. Otherwise, the procedure returns to the third step and the parameters are optimized again until the accuracy requirements are met.
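The four steps above can be sketched as a coarse-to-fine loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the grid is laid out in exponent space (σ = 2^a, λ = 2^b), `evaluate` is a hypothetical callback that would train the network on the training folds and return its validation error, and `toy_error` merely stands in for that training step.

```python
import math

def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cv_error(evaluate, sigma, lam, n, k):
    """Mean validation error of one (sigma, lam) pair over k folds."""
    folds = kfold_indices(n, k)
    errors = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        errors.append(evaluate(sigma, lam, train_idx, val_idx))
    return sum(errors) / k

def axis(lo, hi, step):
    """Grid nodes from lo to hi inclusive."""
    return [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]

def refine_grid_search(evaluate, n, k, m_exp, step, target=0.0, max_rounds=4):
    """Coarse-to-fine search over exponents a, b with sigma = 2**a, lam = 2**b."""
    a_lo, a_hi, b_lo, b_hi = -m_exp, m_exp, -m_exp, m_exp
    for _ in range(max_rounds):
        # Steps 1-2: cross-validate every node of the current grid, keep the best
        err, a, b = min(
            (cv_error(evaluate, 2.0 ** a, 2.0 ** b, n, k), a, b)
            for a in axis(a_lo, a_hi, step)
            for b in axis(b_lo, b_hi, step)
        )
        if err <= target:                      # Step 4: requirement met, stop
            break
        # Step 3: shrink the window to the neighbouring nodes and halve the step
        a_lo, a_hi, b_lo, b_hi = a - step, a + step, b - step, b + step
        step /= 2
    return 2.0 ** a, 2.0 ** b, err

def toy_error(sigma, lam, train_idx, val_idx):
    """Stand-in for network training: error is smallest near 2**1.3 and 2**-2.6."""
    return (math.log2(sigma) - 1.3) ** 2 + (math.log2(lam) + 2.6) ** 2

sigma, lam, err = refine_grid_search(toy_error, n=100, k=10, m_exp=4, step=1.0)
```

Each round evaluates only a small window around the current best node, so the resolution doubles per round without the cost of a uniformly fine grid.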
We use the confusion matrix to calculate the accuracy, precision, recall, and F1 value and then use these four metrics to evaluate the performance of the classifier. Accuracy is defined as the ratio of the number of correctly classified samples to the total number of samples, as shown in eqn (6), where TP is the number of correct predictions of the positive class, TN is the number of correct predictions of the negative class, FP is the number of false predictions of the positive class, and FN is the number of false predictions of the negative class.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$
Precision is defined as the ratio of the number of samples correctly predicted to be in the positive class to the total number of samples predicted to be in the positive class, and it indicates how reliable a positive prediction is. Its formula is shown in eqn (7).
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{7}$$
Recall is the ratio of the number of positive samples correctly retrieved to the number of all positive samples in the dataset, and it measures the ability of the classifier to recognize positive samples. Its formula is shown in eqn (8).
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{8}$$
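The F1 value is the harmonic mean of precision and recall and is commonly used to evaluate the performance of machine learning models on unbalanced datasets. Its formula is shown in eqn (9).
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$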
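For a multi-class problem, these four metrics are usually computed per class by treating that class as positive and all others as negative. The sketch below does this from a confusion matrix; the function name and the small two-class matrix are illustrative only.

```python
def per_class_metrics(cm, c):
    """One-vs-rest accuracy, precision, recall and F1 for class c.

    cm[i][j] is the number of samples with true class i predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                    # class c samples predicted as something else
    fp = sum(row[c] for row in cm) - tp     # other samples predicted as class c
    tn = total - tp - fn - fp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Illustrative matrix: 10 samples per class, 3 misclassified in total
cm = [[8, 2],
      [1, 9]]
acc, prec, rec, f1 = per_class_metrics(cm, 0)
```

For class 0 in this matrix, TP = 8, FN = 2, FP = 1 and TN = 9, which gives an accuracy of 17/20, a precision of 8/9, a recall of 8/10 and an F1 value of 16/19.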
According to Fig. 5, small amounts of metallic trace elements such as Mn, Zn, Cu, Fe, Ca and Mg exist in Baijiu, and although their contents are very small, their presence determines the quality and taste of Baijiu. These trace elements are involved in the aging process of the Baijiu in the vat, where they promote its maturation. Mn, Cu, Fe and other elements take part in the redox reactions of the Baijiu, which promote the evolution of its organic matter and give each Baijiu unique sensory characteristics. The presence of trace elements also increases the chemical complexity of Baijiu, making it richer and more diverse. Their interaction with organic components produces a variety of chemical reactions that form the unique aroma and flavor of the Baijiu.41 Plasma spectrometry is therefore used to determine the trace element content of the Baijiu in order to identify the different series, alcohol contents and production dates of Fenjiu, and its analytical accuracy and precision are reliable.
Since the experiments were carried out in open air, the full spectrum of Baijiu contains strong contributions from C, O, N, and H. Because the electronic excitation energy level of C is high, its plasma spectral lines usually lie in the shorter wavelength range. The plasma spectral lines of H lie in a longer wavelength range because the ground-state electron energy level of H is lower. In addition, N and O lines are more abundant in the 700–900 nm band. These elements not only add noise to the rest of the spectral data but also increase the time spent on subsequent calculations. It is therefore necessary to perform feature extraction on the full-spectrum signal while keeping its main features. Considering the small differences in elemental composition and the high similarity of spectral information between different Fenjiu from the same manufacturer, principal component analysis (PCA) was chosen for feature extraction from the full spectrum.
To avoid obvious order-of-magnitude differences between the input variables, the signals are min–max normalized according to eqn (10), where max{xj} is the maximum value of the sample data and min{xj} is the minimum value of the sample data. Min–max normalization is a linear transformation of the original data that maps the values into [0, 1], eliminating the effects of the units and ranges of the variables while preserving the relationships between the data as far as possible.
$$x_j^{*} = \frac{x_j - \min\{x_j\}}{\max\{x_j\} - \min\{x_j\}} \tag{10}$$
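A minimal sketch of min–max normalization for one variable is given below. The handling of a constant input (all values equal) is an assumption added to avoid division by zero; the text does not discuss that case.

```python
def min_max_normalize(values):
    """Map values linearly onto [0, 1] using their own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention for a constant variable: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

normalized = min_max_normalize([2.0, 4.0, 6.0])   # -> [0.0, 0.5, 1.0]
```

The transformation keeps the ordering and relative spacing of the values, which is why the relationships between the data are preserved.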
The number of principal components used for subsequent analysis is selected from the cumulative contribution curve of the principal components. In this paper, the selection is based on the change in slope of this curve: we observe its trend, look for the "inflection point" at which the growth rate of the cumulative contribution changes significantly as the number of principal components increases, and take the number of principal components at this point as the final number. Fig. 6 shows the cumulative contribution curve of the first 10 principal components after PCA. The contribution rates of the first 6 PCs are 46.5%, 29.89%, 10.04%, 2.17%, 0.75%, and 0.66%, respectively, and their cumulative contribution rate reaches 90.01%. This indicates that the first 6 PCs carry most of the information of the original variables, so they are selected to represent the characteristics of the whole sample in the subsequent analysis.
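The cumulative contribution quoted above can be checked directly from the six individual contribution rates. The sketch also lists the gain added by each further component, which is the quantity one inspects when looking for the inflection point; the variable names are illustrative.

```python
from itertools import accumulate

# Contribution rates (%) of the first six principal components, from the text
contrib = [46.5, 29.89, 10.04, 2.17, 0.75, 0.66]

cumulative = list(accumulate(contrib))        # running total after each component
for n, (gain, total) in enumerate(zip(contrib, cumulative), start=1):
    print(f"PC{n}: +{gain:.2f}%  cumulative {total:.2f}%")
```

The gain drops from about 10% at the third component to about 2% at the fourth and below 1% afterwards, while the running total after six components is 90.01%.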
First, the PSO algorithm simulates the foraging behavior of bird flocks and searches the solution space for optimal solutions, moving from disorder to order. Using adaptive inertia weights, PSO dynamically adjusts the search direction and speed of each particle during the search to find the optimal number of hidden-layer nodes β in the global solution space. The circle chaotic map enhances the randomness and regularity of the initial search population to ensure diverse exploration.
Second, after the PSO has determined the more desirable network structure parameters, the network is further internally optimized using a GA. The GA screens the individuals (i.e., network configurations) in the initial population based on their performance (fitness) by mimicking the natural selection mechanism and generates a new generation of network configurations by crossover and mutation operations. This process is optimized for the weight matrix and bias terms of the BP neural network, aiming to improve the training effectiveness and generalization ability of the network.
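One generation of such a GA can be sketched as below for real-valued weight vectors. The tournament selection, one-point crossover, Gaussian mutation, and elitism are generic choices made for illustration and are assumptions; the text does not specify which operators were used. The default rates mirror the crossover rate of 0.2 and mutation rate of 0.1 quoted later in the text, and `toy_fitness` is a stand-in for the network's training performance.

```python
import random

def evolve(population, fitness, crossover_rate=0.2, mutation_rate=0.1,
           mutation_scale=0.1, rng=None):
    """Produce the next generation from a list of equal-length weight vectors."""
    rng = rng or random.Random(0)
    elite = max(population, key=fitness)       # elitism: best individual survives

    def pick():
        # Binary tournament: the fitter of two random individuals is selected
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    children = [list(elite)]
    while len(children) < len(population):
        p1, p2 = pick(), pick()
        child = list(p1)
        if rng.random() < crossover_rate:      # one-point crossover
            cut = rng.randrange(1, len(child))
            child = list(p1[:cut]) + list(p2[cut:])
        # Gaussian mutation applied gene by gene
        child = [g + rng.gauss(0, mutation_scale) if rng.random() < mutation_rate else g
                 for g in child]
        children.append(child)
    return children

def toy_fitness(weights):
    """Stand-in fitness: larger when the weights are closer to zero."""
    return -sum(w * w for w in weights)

rng = random.Random(1)
population = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(20)]
best_before = max(toy_fitness(p) for p in population)
for _ in range(30):
    population = evolve(population, toy_fitness, rng=rng)
best_after = max(toy_fitness(p) for p in population)
```

Because the best individual is copied unchanged into each new generation, the best fitness can never decrease from one generation to the next.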
Finally, a grid search algorithm with an optimized step size is introduced to perform the secondary optimization search over the two hyperparameters that affect the model evaluation performance. The overall analytical performance of the model is evaluated by k-fold cross-validation, with the model accuracy, precision, recall, and F1 value as reference indices, so that optimizing the individual parameters yields the best performance of the overall model. The OptGSCV-PSO-GA-BP model framework is shown in Fig. 7.
PSO is mainly used to optimize the external structural parameter of the neural network, i.e., the number of nodes in the hidden layer (β). We treat β as a free parameter in order to find the optimal network structure. This parameter does not limit the number of hidden layers or the number of nodes, but indicates the number of nodes in the whole hidden layer. PSO finds the optimal number of hidden-layer nodes by simulating the foraging behavior of a flock of birds, optimizing the performance of the whole network architecture. The GA optimizes the weights and thresholds inside the neural network by evaluating, selecting, crossing over, and mutating the weight matrix and bias terms of the network to obtain the optimal network architecture. Once the optimal network architecture is obtained, OptGSCV performs a secondary optimization search for the two hyperparameters that most affect model performance, the learning rate (σ) and the regularization coefficient (λ), so that the optimal network architecture is matched with the optimal parameters and the model performance improves further. In Fig. 7, mismatched network architectures are drawn in different colors; the network model that starts with mixed colors is eventually optimized into a model with consistent colors, illustrating the selection of optimal parameters and the construction of the optimal network architecture.
To classify the 13800 × 6 Fenjiu spectral data of the 23 samples with high accuracy, the sample data were first Z-score normalized to reduce the potential adverse effects of anomalous samples on the model performance. During optimization of the network structure, the PSO algorithm was used to dynamically adjust the number of nodes in the hidden layer. The initial number of particles is set to 30, the maximum number of iterations is 10, and 3 × 3 initialized parameters to be optimized are generated according to the boundary position, including the number of hidden-layer nodes β and the two hyperparameters σ and λ of the scaled conjugate gradient BP training function trainscg. The number of input-layer nodes, 6, is determined by the dimension of the input data. For the three parameters to be optimized, the search ranges are set to 1 < β < 20, $10^{-8} < \sigma < 10^{-2}$, and $10^{-7} < \lambda < 10^{-2}$, and the BP neural network is externally optimized using the fitness at each iteration as the metric.
(11) |
The BP neural network is constructed from the optimal number of hidden-layer nodes found by external optimization, and the GA then determines the optimal structure and hyperparameter settings of the BP network. The population size is set to 50, the maximum number of iterations to 80, the mutation rate to 0.1, and the crossover rate to 0.2; the number of optimal weights output for the network architecture is 400, and the number of optimal thresholds is 29. Then trainscg is used as the optimizer of weights and biases for fine adjustment: the maximum number of training epochs is set to 200, the learning rate to 0.2, and the target error to $1 \times 10^{-8}$, and the conjugate gradient method adjusts the network weights quickly and effectively. The trainscg function adjusts the weights and biases to minimize the error function, based on the network structure and hyperparameters determined by the GA. Efficient network optimization is thus achieved through global search by the GA and local fine-tuning by trainscg, which outputs the training-set accuracy and the trained neural network (net).
The optimal fitness of 99.957 is obtained when β = 17, σ = 0.0071, and λ = 0.0031; these values are selected through the stochastic search of the network and assigned to the BP neural network, at which point the network error reaches the required value of $1 \times 10^{-8}$. The parameter optimization of the model proceeds through several iterations, each of which evaluates the performance of the current parameter combination. With the PSO and GA algorithms, we were able to search efficiently over a wide parameter space, while OptGSCV further fine-tuned the parameters on top of these optimizations. Eventually, the parameter combination with the best performance on the validation set is selected as the final parameters of the model. The optimal network architecture is output to the PSO, and the dataset is split randomly to prevent the large deviation between the model performance and the true metrics that a fixed split can cause. Having found the optimal network architecture with the optimal model fitness as the criterion, OptGSCV is used to conduct the secondary optimization search for the two most influential hyperparameters, σ and λ. The search ranges are set to σ ∈ ($10^{-5}$, $10^{0}$) and λ ∈ ($10^{-5}$, $10^{0}$) with a search step of $1 \times 10^{-5}$, so $10^{5} \times 10^{5}$ hyperparameter combinations are searched; the search results are shown in Fig. 8.
After traversing $1 \times 10^{10}$ different search points, the optimal hyperparameter combination for the optimal network architecture was found to be σ = 0.0076 and λ = 0.0083, at which the qualitative analysis accuracy of the model was 99.984%. To assess the model comprehensively and reduce the risk of overfitting, 10-fold cross-validation was used to assess the generalization of the OptGSCV-PSO-GA-BP model in the qualitative analysis of Fenjiu's multidimensional data, with the model's qualitative analysis accuracy as the evaluation index. As shown in Fig. 9, the highest analysis accuracy of the model is 99.998% and the lowest is 99.945%, which further validates the generalization performance of the quadratic optimization model for high-precision qualitative analysis of Baijiu.
The qualitative analysis performance of the optimal model is described by the confusion matrices shown in Fig. 10 and 11. From the confusion matrices, the qualitative classification accuracy is 99.987% on the training set and 99.983% on the test set. The vast majority of the predicted and true labels of Fenjiu agree, both in the training set and in the test set. There are 14 Fenjiu samples with prediction errors in the training set and 4 in the test set. To further illustrate the reasons for the prediction errors of this small number of Fenjiu samples, the accuracy, precision, recall, and F1 values of the 23 different Fenjiu samples in the training and test sets were calculated, as shown in Table 2.
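The quoted accuracies and error counts can be reconciled if the overall accuracy is read as the mean of the per-class one-vs-rest accuracies, where each misclassified spectrum counts once as a false negative for its true class and once as a false positive for the predicted class. This reading is an assumption and is not stated in the text; the sketch below only checks the arithmetic, and the function names are illustrative.

```python
def mean_one_vs_rest_accuracy(n_samples, n_classes, n_errors):
    """Average of per-class one-vs-rest accuracies.

    Each error lowers the accuracy of exactly two classes by 1 / n_samples.
    """
    return 1 - 2 * n_errors / (n_samples * n_classes)

def plain_accuracy(n_samples, n_errors):
    """Fraction of samples assigned to the correct class."""
    return 1 - n_errors / n_samples

train_mean = 100 * mean_one_vs_rest_accuracy(9660, 23, 14)
test_mean = 100 * mean_one_vs_rest_accuracy(2070, 23, 4)
train_plain = 100 * plain_accuracy(9660, 14)
test_plain = 100 * plain_accuracy(2070, 4)
print(f"{train_mean:.3f} {test_mean:.3f}")    # 99.987 99.983
print(f"{train_plain:.3f} {test_plain:.3f}")  # 99.855 99.807
```

Under this reading, the fraction of spectra assigned to the correct class is about 99.855% for the training set and 99.807% for the test set.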
Table 2 GSCV-PSO-GA-BP sample performance evaluation indices (%)

Sample | Data set | Accuracy | Precision | Recall | F1-Score
---|---|---|---|---|---
Sample 1 | Training set | 99.979 | 99.518 | 100 | 99.758
Sample 1 | Test set | 99.903 | 98.824 | 98.824 | 98.824
Sample 2 | Training set | 99.990 | 100 | 99.762 | 99.881
Sample 2 | Test set | 99.903 | 98.876 | 98.876 | 98.876
Sample 3 | Training set | 100 | 100 | 100 | 100
Sample 3 | Test set | 100 | 100 | 100 | 100
Sample 4 | Training set | 100 | 100 | 100 | 100
Sample 4 | Test set | 100 | 100 | 100 | 100
Sample 5 | Training set | 99.979 | 99.515 | 100 | 99.757
Sample 5 | Test set | 100 | 100 | 100 | 100
Sample 6 | Training set | 99.990 | 100 | 99.758 | 99.879
Sample 6 | Test set | 100 | 100 | 100 | 100
Sample 7 | Training set | 99.990 | 100 | 99.764 | 99.882
Sample 7 | Test set | 100 | 100 | 100 | 100
Sample 8 | Training set | 99.990 | 100 | 99.761 | 99.881
Sample 8 | Test set | 100 | 100 | 100 | 100
Sample 9 | Training set | 99.990 | 99.761 | 100 | 99.880
Sample 9 | Test set | 100 | 100 | 100 | 100
Sample 10 | Training set | 99.979 | 100 | 99.531 | 99.765
Sample 10 | Test set | 100 | 100 | 100 | 100
Sample 11 | Training set | 99.979 | 99.765 | 99.765 | 99.765
Sample 11 | Test set | 99.952 | 98.837 | 100 | 99.415
Sample 12 | Training set | 99.990 | 99.768 | 100 | 99.884
Sample 12 | Test set | 99.952 | 100 | 98.913 | 99.454
Sample 13 | Training set | 99.990 | 100 | 99.770 | 99.885
Sample 13 | Test set | 100 | 100 | 100 | 100
Sample 14 | Training set | 100 | 100 | 100 | 100
Sample 14 | Test set | 100 | 100 | 100 | 100
Sample 15 | Training set | 99.990 | 99.762 | 100 | 99.881
Sample 15 | Test set | 100 | 100 | 100 | 100
Sample 16 | Training set | 99.979 | 99.518 | 100 | 99.758
Sample 16 | Test set | 99.952 | 98.947 | 100 | 99.471
Sample 17 | Training set | 99.990 | 100 | 99.757 | 99.878
Sample 17 | Test set | 100 | 100 | 100 | 100
Sample 18 | Training set | 100 | 100 | 100 | 100
Sample 18 | Test set | 100 | 100 | 100 | 100
Sample 19 | Training set | 99.979 | 99.524 | 100 | 99.761
Sample 19 | Test set | 99.952 | 100 | 99.010 | 99.502
Sample 20 | Training set | 99.979 | 100 | 99.523 | 99.761
Sample 20 | Test set | 100 | 100 | 100 | 100
Sample 21 | Training set | 99.969 | 99.525 | 99.762 | 99.643
Sample 21 | Test set | 100 | 100 | 100 | 100
Sample 22 | Training set | 99.979 | 100 | 99.525 | 99.762
Sample 22 | Test set | 100 | 100 | 100 | 100
Sample 23 | Training set | 100 | 100 | 100 | 100
Sample 23 | Test set | 100 | 100 | 100 | 100
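The per-sample indices in Table 2 follow from a confusion matrix by treating each sample as a one-vs-rest problem. The sketch below shows the standard definitions on a hypothetical 3-class matrix (`toy_cm` is invented for illustration and is not taken from Fig. 10 or 11). It also checks one table entry with the F1 formula: a precision of 99.518% and a recall of 100% (Sample 1, training set) give an F1-score of 99.758%.

```python
def per_class_metrics(cm, k):
    """One-vs-rest accuracy, precision, recall and F1 (in %) for class k.

    cm[i][j] counts the samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]                                # true positives
    fn = sum(cm[k]) - tp                         # class k predicted as other
    fp = sum(cm[i][k] for i in range(n)) - tp    # other predicted as class k
    tn = total - tp - fn - fp                    # everything else
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / total
    return tuple(round(100 * v, 3) for v in (accuracy, precision, recall, f1))


def f1_from(precision, recall):
    """F1-score from precision and recall given on the same scale."""
    return round(2 * precision * recall / (precision + recall), 3)


# Hypothetical 3-class confusion matrix: one class-0 sample is labelled class 1.
toy_cm = [[9, 1, 0],
          [0, 10, 0],
          [0, 0, 10]]

metrics_class0 = per_class_metrics(toy_cm, 0)
metrics_class1 = per_class_metrics(toy_cm, 1)
f1_sample1_train = f1_from(99.518, 100.0)
```

Because the true negatives of the other classes dominate the one-vs-rest accuracy, a single misclassification lowers precision or recall far more than it lowers accuracy. The same pattern appears in Table 2, where accuracy stays above 99.9% for samples whose precision or recall drops below 99%.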
The above results show that the 23 samples are closely related: for Fenjiu of the same series, with the same brand and degree, the trace elements that determine taste differ only slightly, which makes it difficult for traditional Baijiu qualitative analysis techniques to guarantee accuracy. The combination of AS-LIBS and the OptGSCV quadratic optimization network, with its high analytical performance, provides a new data processing solution for multidimensional Fenjiu detection. To further validate the analytical performance of the model, it was compared with some commonly used qualitative analysis models (KNN and SVM) and with BP models that lack the full optimization (BP, GA-BP, and PSO-GA-BP). Based on the team's previous research experience, k was set to 8 in the KNN model, and a linear kernel function was selected for the SVM, with the parameters C = 0.5 and g = 0.005. The parameter settings for the BP and GA-BP models were the same as those in GSCV-GA-BP. Each model was trained in turn on the data after PCA dimensionality reduction, and the results are shown in Table 3.
Table 3 Performance evaluation of different qualitative analysis models (%)

Data set | Performance evaluation index | KNN | SVM | BP | GA-BP | PSO-GA-BP | GSCV-PSO-GA-BP
---|---|---|---|---|---|---|---
Training set | Accuracy | 80.238 | 78.665 | 84.523 | 90.507 | 98.809 | 99.987
Training set | Precision | 68.625 | 78.665 | 79.902 | 90.509 | 98.810 | 99.855
Training set | Recall | 73.958 | 78.663 | 85.000 | 90.508 | 98.808 | 99.855
Training set | F1-Score | 70.307 | 78.665 | 80.740 | 90.509 | 98.809 | 99.855
Test set | Accuracy | 78.621 | 78.551 | 83.617 | 90.483 | 98.762 | 99.983
Test set | Precision | 65.214 | 78.049 | 80.549 | 89.863 | 98.547 | 99.807
Test set | Recall | 71.567 | 78.050 | 85.043 | 89.865 | 98.546 | 99.808
Test set | F1-Score | 68.328 | 78.049 | 81.064 | 89.864 | 98.547 | 99.807
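For readers unfamiliar with the KNN baseline, the sketch below shows a plain majority-vote classifier with k = 8, the value used in the comparison. It is a minimal illustration, not the authors' code: the two-dimensional points stand in for PCA scores and, like the labels "A" and "B", are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=8):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical PCA scores: two well-separated groups of ten spectra each.
train_X = ([(0.1 * i, 0.0) for i in range(10)]
           + [(5.0 + 0.1 * i, 5.0) for i in range(10)])
train_y = ["A"] * 10 + ["B"] * 10

pred_a = knn_predict(train_X, train_y, (0.3, 0.2))
pred_b = knn_predict(train_X, train_y, (5.2, 4.9))
```

With an even k such as 8, votes can tie between two classes; this sketch resolves a tie in favour of the class whose neighbour was encountered first, which is one of several possible conventions.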
The results show that the PSO-GA-BP network optimized by the OptGSCV quadratic optimization network exhibits strong qualitative analysis capability in the multi-dimensional Fenjiu detection problem. Compared with the other five algorithms, its analytical performance is markedly better on both the training set and the test set, with a qualitative accuracy of 99.987% on the training set and 99.983% on the test set. The proposed model identifies the different series, degrees, and production dates of Fenjiu in a single pass, which provides a new method for the high-precision and fast classification of multi-dimensional Baijiu.
• By studying coupling models of laser-induced plasma spectral intensity and particle number, an online LIBS testing system was constructed to address the low precision, complicated pre-processing, long analysis time, and sommelier subjectivity of traditional Baijiu identification technology. An elemental component detection method for Fenjiu based on new-generation LIBS technology was proposed to realize high-precision detection and analysis of Baijiu.
• By studying the volatility of ethanol, the main component of Fenjiu, and the plasma spectral enhancement provided by high-frequency ultrasonic atomization, a plasma spectral signal detection system for Baijiu was constructed and the AS-LIBS high-frequency atomization scheme was put forward, which realizes a new process for plasma signal detection in Baijiu.
• By studying the hidden-layer nodes, weights, and hyperparameter iteration model of the intelligent qualitative classification algorithm, the PSO intelligent optimization method was adopted to find the best match among the linked parameters once each parameter of the qualitative classification model has reached its optimal value. This solves the technical problem of mismatch between optimal parameter selection and optimal network construction and realizes a new network that fuses an intelligent qualitative classification algorithm with a traditional classification algorithm.
• By studying parameter adaptation of the optimal intelligent qualitative analysis model and the optimal parameter extraction method of OptGSCV, a multi-dimensional plasma spectral information database of Fenjiu was constructed and a multi-dimensional detection and analysis system for Fenjiu based on the quadratic optimization network was built. This addresses bottlenecks of the Baijiu testing industry, such as low precision, high difficulty, and long analysis time, and realizes a new platform for high-precision, high-efficiency, in situ online qualitative analysis of Baijiu.
AS-LIBS combined with the OptGSCV-PSO-GA-BP Baijiu testing and analysis system provides a new qualitative classification and authenticity identification method for the Baijiu production and identification industry and can play an important role in food safety, anti-counterfeiting detection, and analysis and testing. In particular, it is of great significance for combating counterfeit and shoddy Baijiu and protecting the legitimate interests of consumers and enterprises, and it lays the foundation for multi-dimensional high-precision component detection by major Baijiu enterprises and for the classification of counterfeit and shoddy Baijiu on the market.
This journal is © The Royal Society of Chemistry 2024