Chenwei Zhu *a, Qizhong Pan a, Zhanjian Lin b and Xiangyou Li b
a School of Electrical and Electronic Engineering, Wuhan Polytechnic University, Wuhan, Hubei 430048, P. R. China. E-mail: chenweiz@whpu.edu.cn
b Wuhan National Laboratory for Optoelectronics (WNLO), Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
First published on 14th October 2025
Herbals play a crucial role in maintaining human health owing to their therapeutic properties. However, herbals with excessive copper (Cu), manganese (Mn), and lead (Pb) are common due to pollution. Conventional detection methods for toxic elements are time-consuming and prone to contamination. Laser-induced breakdown spectroscopy (LIBS), a technology based on plasma analysis, offers rapid detection. Combining LIBS with chemometrics has become a popular approach for herbal detection. However, identifying toxic elements remains challenging due to difficulties in extracting relevant spectral variables. This study proposed using normalized mutual information (NMI) for variable extraction, while student psychology-based optimization (SPBO) combined with a kernel extreme learning machine (KELM) was used to classify herbals based on Cu, Mn, and Pb contents. The results showed that the extracted variables accounted for only 0.018%, 0.073%, and 0.66% of the total variables for Cu, Mn, and Pb, respectively. Compared with principal component analysis (PCA)-KELM, NMI-KELM improved the average accuracy and F1 by 5.00 and 2.87 percentage points. With SPBO optimization, the average accuracy and F1 of NMI-KELM increased to 94.00% and 93.14%. This study provided a foundation for the rapid and accurate classification of herbals based on toxic element content.
Current detection methods mainly rely on inductively coupled plasma-optical emission spectroscopy (ICP-OES)7,8 or inductively coupled plasma-mass spectrometry (ICP-MS).9,10 However, these techniques require time-consuming pretreatment and toxic reagents, making them unsuitable for rapid analysis. Therefore, the rapid detection of toxic elements in herbals is of significant importance.
Laser-induced breakdown spectroscopy (LIBS) enables rapid detection with minimal sample preparation.11 It works by focusing a laser beam on the sample surface to generate characteristic spectra for analysis.12,13 Combining LIBS with chemometrics has become a popular approach for detecting herbals. However, due to the difficulty in extracting spectral variables related to toxic elements, most research has focused on species classification rather than toxic element detection.14,15 Principal component analysis (PCA) is commonly used for spectral variable extraction,14 but it often ignores label correlations, leading to reduced accuracy. Therefore, developing an effective method for toxic element detection from herbals is crucial.
Normalized mutual information (NMI), based on entropy theory, better captures the correlation between spectral variables and labels than PCA,16 especially when spectral variables of toxic elements contain limited information. Additionally, kernel extreme learning machine (KELM), a method based on feedforward neural networks,17 demonstrates higher accuracy and greater stability than support vector machines (SVMs) for classification. Therefore, this study used NMI for spectral variable extraction and KELM for herbal classification based on toxic element content.
In this work, LIBS was combined with chemometrics to classify herbals based on Cu, Mn, and Pb concentrations. NMI was used to extract optimal spectral variables. KELM was applied for classification based on LIBS spectra, relying on Cu, Mn, and Pb contents. The performance of NMI-KELM was compared with that of PCA-KELM to demonstrate the advantages of NMI. Furthermore, student psychology-based optimization (SPBO) was utilized to optimize the parameters of NMI-KELM, resulting in the NMI-SPBO-KELM model. To validate the classification performance of NMI-SPBO-KELM, it was compared with that of NMI-KELM and NMI-SPBO-SVM.
All experiments were conducted under atmospheric conditions using optimized parameters: 63 mJ laser energy and 1.5 μs acquisition delay. Each spectrum was accumulated with 400 shots to reduce the deviation of intensity. A total of 30 herbal samples were obtained, and 20 spectra were collected from each sample, totaling 600 spectra.
To eliminate the influence of other elements, different standard solutions were added to powder to prepare herbals with varying toxic element contents. The standard gradient solutions were prepared using the gravimetric method. The key steps of sample preparation followed previous work,18 as shown in Fig. 2: first, 0.5 g of powder was weighed and mixed with 0–0.75 mL of standard solutions at varying concentrations; next, the mixture was dried to a weight of 0.5 g; finally, the dried mixture was tiled and glued on the glass slide. This sample preparation method aimed to closely simulate real herbal samples, thereby reducing potential matrix differences caused by artificial substitutes like pure starch.
Additionally, the classification model evaluation assumed that samples with high Cu, Mn, or Pb concentrations were potentially toxic, based on regulatory norms. These samples were classified as Class 2. Samples with allowed concentrations of Cu, Mn, and Pb were classified as Class 1. The allowed contents are 20 mg kg−1 for Cu and 5 mg kg−1 for Pb, as defined in WM/T 2-2004. The limit for Mn is 256 mg kg−1, calculated from the 2023 Chinese DRIs Tables.
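The labelling rule described above can be illustrated with a minimal sketch. The function name `label_sample` and the strict "greater than" comparison (a content exactly at the limit is treated as Class 1) are assumptions made for illustration; the example values are those listed for sample 5# in the sample table.

```python
# Allowed contents in mg/kg as stated in the text: Cu and Pb from WM/T 2-2004,
# Mn calculated from the 2023 Chinese DRIs Tables.
LIMITS_MG_PER_KG = {"Cu": 20.0, "Mn": 256.0, "Pb": 5.0}


def label_sample(contents):
    """Return {element: 1 or 2}; Class 2 means the content exceeds its limit.

    Assumption: a content exactly equal to the limit counts as Class 1.
    """
    return {
        element: 2 if contents[element] > limit else 1
        for element, limit in LIMITS_MG_PER_KG.items()
    }


# Contents of sample 5# (licorice) taken from the sample table.
labels = label_sample({"Cu": 18.05, "Mn": 354.60, "Pb": 43.11})
print(labels)
```

Each element is labelled independently, so a single sample can belong to Class 1 for one element and Class 2 for another.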
In this study, NMI was used to reflect the correlation between spectral variables and the classification label, achieving effective spectral variable selection. Moreover, the NMI value was calculated by normalizing the MI value using the total information from both spectral variables and classification labels. This allows NMI to accurately reflect the relationship between input variables and output labels, even when the correlation is weak or contains limited information.19,20
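A minimal sketch of NMI-based variable ranking is given below. It is not the authors' implementation: the normalization 2·MI/(H(X) + H(Y)), the equal-width binning with 10 bins, and the function names `nmi` and `rank_variables` are all assumptions chosen for illustration, and the data are synthetic.

```python
import numpy as np


def _entropy(counts):
    """Shannon entropy (bits) from an array of occurrence counts."""
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def nmi(x, y, bins=10):
    """NMI between one continuous variable x and integer class labels y.

    Assumption: x is discretized into equal-width bins and MI is normalized
    by the summed entropies, NMI = 2 * MI / (H(X) + H(Y)).
    """
    edges = np.histogram_bin_edges(x, bins=bins)
    xb = np.digitize(x, edges[1:-1])  # bin index of every spectrum
    hx = _entropy(np.unique(xb, return_counts=True)[1])
    hy = _entropy(np.unique(y, return_counts=True)[1])
    pairs = np.column_stack([xb, y])
    hxy = _entropy(np.unique(pairs, axis=0, return_counts=True)[1])
    total = hx + hy
    return 2.0 * (hx + hy - hxy) / total if total > 0 else 0.0


def rank_variables(X, y):
    """Return variable indices sorted by decreasing NMI, plus the scores."""
    scores = np.array([nmi(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1], scores


# Synthetic demo: column 2 carries the label, the other columns are noise.
rng = np.random.default_rng(0)
y = np.repeat([1, 2], 20)
X = rng.normal(size=(40, 5))
X[:, 2] = y
order, scores = rank_variables(X, y)
print(order[0], scores.round(3))
```

Keeping only the first k entries of `order` gives the k variables most strongly related to the class label under these assumptions.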
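| $f(x) = [K(x, x_1), \ldots, K(x, x_N)]\left(\dfrac{I}{C} + \Omega_{\mathrm{ELM}}\right)^{-1} T$ | (3-1) |
| $K(x, x_N) = \exp\left(-g\lVert x - x_N\rVert^{2}\right)$ | (3-2) |
Among them, K(x, xN) is the kernel function, C is the regularization coefficient, I is the identity matrix, ΩELM is the kernel matrix, T is the target matrix, and g is the kernel parameter.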
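The roles of the kernel function, the regularization coefficient C, the kernel matrix ΩELM, and the target matrix T can be illustrated with a short sketch. It assumes the standard KELM closed form, in which the output weights solve (I/C + ΩELM)β = T with a one-hot target matrix; the class name `KELM`, the helper `rbf_kernel`, and the synthetic two-cluster data are hypothetical and not taken from the study.

```python
import numpy as np


def rbf_kernel(A, B, g):
    """K(a, b) = exp(-g * ||a - b||^2) for every pair of rows of A and B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-g * np.maximum(sq, 0.0))


class KELM:
    """Kernel extreme learning machine sketch (standard closed-form solution)."""

    def __init__(self, C=1.0, g=0.5):
        self.C, self.g = C, g

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        self.classes_ = np.unique(y)
        # One-hot target matrix T (assumption: one column per class).
        T = (np.asarray(y)[:, None] == self.classes_[None, :]).astype(float)
        omega = rbf_kernel(self.X_train, self.X_train, self.g)
        n = omega.shape[0]
        # Output weights: solve (I / C + Omega) beta = T.
        self.beta = np.linalg.solve(np.eye(n) / self.C + omega, T)
        return self

    def predict(self, X):
        K = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.g)
        return self.classes_[np.argmax(K @ self.beta, axis=1)]


# Synthetic demo with two well-separated clusters labelled Class 1 and Class 2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(3.0, 0.1, (10, 2))])
y = np.array([1] * 10 + [2] * 10)
model = KELM(C=1.0, g=0.5).fit(X, y)
pred = model.predict(np.array([[0.0, 0.0], [3.0, 3.0]]))
print(pred)
```

In this sketch a larger C weakens the regularization term I/C, while a larger g narrows the kernel so that each training spectrum influences a smaller neighbourhood.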
To classify herbals relying on toxic elements, this study proposed an extension of SPBO-KELM that incorporated NMI for input variable selection, named NMI-SPBO-KELM. The model construction process based on NMI-SPBO-KELM is illustrated in Fig. 3, which included the following steps:
Step 1: after inputting the original spectral data, five-fold cross-validation was conducted. The original spectra of herbals with toxic concentrations within the allowed limits were labeled as Class 1, while those exceeding the limits were labeled as Class 2. The spectral data were then divided into a training set and a test set.
Step 2: preprocessing of the spectral dataset. First, the input spectral variables were extracted using either NMI or PCA. When extracting the variables using PCA, PCs with a cumulative variance contribution rate exceeding 99% were selected as the inputs. Moreover, because the datasets were different in each fold, the number of PCs remained different across folds. For Cu, the number of PCs was either 22 or 11; for Mn, it was 10, 11, or 14; and for Pb, it ranged from 11 to 13.
Second, the mapminmax function was applied to normalize both the training and test sets. Finally, the synthetic minority oversampling technique (SMOTE) was used to synthesize minority class spectra in the training set, addressing class imbalance.
Step 3: using the training set and the SPBO algorithm, the KELM classification model with optimal parameters (C and g) was established. The training set error served as the fitness function. SPBO was set with a maximum of 15 iterations and a population size of 10. Parameters C and g were optimized within ranges of 1–10 and 0.1–1, respectively. Optimization ended when the minimum fitness value was reached or the iteration limit was met.
Step 4: based on the SPBO-KELM model obtained in Step 3, the herbals in the test set were classified. Predictions were considered correct when predicted labels matched actual labels.
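The normalization and oversampling in Step 2 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: min-max scaling to the range [−1, 1] (the default of MATLAB's mapminmax), reuse of the training-set minima and maxima for the test set, a plain k-nearest-neighbour SMOTE with k = 5, and the hypothetical helper names `fit_minmax` and `smote`. The data are synthetic.

```python
import numpy as np


def fit_minmax(X_train, lo=-1.0, hi=1.0):
    """Return a function that scales each column to [lo, hi] using training statistics."""
    xmin, xmax = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(xmax > xmin, xmax - xmin, 1.0)  # avoid division by zero
    return lambda X: lo + (hi - lo) * (X - xmin) / span


def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])


# Synthetic demo: 6 minority (Class 1) and 24 majority (Class 2) training rows.
rng = np.random.default_rng(1)
X_train = rng.random((30, 4))
y_train = np.array([1] * 6 + [2] * 24)
scale = fit_minmax(X_train)
X_scaled = scale(X_train)
X_minority = X_scaled[y_train == 1]
X_new = smote(X_minority, n_new=18)
X_bal = np.vstack([X_scaled, X_new])
y_bal = np.concatenate([y_train, np.ones(18, dtype=int)])
print(X_bal.shape, np.bincount(y_bal)[1:])
```

Synthetic spectra are generated only from the training fold in this sketch, so the test fold contains measured spectra only.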
Additionally, to evaluate NMI-SPBO-KELM comprehensively, all preprocessing steps were kept consistent, and results were compared with those of NMI-SPBO-SVM.
| Actual \ Prediction | Negative (Class 1) | Positive (Class 2) |
|---|---|---|
| Negative (Class 1) | True negative (TN) | False positive (FP) |
| Positive (Class 2) | False negative (FN) | True positive (TP) |
Based on the confusion matrix, the accuracy (Acc), recall (Rec), precision (Pre) and F1 are expressed as follows:
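| $\mathrm{Acc} = \dfrac{TP + TN}{TP + TN + FP + FN}$ | (3-3) |
| $\mathrm{Rec} = \dfrac{TP}{TP + FN}$ | (3-4) |
| $\mathrm{Pre} = \dfrac{TP}{TP + FP}$ | (3-5) |
| $F_1 = \dfrac{2 \times \mathrm{Pre} \times \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}}$ | (3-6) |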
Among them, the Acc represents the ratio of correctly classified samples relative to the total. The F1 score is a comprehensive measure that combines Rec and Pre. Therefore, Acc and F1 were selected as the evaluation indicators in this study, with higher values reflecting better classification performance.
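As a small worked example, the four indicators can be computed from confusion-matrix counts using their standard definitions. The function name `classification_metrics` and the counts below are hypothetical and chosen only for illustration; they are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    rec = tp / (tp + fn) if tp + fn else 0.0
    pre = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return {"Acc": acc, "Rec": rec, "Pre": pre, "F1": f1}


# Hypothetical counts, with Class 2 (excessive content) as the positive class.
m = classification_metrics(tp=45, tn=40, fp=10, fn=5)
print({name: round(value, 4) for name, value in m.items()})
```

Because F1 is the harmonic mean of Pre and Rec, it drops when either of the two is low, which makes it informative for imbalanced classes.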
| Samples | No. | Cu (mg kg−1) | Mn (mg kg−1) | Pb (mg kg−1) | Class (Cu) | Class (Mn) | Class (Pb) |
|---|---|---|---|---|---|---|---|
| Licorice | 1#a | 10.05 | 34.60 | 3.11 | 1 | 1 | 1 |
| | 2# | 43.38 | 201.27 | 169.78 | 2 | 1 | 2 |
| | 3# | 26.72 | 117.94 | 86.45 | 2 | 1 | 2 |
| | 4# | 60.05 | 284.61 | 253.12 | 2 | 2 | 2 |
| | 5# | 18.05 | 354.60 | 43.11 | 1 | 2 | 2 |
| | 6# | 14.05 | 194.60 | 23.11 | 1 | 1 | 2 |
| | 7# | 22.05 | 514.60 | 63.11 | 2 | 2 | 2 |
| | 8# | 668.67 | 59.74 | 7.61 | 2 | 1 | 2 |
| | 9# | 339.36 | 47.17 | 5.36 | 2 | 1 | 2 |
| | 10# | 997.98 | 72.31 | 9.86 | 2 | 1 | 2 |
| Astragalus | 11#a | 6.90 | 30.05 | 1.26 | 1 | 1 | 1 |
| | 12# | 40.23 | 196.72 | 167.93 | 2 | 1 | 2 |
| | 13# | 23.57 | 113.39 | 84.60 | 2 | 1 | 2 |
| | 14# | 56.90 | 280.06 | 251.27 | 2 | 2 | 2 |
| | 15# | 14.90 | 350.05 | 41.26 | 1 | 2 | 2 |
| | 16# | 10.90 | 190.05 | 21.26 | 1 | 1 | 2 |
| | 17# | 18.90 | 510.05 | 61.26 | 1 | 2 | 2 |
| | 18# | 665.52 | 55.19 | 5.76 | 2 | 1 | 2 |
| | 19# | 336.21 | 42.62 | 3.51 | 2 | 1 | 1 |
| | 20# | 994.83 | 67.76 | 8.01 | 2 | 1 | 2 |
| Mixture of licorice and astragalus (1 : 1) | 21#a | 8.48 | 32.33 | 2.19 | 1 | 1 | 1 |
| | 22# | 41.81 | 199.00 | 168.86 | 2 | 1 | 2 |
| | 23# | 25.15 | 115.67 | 85.53 | 2 | 1 | 2 |
| | 24# | 58.48 | 282.34 | 252.20 | 2 | 2 | 2 |
| | 25# | 16.48 | 352.33 | 42.19 | 1 | 2 | 2 |
| | 26# | 12.48 | 192.33 | 22.19 | 1 | 1 | 2 |
| | 27# | 20.48 | 512.33 | 62.19 | 2 | 2 | 2 |
| | 28# | 667.10 | 57.47 | 6.69 | 2 | 1 | 2 |
| | 29# | 337.79 | 44.90 | 4.44 | 2 | 1 | 1 |
| | 30# | 996.41 | 70.04 | 8.94 | 2 | 1 | 2 |

a Raw herbal samples.
The calculated values in Table 2 cannot replace measurement results; they serve only to define the classification labels, that is, whether a sample exceeds (Class 2) or stays within (Class 1) the regulatory limits. For example, Sample 5 was classified as Class 1 for Cu, but Class 2 for Mn and Pb. According to Table 2, a total of 30 herbal samples were collected, each with 20 spectra, resulting in 600 spectra overall. Among them, 20 samples had excessive Cu, 9 samples had excessive Mn, and 25 samples had excessive Pb. Therefore, the number of spectra with excessive Cu was 400, with excessive Mn was 180, and with excessive Pb was 500.
Because the number of excessive samples varied across elements, the number of synthetic spectra generated via SMOTE was also different. With five-fold cross-validation in this study, each training set included 480 spectra: 320 for Cu, 140 for Mn, and 400 for Pb in Class 2; 160 for Cu, 340 for Mn, and 80 for Pb in Class 1. Thus, SMOTE generated 160 and 320 synthetic spectra of Class 1 for Cu and Pb, while it generated 140 synthetic spectra of Class 2 for Mn in the training set.
The spectra of raw licorice, astragalus, and their mixture are shown in Fig. 4. As seen in Fig. 4(a), within the wavelength range of 185–615 nm, most spectral peaks were located in the regions of 249–320 nm, 370–460 nm, and 550–615 nm. These peaks indicated that the main elements in herbals are C and N, with smaller amounts of K, Ca, and Mg, and trace elements such as Cu, Mn, and Pb. Although the toxic metal content in astragalus was lower than that in licorice, its spectral intensity was higher, as shown in Fig. 4(b). This may be due to the matrix effect in herbals,18 which can lead to a non-linear relationship between spectral intensity and element content, making classification based on toxic metal content more challenging.
Fig. 4 LIBS spectra of raw licorice, astragalus, and a licorice and astragalus mixture: (a) full spectral lines and (b) partial spectral lines.
Each spectrum contained 33 093 wavelengths, which corresponded to 33 093 variables for analysis. However, many of these variables were redundant and could lead to overfitting. Therefore, NMI was used to extract relevant spectral variables. In this process, the regularization coefficient “C” and the kernel parameter “g” in KELM were set to default values of 1 and 0.5, respectively.
After spectral variable extraction, the classification accuracy of KELM based on different toxic metals is shown in Fig. 5. The horizontal axis represented the number of spectral variables, and the vertical axis represented the accuracy of the verification set. When classifying herbals with excessive Cu, Mn and Pb, the accuracy first increased and then decreased with the number of variables. This trend may be related to the sequence of variable importance determined by the NMI value. In the NMI, effective variables were selected by analyzing the correlation between spectral variables and class labels. As the sequence number of variable importance increased, the NMI value decreased, indicating that the above correlation weakened. Too few variables led to insufficient information for accurate classification, while too many introduced irrelevant variables that degraded model performance.19
As shown in Fig. 5, the highest accuracy for Cu was achieved with 6 variables, reaching 82.00%. For Mn, the maximum accuracy of 84.17% was obtained using 24 variables. In the case of Pb, the highest accuracy of 97.00% was achieved with 220 variables. These results demonstrated that the NMI effectively extracted relevant input variables, and the optimal numbers accounted for only 0.018%, 0.073%, and 0.66% of the total variable count (33 093), respectively.
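The reported shares follow directly from the variable counts given above, as the short check below shows. The variable names are illustrative only.

```python
TOTAL_VARIABLES = 33093  # spectral variables per spectrum, as stated in the text
selected = {"Cu": 6, "Mn": 24, "Pb": 220}  # optimal numbers of variables

# Share of the full variable set retained for each element, in percent.
share = {el: 100.0 * n / TOTAL_VARIABLES for el, n in selected.items()}
for el, pct in share.items():
    print(f"{el}: {pct:.4f}%")
```

Rounded to two significant figures, these values correspond to the 0.018%, 0.073%, and 0.66% quoted in the text.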
Furthermore, to investigate the impact of variable selection methods, NMI and PCA were compared. The accuracy and F1 of the training and prediction sets are presented in Table 3. The slightly better performance of PCA-KELM in detecting Mn can be attributed to the variance-based separability of Mn spectra, which was effectively captured by PCA. In contrast, NMI relied on mutual information and may be less sensitive to high-variance variables with weak label correlation. This limitation did not undermine the advantages of NMI. For the prediction set, the average accuracy and F1 of NMI-KELM were 92.28% and 91.47%, which exceeded those of PCA-KELM (87.28% and 88.60%). In PCA, principal components were constructed based on the variance values of spectral data, ranked from largest to smallest. However, these selected components might overlook the correlation between input variables and output labels, potentially compromising prediction accuracy.23 In contrast, NMI effectively addressed this issue, thereby enhancing analytical performance.
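The averages quoted above can be reproduced from the test-set rows of Table 3. The sketch below assumes that each average is rounded to two decimals before the difference is taken, which is what matches the reported gains of 5.00 and 2.87 percentage points; the dictionary layout is illustrative.

```python
# Test-set values (%) for Cu, Mn and Pb, copied from Table 3.
test_scores = {
    "PCA-KELM": {"Acc": [79.50, 91.33, 91.00], "F1": [84.83, 86.32, 94.64]},
    "NMI-KELM": {"Acc": [89.83, 90.67, 96.33], "F1": [91.79, 84.87, 97.76]},
}

# Mean over the three elements, rounded to two decimals.
avg = {
    model: {metric: round(sum(vals) / len(vals), 2) for metric, vals in metrics.items()}
    for model, metrics in test_scores.items()
}

# Gain of NMI-KELM over PCA-KELM, computed from the rounded averages.
gain = {
    metric: round(avg["NMI-KELM"][metric] - avg["PCA-KELM"][metric], 2)
    for metric in ("Acc", "F1")
}
print(avg, gain)
```

The gains are absolute differences between percentages, so they are best read as percentage points.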
| Dataset | Metric | Model | Cu | Mn | Pb |
|---|---|---|---|---|---|
| Training set (%) | Acc | PCA-KELMa | 95.94 | 97.71 | 99.05 |
| | Acc | NMI-KELM | 93.19 | 93.42 | 100.00 |
| | F1 | PCA-KELMa | 95.79 | 97.54 | 99.05 |
| | F1 | NMI-KELM | 92.69 | 92.85 | 100.00 |
| Test set (%) | Acc | PCA-KELMa | 79.50 | 91.33 | 91.00 |
| | Acc | NMI-KELM | 89.83 | 90.67 | 96.33 |
| | F1 | PCA-KELMa | 84.83 | 86.32 | 94.64 |
| | F1 | NMI-KELM | 91.79 | 84.87 | 97.76 |

a When extracting the variables using PCA, the principal components with a cumulative variance contribution rate exceeding 99% were selected as the input variables.
| Dataset | Metric | Model | Cu | Mn | Pb |
|---|---|---|---|---|---|
| Training set (%) | Acc | NMI-KELM | 93.19 | 93.42 | 100.00 |
| | Acc | NMI-SPBO-KELM | 95.31 | 100.00 | 100.00 |
| | F1 | NMI-KELM | 92.69 | 92.85 | 100.00 |
| | F1 | NMI-SPBO-KELM | 95.10 | 100.00 | 100.00 |
| Test set (%) | Acc | NMI-KELM | 89.83 | 90.67 | 96.33 |
| | Acc | NMI-SPBO-KELM | 92.33 | 92.50 | 97.17 |
| | F1 | NMI-KELM | 91.79 | 84.87 | 97.76 |
| | F1 | NMI-SPBO-KELM | 93.95 | 87.18 | 98.28 |
The results showed that SPBO effectively optimized the C and g parameters in KELM, improving classification accuracy. According to eqn (3-1) and (3-2), both the regularization coefficient “C” and the kernel parameter “g” influenced classification performance. Although introducing the kernel function enhanced the stability of KELM, improper parameter selection could reduce model performance. SPBO iteratively updated particle positions to find the global optimum,17,21,24 thereby determining the optimal C and g values to improve the accuracy of KELM.
Moreover, the results indicated that the F1 of Class 1 exceeded 89%, while the average F1 for Class 1 (92.09%) was slightly lower than that of Class 2 (93.14%). This was likely due to the underrepresentation of Class 1, which was the minority class in most cases. During training, the classifier favored majority-class features, potentially harming minority-class performance. Nevertheless, the average F1 (>90%) still confirmed the reliability of the method in classifying non-excessive herbals, even at low concentrations.
To comprehensively evaluate the performance of NMI-SPBO-KELM, it was compared with that of NMI-SPBO-SVM. The accuracy and F1 of both models are shown in Fig. 6(a) and (b). For Cu, the accuracy and F1 of NMI-SPBO-SVM were 0.50% higher than those of NMI-SPBO-KELM. For Mn and Pb, the average accuracy and F1 of NMI-SPBO-KELM were 94.84% and 92.73%, exceeding those of NMI-SPBO-SVM (93.50% and 90.68%). Overall, the average accuracy and F1 of NMI-SPBO-KELM were 0.77% and 1.34% higher than those of NMI-SPBO-SVM. Besides, NMI-SPBO-KELM was a framework for classification, not limited to specific elements. Cu, Mn, and Pb were used as examples to demonstrate its effectiveness. Applying it to other toxic elements such as Cd, Cr, and As would greatly expand its applicability, which is a key focus of future work.
Fig. 6 Classification results of NMI-SPBO-KELM and NMI-SPBO-SVM relying on toxic elements: (a) Acc and (b) F1.
The performance improvement may be due to KELM's stronger optimization capability in three-dimensional space compared to the optimal hyperplane of SVM.22,25 Additionally, KELM replaces the hidden layer output matrix with a kernel function, enhancing its generalization ability. Moreover, in herbal classification, the accuracy of NMI-SPBO-KELM was higher than that in other studies, as shown in Table 5.
Since the model classifies herbals based on whether their toxic element levels (Cu, Mn, and Pb) exceed regulatory thresholds, it outputs discrete class labels (Class 1: within limits; Class 2: exceeding limits). Therefore, it is difficult to determine directly at which concentration levels the classification fails. Future work will explore quantitative prediction of trace concentrations to improve its applicability in precise monitoring.
This journal is © The Royal Society of Chemistry 2026