Honglin Jian a, Lei Deng b, Jun Wang c, Zikui Shen d, Xilin Wang *a and Zhidong Jia a
a Engineering Laboratory of Power Equipment Reliability in Complicated Coastal Environments, Tsinghua University, Shenzhen, Guangdong 518055, PR China. E-mail: jianhl23@mails.tsinghua.edu.cn; wang.xilin@sz.tsinghua.edu.cn
b Department of Precision Instrument, Centre for Brain Inspired Computing Research, Tsinghua University, Beijing, 100084, PR China
c State Key Laboratory of Environmental Adaptability for Industrial Products, China National Electric Apparatus Research Institute Co., Ltd, Guangzhou, Guangdong 688128, PR China
d School of Electric Power Engineering, South China University of Technology, Guangzhou, Guangdong 510640, PR China
First published on 29th July 2025
Laser-induced breakdown spectroscopy (LIBS) has been widely employed for the detection and analysis of metal materials. However, most current methods that primarily combine dimensionality reduction with machine learning still demonstrate limited discriminative power when distinguishing between metals with similar compositions. To improve the analytical accuracy of LIBS, this study introduces a dynamic vision sensor (DVS) into the LIBS system to capture the optical emissions from plasma and reconstruct plasma images using an event frame method. By fusing spectral data and plasma images, we propose a metal classification model based on a temporal spatial attention fusion network (TSAF Net). TSAF Net employs a combination of 1D-convolutional neural network (1D-CNN) and bidirectional long short-term memory network (BiLSTM) architectures for spectral feature extraction, a 2D-CNN for image feature extraction, and incorporates a multi-head attention mechanism for deep cross-modal feature fusion. A fully connected layer then completes the final metal classification task. To better simulate on-site challenges, the experimental setup introduces disturbances such as laser energy fluctuations. The proposed TSAF Net achieves classification accuracies of 93.24% for carbon steel and 94.57% for copper alloys, along with outstanding macro precision, recall, and F1 scores. Compared with the best-performing conventional methods, TSAF Net increases classification accuracy by 46.21% for carbon steel and 33.86% for copper alloys. Additionally, TSAF Net exhibits high computational efficiency and maintains a compact model size. This study significantly improves the accuracy of LIBS in the identification of metallic materials and provides new insights for the further development and application of LIBS.
Laser-induced breakdown spectroscopy (LIBS)7 employs high-energy laser pulses to ablate the surface of a material, generating a high-temperature plasma. By analysing the characteristic emission spectrum produced as the plasma cools, the compositional information of the material can be determined. LIBS represents a significant application of laser technology,8,9 offering noteworthy advantages such as in situ detection, rapid analytical speed, high sensitivity, the absence of sample preparation requirements, and low detection limits.10 It has been widely applied in areas such as industrial inspection,11,12 food safety,13,14 agricultural monitoring,15,16 space exploration,17,18 and cultural heritage archaeology.19,20 In recent years, an increasing number of studies have employed LIBS for the detection of metal materials, achieving precise classification and identification of different metals. Cui et al. tested steel samples produced by eight different heat treatment processes using LIBS, applied random projection for spectral dimensionality reduction, and employed a convolutional neural network (CNN) to achieve rapid classification of the microstructures of carbon steel samples.21 Han et al. combined principal component analysis (PCA) with the K-nearest neighbour (KNN) to efficiently and rapidly identify high-purity copper and its common copper-based components.22 Dai et al. proposed combining LIBS with a discriminative restricted Boltzmann machine (DRBM) to achieve rapid classification of aluminium alloys using small sample sizes.23 Compared to traditional approaches that perform dimensionality reduction followed by classification, as well as conventional neural networks, the DRBM was able to conduct feature extraction and classification simultaneously, improving classification accuracy by 13.33% over the best comparative method. 
In addition to combining LIBS with machine learning algorithms for metal identification, a few studies have also integrated data from other sources with LIBS to enhance classification performance. Romero et al. combined LIBS spectra with sample RGB colour images and 3D depth images, utilising a one-dimensional UNet network and the DenseNet121 network for feature extraction from the spectral and image data, respectively.24 They employed late fusion to combine features from multiple sources, which were then processed by a fully connected network to automatically classify scrap aluminium by casting type, deformation type, and commercial grade.
Although LIBS-based metal material detection methods have attracted significant attention due to their rapid analysis speed and operational simplicity, approaches combining spectral dimensionality reduction with traditional machine learning still face certain limitations in practical applications. Complex and variable environmental conditions can lead to fluctuations in laser energy, resulting in instability of the spectral signals and thereby affecting detection accuracy.25 In addition, matrix effects and self-absorption phenomena may interfere with spectral features, further limiting analytical precision. For metal materials with similar compositions, conventional methods often struggle to achieve high-precision identification. Therefore, enhancing the detection capability of LIBS in complex scenarios remains an important challenge in this field.
A dynamic vision sensor (DVS), also commonly known as an event camera, is a brain-inspired visual sensor in which each pixel responds independently and asynchronously, detecting only changes in light intensity within the field of view.26,27 It offers notable advantages, including high-speed processing, microsecond-level temporal resolution, low data volume, and low power consumption.28 At present, DVS has found research applications across various fields, including machinery fault diagnosis, discharge monitoring, biomedicine, and behaviour recognition. For instance, X. Li et al. employed event cameras to record subtle, high-speed motions on mechanical surfaces and, through specialised data processing and deep learning models, achieved efficient identification of equipment health status.29 Q. Yuan et al. utilised event cameras to monitor high-voltage discharge phenomena, reconstructing discharge images from event data and conducting frequency domain analysis, thereby providing a novel technique for discharge monitoring.30 W. He et al. developed a high-throughput, high-efficiency cell sorting system based on event cameras and neuromorphic processing.31 F. Hamann et al. proposed an event camera and Fourier-based method for wildlife behaviour recognition, which enables accurate extraction and classification of different behaviours.32 However, despite its significant potential, no application of DVS has yet been reported in analytical chemistry, and in spectroscopy in particular.
This study employs DVS to capture optical signals emitted by plasma, reconstructing plasma images through the event frame method. For feature extraction, this study designs a hybrid architecture of a 1D-convolutional neural network (1D-CNN) and a bidirectional long short-term memory network (BiLSTM) for extracting spectral features, and a 2D-CNN architecture for extracting plasma image features. To achieve effective cross-modal data fusion, a multi-head attention mechanism (MHA) is introduced, and the metal classification task is completed via a fully connected network. We have named this model the temporal–spatial attention fusion network (TSAF Net). To better reflect on-site application scenarios, laser energy fluctuations were introduced into the experiment to simulate complex on-site environments. In the classification of different types of carbon steel and copper alloy, TSAF Net achieves a significant improvement in classification accuracy over traditional methods (such as PCA-SVM and CNN). This approach not only offers excellent analytical performance, low cost, ease of operation, and rapid processing speed, but also demonstrates considerable potential for on-site application.
The plasma excitation system utilises a Nimma-900 nanosecond Nd:YAG laser (emission wavelength: 1064 nm) from Beijing Beamtech Optronics Co., Ltd, along with the associated optical path. The laser emits high-energy pulses that are first reflected at a 90° angle by a dielectric mirror, redirecting the horizontal laser beam into a vertical orientation. This vertically oriented beam is then focused onto the sample surface using a convex lens with a focal length of 110 mm, resulting in sample ablation and plasma generation. The plasma information acquisition section includes the optical path, a spectrometer, and a DVS. Plasma emission is collected and focused onto a fibre optic probe via a convex lens with a 60 mm focal length. The probe is positioned at a 45° angle relative to the sample surface, ensuring optimal collection of spectral information. The spectrometer utilised is the AVS-RACKMOUNT-USB2 from Avantes, with a wavelength range of 200–640 nm. An integration time of 30 μs was used for acquiring high-intensity spectra. A DG645 digital delay generator from Stanford Research Systems is used to provide a 1 μs acquisition delay. For real-time optical signal acquisition, the DAVIS346 from iniVation (Switzerland) is employed.33 This sensor is aligned parallel to the sample surface and outputs an event stream associated with the plasma optical signals,34 offering a spatial resolution of 346 × 260 and a temporal resolution of 1 μs. Collected spectral and event data are transmitted to a computer for further analysis, with all data processing conducted using Matlab R2024b software. Experimental optimisation established that the optimal single-pulse laser energy is 95 mJ, with a delay time of 1 μs selected to achieve high-intensity spectral signals and an excellent signal-to-noise ratio. Meanwhile, the number of events from the DVS and the area of the plasma region were employed as evaluation criteria. 
Experimental parameters yielding a larger plasma area and a higher number of recorded events were prioritised, as these facilitate the acquisition of more comprehensive spatial information pertaining to the plasma. Consequently, the optimum DVS acquisition parameters were determined as follows: a lens-to-sample distance of 5 cm, an acquisition angle (between the optical axis and the sample surface) of 0°, and an aperture value of F2.0. Under these optimised conditions, high-quality plasma images were obtained.
| Carbon steel | | | | Copper alloy | | | |
|---|---|---|---|---|---|---|---|
| No. | Fe | Mn | Si | No. | Cu | Zn | Fe |
| Q1 | 99.54 | 0.25 | 0.15 | H1 | 58.80 | 37.34 | 0.22 |
| Q2 | 99.21 | 0.60 | 0.13 | H2 | 62.10 | 37.82 | 0.07 |
| Q3 | 98.93 | 0.79 | 0.17 | H3 | 65.00 | 34.25 | 0.15 |
| Q4 | 99.12 | 0.48 | 0.19 | H4 | 69.83 | 29.81 | 0.04 |
| Q5 | 98.88 | 0.77 | 0.20 | H5 | 75.00 | 24.25 | 0.15 |
| Q6 | 98.77 | 0.86 | 0.28 | H6 | 80.23 | 19.67 | 0.07 |
| Q7 | 98.86 | 0.79 | 0.25 | H7 | 90.23 | 9.57 | 0.07 |
| Q8 | 98.61 | 1.03 | 0.25 | H8 | 96.32 | 3.60 | 0.07 |
| Q9 | 98.65 | 0.93 | 0.31 | — | — | — | — |
Each grade of carbon steel and copper alloys has distinct physical properties and applications. For simplicity, the nine types of carbon steel are referred to as Q1 through Q9, and the eight copper alloys as H1 through H8, sequentially. All samples had dimensions of 30 mm × 30 mm × 5 mm. To minimise surface contamination and improve spectral quality, the samples underwent a standardised cleaning process. First, they were polished, then placed in anhydrous ethanol and ultrasonically cleaned for 10 minutes. Afterward, the samples were rinsed thoroughly with distilled water, and their surface moisture was wiped off. Finally, they were placed in a vacuum drying oven at 60 °C for 6 hours. Once cooled to room temperature, the samples were prepared for testing.
Each sample was measured 150 times. To simulate the challenges LIBS faces in complex detection scenarios, 90 data sets were collected using a laser energy of 95 mJ. The remaining 60 data sets were obtained using laser energies of 85 mJ and 105 mJ, with 30 data sets collected at each energy level to simulate significant spectral fluctuations caused by environmental complexity.
Spectral line attribution was conducted by comparing the experimental data with standard spectral parameters from the Atomic Spectra Database,35 considering factors such as wavelength, transition probability, relative intensity, and the actual elemental composition of the sample. In cases where spectral lines were difficult to identify or overlapped, the kt value was used to assist in differentiation.36
DVS is a novel vision sensor inspired by biological vision. Unlike traditional cameras, it outputs an asynchronous stream of events. The DVS pixel array used in this study has a resolution of 346 × 260 pixels, with each pixel operating independently and asynchronously. When the change in the logarithmic brightness at any pixel exceeds a set threshold, an event is generated containing the pixel coordinates (x, y), timestamp (t), and polarity (p). When an increase in brightness is detected, an ON event with p = 1 is generated; conversely, a decrease in brightness results in an OFF event with p = 0.37 This working mechanism enables the DVS to focus only on regions where brightness changes occur, allowing it to operate more effectively in complex environments such as strong light, backlight, and low light conditions. In addition, compared to traditional cameras, DVS offers advantages such as high temporal resolution, low redundancy, small data volume, fast processing speed, cost-effectiveness, and compact size. Since the DVS does not produce images directly, this study employs the event frame method to reconstruct plasma images from the event stream data,38 thereby obtaining plasma optical information. By analysing the curve of the number of events over time, the initial moment t0 of plasma generation is determined. Then, using a 200 ms time window, all OFF events from t0 to t0 + 200 ms are extracted and accumulated to reconstruct the event frame image.
In the single-pulse laser experiment, the event data stream corresponding to the plasma optical signals is represented in spatiotemporal form, as shown in Fig. 3(a). Black dots in the figure represent OFF events, and white dots represent ON events. The experimental data were processed using the event frame method with a fixed time window, by accumulating OFF event data from a series of experiments to reconstruct plasma grayscale images, as shown in Fig. 3(b). The clear shape and contour features of the plasma in the image enable its use for subsequent analysis.
Fig. 3 DVS data processing: (a) spatiotemporal representation of event streams and (b) plasma images reconstructed from event data.
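The event frame reconstruction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the function names, the `(x, y, t, p)` tuple layout, and the use of microsecond timestamps are illustrative choices, and `t0_us` is taken as given rather than estimated from the event-count curve.

```python
def reconstruct_event_frame(events, t0_us, window_us, width=346, height=260, polarity=0):
    """Accumulate events of one polarity inside [t0, t0 + window] into a count image.

    events: iterable of (x, y, t, p) tuples; t in microseconds (assumed),
    p = 1 for ON events and p = 0 for OFF events, as in the text.
    """
    frame = [[0] * width for _ in range(height)]
    t_end = t0_us + window_us
    for x, y, t, p in events:
        if p == polarity and t0_us <= t <= t_end:
            frame[y][x] += 1
    return frame


def to_grayscale(frame, max_level=255):
    """Scale per-pixel event counts linearly to the range 0..max_level."""
    peak = max(max(row) for row in frame)
    if peak == 0:
        return [row[:] for row in frame]
    return [[round(v * max_level / peak) for v in row] for row in frame]


# Toy event stream (hypothetical values, not measured data).
toy_events = [
    (10, 5, 100, 0),      # OFF event inside the window
    (10, 5, 150, 0),      # second OFF event at the same pixel
    (11, 5, 120, 1),      # ON event, ignored
    (12, 6, 300_000, 0),  # OFF event after the window, ignored
    (10, 5, 50, 0),       # OFF event before t0, ignored
]
frame = reconstruct_event_frame(toy_events, t0_us=100, window_us=200_000)  # 200 ms window
gray = to_grayscale(frame)
```

Accumulating counts per pixel and then rescaling is only one way to turn an event frame into a grayscale image; the source does not state which scaling was used.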
BiLSTM39 is based on LSTM40 and it addresses the limitation of LSTM only utilizing unidirectional information by employing a bidirectional structure. As shown in Fig. 5(a), the BiLSTM consists of two parallel LSTM layers, forward and backward, which can process the sequence from low to high wavelength and from high to low wavelength, respectively. In this study, a BiLSTM branch is introduced in parallel following the convolutional modules. Local features are initially extracted by the CNN, after which a portion of the feature sequences is further processed by the BiLSTM to capture global dependencies. By integrating both forward and backward information flows, BiLSTM can comprehensively capture and integrate complex relationships at both ends of the spectral sequence as well as amongst different spectral bands, thereby significantly enhancing the model's capacity to recognise and represent complex spectral patterns.
Fig. 5 Details of the classification model structure and classification process: (a) BiLSTM structure, (b) multi head attention mechanism, and (c) classification flowchart.
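To make the bidirectional idea concrete, the following sketch runs one LSTM from low to high wavelength and a second from high to low, then pairs their hidden states at each position. It is a deliberately tiny illustration with a scalar hidden state and arbitrary weights; the names `lstm_step`, `run_lstm`, and `bilstm` and all numerical values are assumptions, and the layer sizes used in TSAF Net are not given in this section.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h, c, w):
    """One LSTM step with scalar input and scalar hidden state.

    w maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h + b

    i = sigmoid(pre("i"))      # input gate
    f = sigmoid(pre("f"))      # forget gate
    o = sigmoid(pre("o"))      # output gate
    g = math.tanh(pre("g"))    # candidate cell state
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def run_lstm(seq, w):
    """Return the hidden state after each element of seq."""
    h = c = 0.0
    states = []
    for x in seq:
        h, c = lstm_step(x, h, c, w)
        states.append(h)
    return states


def bilstm(seq, w_fwd, w_bwd):
    """Pair forward (low to high) and backward (high to low) hidden states."""
    fwd = run_lstm(seq, w_fwd)
    bwd = run_lstm(seq[::-1], w_bwd)[::-1]
    return list(zip(fwd, bwd))


# Arbitrary illustrative weights and a toy four-point "spectrum".
weights = {"i": (1.0, 0.5, 0.0), "f": (0.5, 0.5, 1.0),
           "o": (1.0, 0.5, 0.0), "g": (1.0, 0.5, 0.0)}
states = bilstm([0.1, 0.9, 0.3, 0.7], weights, weights)
changed = bilstm([0.1, 0.9, 0.3, 0.0], weights, weights)
```

Changing only the last point of the sequence leaves the forward state at the first position untouched but alters the backward state there, which is the sense in which the backward branch brings information from the far end of the sequence to every position.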
Feature extraction of event-reconstructed plasma images employs a 2D-CNN structure, as shown in Fig. 4. The input is a single-channel image of 100 × 100 pixels, which is sequentially processed by four convolutional modules to efficiently extract hierarchical spatial features. Each module consists of a convolutional layer, a batch normalisation layer, a ReLU activation layer, and a 2 × 2 max pooling layer, enabling multi-level feature abstraction. The numbers of convolutional kernels in the four convolutional layers are 128, 256, 512, and 512, respectively, each with a kernel size of 3 × 3. After the convolutional modules, a global average pooling layer aggregates the spatial information across the feature maps, further reducing the feature dimensions. This is followed by a fully connected layer, ultimately producing a compact 256-dimensional image feature vector that serves as a high-level representation for subsequent tasks.
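The spatial sizes implied by this architecture can be traced with a short calculation. The sketch below assumes stride-1 convolutions with padding 1 (so a 3 × 3 kernel preserves the spatial size) and 2 × 2 pooling with stride 2 and floor rounding; the source does not state padding or stride, so the resulting sizes are illustrative only.

```python
def conv_module_output(size, kernel=3, padding=1, pool=2):
    """Spatial size after one stride-1 convolution followed by max pooling."""
    after_conv = size + 2 * padding - kernel + 1
    return after_conv // pool  # floor rounding is an assumption


def trace_shapes(input_size=100, channels=(128, 256, 512, 512)):
    """Return (channels, height, width) after each convolutional module."""
    shapes = []
    size = input_size
    for ch in channels:
        size = conv_module_output(size)
        shapes.append((ch, size, size))
    return shapes


shapes = trace_shapes()
# Global average pooling keeps one value per channel of the last module.
gap_length = shapes[-1][0]
# Weights plus biases of a fully connected layer mapping that vector to 256 features.
fc_parameters = gap_length * 256 + 256
```

Under these assumptions the 100 × 100 input shrinks to 50, 25, 12, and 6 pixels per side, global average pooling yields a 512-element vector, and the final fully connected layer maps it to the 256-dimensional image feature.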
| $Q_i = W_Q \cdot x_i$ | (1) |
| $K_i = W_K \cdot x_i$ | (2) |
| $V_i = W_V \cdot x_i$ | (3) |
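| $Z_i = \mathrm{softmax}\left(\dfrac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i$ | (4) |
| $Z = \mathrm{Concat}(Z_1, Z_2, \ldots, Z_h) \cdot W_o$ | (5) |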
The MHA is an extension of the self-attention mechanism, as illustrated in Fig. 5(b). It partitions the feature space into multiple independent heads, each employing distinct projection matrices to perform self-attention computations. This design enables the model to capture relationships between features from multiple perspectives. In this study, the number of attention heads is set to 32. The outputs Zi from all heads are concatenated and subsequently projected to the target feature space via a projection matrix Wo, yielding the fused spectral and plasma image feature representation Z.
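A compact way to see how the heads, the concatenation, and the output projection fit together is the pure-Python sketch below. It uses the standard scaled dot-product form of self-attention and a row-vector convention (`x @ W`), and treats the spectral and image features as two tokens; the tokenisation, the toy dimensions, and every weight value are assumptions made for illustration, since the source does not specify them.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x, wq, wk, wv):
    """Scaled dot-product self-attention for one head; returns (output, weights)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_attention(x, heads, wo):
    """Run every head, concatenate per-token outputs, and project with wo."""
    outputs = [attention_head(x, wq, wk, wv)[0] for wq, wk, wv in heads]
    concat = [sum((out[i] for out in outputs), []) for i in range(len(x))]
    return matmul(concat, wo)


def select_dims(start, d_model=4, d_head=2):
    """Toy projection that copies d_head consecutive input dimensions."""
    return [[1.0 if r == start + c else 0.0 for c in range(d_head)] for r in range(d_model)]


identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
heads = [(select_dims(0),) * 3, (select_dims(2),) * 3]  # two heads instead of 32

tokens = [[1.0, 0.0, 2.0, 1.0],   # hypothetical spectral feature token
          [0.5, 1.0, 0.0, 1.5]]   # hypothetical image feature token
fused = multi_head_attention(tokens, heads, identity)
_, head_weights = attention_head(tokens, *heads[0])
```

Each row of `head_weights` sums to one, so every fused token is a weighted mixture of the value vectors of both modalities, with a separate mixture learned per head.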
The flowchart of the classification method proposed in this study is illustrated in Fig. 5(c). Initially, the plasma spectra are denoised and baseline-corrected to enhance spectral quality, whilst plasma images are reconstructed from the event stream data as event frames. The pre-processed data are subsequently input into TSAF Net, which utilises a 1D-CNN-BiLSTM module to extract spectral features and a 2D-CNN module to extract features from plasma images. The MHA is then employed to achieve multimodal feature fusion. Finally, classification is performed using a fully connected layer, yielding the final classification results.
![]() | (6) |
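| $\mathrm{Precision}_i = \dfrac{\mathrm{TP}_i}{\mathrm{TP}_i + \mathrm{FP}_i}$ | (7) |
| $\mathrm{Recall}_i = \dfrac{\mathrm{TP}_i}{\mathrm{TP}_i + \mathrm{FN}_i}$ | (8) |
| $\mathrm{F1}_i = \dfrac{2 \times \mathrm{Precision}_i \times \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i}$ | (9) |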
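The per-class definitions in eqn (7) to (9) can be evaluated directly from a confusion matrix, as in the sketch below. The macro values are computed here as the unweighted mean over classes, which is the usual convention but is an assumption about the source; the confusion matrix is a toy example, not experimental data.

```python
def per_class_metrics(cm):
    """Return (precision, recall, F1) per class.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    results = []
    for i in range(n):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp
        fn = sum(cm[i]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def macro_average(results):
    """Unweighted mean of precision, recall, and F1 across classes."""
    n = len(results)
    return tuple(sum(r[k] for r in results) / n for k in range(3))


def accuracy(cm):
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Toy three-class confusion matrix with 10 samples per class.
toy_cm = [[8, 2, 0],
          [1, 9, 0],
          [0, 1, 9]]
metrics = per_class_metrics(toy_cm)
macro_precision, macro_recall, macro_f1 = macro_average(metrics)
overall_accuracy = accuracy(toy_cm)
```

When every class has the same number of test samples, as in this toy matrix, macro recall coincides with overall accuracy, because both reduce to the mean of the per-class hit rates.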
Fig. 6 The effects of laser energy fluctuations: (a) effect on the intensity of characteristic spectral lines and (b) effect on plasma-related features.
In this study, not only was the classification performance of the proposed model investigated, but comparisons were also made with traditional models such as random forest (RF)42 and support vector machine (SVM),43 as well as with several ablation models based on the proposed method, including 1D-CNN, 1D-CNN-BiLSTM, 2D-CNN, two stream convolutional neural network (TSCNN), TSCNN with MHA, and TSCNN with BiLSTM.
The inputs for the RF and SVM models comprised four feature sets: the original characteristic spectral line intensities, the line intensities normalised using the internal standard method, and the original and internal-standard-normalised spectra after dimensionality reduction via PCA.44 In the selection of characteristic spectral lines, priority was given to representative lines of the principal elements contained in the samples. For carbon steel samples, spectral lines of Fe, C, and Mn were mainly selected, with a greater number of Fe lines included due to its high content and the abundance of its spectral lines; detailed information is provided in Table 2. For copper alloy samples, lines corresponding to Cu, Zn, Fe, and Pb were selected following the same principle. Ultimately, 27 characteristic spectral lines (19 Fe, 4 C, and 4 Mn) were chosen for carbon steel samples, while 22 lines relating to the main elements were selected for copper alloy samples. Selection of the internal standard lines took into account signal intensity, stability, and susceptibility to interference. For carbon steel, Fe I 438.354 nm was chosen as the internal standard line because of its high intensity, good signal-to-noise ratio, and minimal interference from other lines, thus enabling effective normalisation. For copper alloy samples, Cu I 521.820 nm was employed as the internal standard because of its high intensity, stability, and strong resistance to spectral interference.
| Sample | Element | Spectral line (nm) |
|---|---|---|
| Carbon steel | Fe | 393.263, 393.360, 396.114, 396.742, 404.454, 404.581, 406.244, 406.359, 418.703, 438.277, 438.354, 440.475, 495.760, 513.946, 516.749, 522.686, 523.294, 526.954, 532.804 |
| | C | 493.201, 502.385, 526.896, 530.235 |
| | Mn | 403.076, 403.307, 476.586, 482.352 |
| Copper alloy | Cu | 327.396, 427.511, 450.937, 465.112, 510.554, 515.324, 521.820, 578.213 |
| | Zn | 328.233, 330.258, 334.501, 334.557, 472.216, 481.053 |
| | Fe | 324.820, 393.360, 396.796, 405.821, 428.815, 456.651, 480.994 |
| | Pb | 405.780 |
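Internal standard normalisation amounts to dividing every selected line intensity by the intensity of the reference line from the same shot. The sketch below illustrates this with the Fe I 438.354 nm reference named in the text; the function name and the intensity values are hypothetical, and the purely multiplicative shot-to-shot change is an idealisation rather than a model of real laser energy fluctuations.

```python
def internal_standard_normalise(intensities, standard_wavelength):
    """Divide each line intensity by the internal standard line intensity.

    intensities: dict mapping wavelength in nm to measured intensity.
    """
    reference = intensities[standard_wavelength]
    if reference <= 0:
        raise ValueError("internal standard intensity must be positive")
    return {wl: value / reference for wl, value in intensities.items()}


# Hypothetical intensities for three carbon steel lines (arbitrary units).
shot_a = {438.354: 2000.0, 403.076: 500.0, 404.581: 1500.0}
# Same shot scaled by 1.25 to mimic an idealised increase in plasma emission.
shot_b = {wl: value * 1.25 for wl, value in shot_a.items()}

norm_a = internal_standard_normalise(shot_a, 438.354)
norm_b = internal_standard_normalise(shot_b, 438.354)
```

In this idealised case the two normalised shots are identical, which is why the method can compensate for overall intensity drift; it cannot remove changes that affect individual lines differently.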
PCA was applied to the original spectral data, and principal components with a cumulative contribution rate exceeding 85% were selected as input features. For the carbon steel samples, 211 principal components were extracted, whilst 87 principal components were obtained for the copper alloy samples. The PCA results for carbon steel and copper alloys are shown in Fig. 7. As shown in Fig. 7(a), the scatter plot displays the first two principal components of carbon steel, with the first principal component accounting for 33.13% of the variance and the second accounting for 6.78%. Data points from categories Q1 to Q9 exhibit significant overlap in the principal component space defined by PC1 and PC2, with no distinct boundaries observed. This indicates that the first two principal components are insufficient for effectively distinguishing between these categories. Moreover, the samples within each group display a high degree of similarity in variation. As shown in Fig. 7(b), data points for categories H1 and H2 are clustered in the upper region of the PC1–PC2 space and largely overlap. In contrast, the data points for H8 are situated in the left region, exhibiting a high degree of separation from the other categories.
Fig. 7 Principal component analysis results: (a) scatter plot of the first two components for carbon steel and (b) copper alloy.
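The rule of retaining components until the cumulative contribution rate exceeds 85% can be written as a short helper. The sketch assumes the contribution rate of a component is its eigenvalue divided by the sum of all eigenvalues; the function names and the toy eigenvalues are illustrative and unrelated to the measured spectra.

```python
def explained_variance_ratio(eigenvalues):
    """Contribution rate of each principal component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def components_exceeding(ratios, threshold=0.85):
    """Smallest number of leading components whose cumulative ratio exceeds threshold."""
    cumulative = 0.0
    for k, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        if cumulative > threshold:
            return k
    raise ValueError("cumulative contribution never exceeds the threshold")


# Toy eigenvalues sorted in descending order (not derived from the LIBS data).
toy_eigenvalues = [5.0, 2.0, 1.0, 0.8, 0.7, 0.5]
ratios = explained_variance_ratio(toy_eigenvalues)
n_components = components_exceeding(ratios)
```

For the toy values the cumulative rates are 0.50, 0.70, 0.80, and 0.88, so four components are retained. A slowly decaying eigenvalue spectrum pushes this count up, which is consistent with the large number of components needed when PC1 and PC2 together explain only about 40% of the variance.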
| No. | Internal standard | PCA | RF | SVM | 1D-CNN | 2D-CNN | MHA | BiLSTM | Accuracy | Macro precision | Macro recall | Macro F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | ✓ | ✓ | | | | | | 0.4541 | 0.4420 | 0.4541 | 0.4419 |
| 2 | ✓ | ✓ | ✓ | | | | | | 0.5700 | 0.5521 | 0.5700 | 0.5529 |
| 3* | | | ✓ | | | | | | 0.6184 | 0.6149 | 0.6184 | 0.6127 |
| 4* | ✓ | | ✓ | | | | | | 0.6280 | 0.6308 | 0.6280 | 0.6210 |
| 5 | | ✓ | | ✓ | | | | | 0.6039 | 0.6354 | 0.6039 | 0.6131 |
| 6 | ✓ | ✓ | | ✓ | | | | | 0.6232 | 0.6343 | 0.6232 | 0.6230 |
| 7* | | | | ✓ | | | | | 0.6377 | 0.6389 | 0.6377 | 0.6375 |
| 8* | ✓ | | | ✓ | | | | | 0.6377 | 0.6391 | 0.6377 | 0.6307 |
| 9 | | | | | ✓ | | | | 0.3575 | 0.3880 | 0.3575 | 0.3452 |
| 10 | | | | | ✓ | | | ✓ | 0.3913 | 0.3900 | 0.3913 | 0.3758 |
| 11 | | | | | | ✓ | | | 0.8889 | 0.8937 | 0.8889 | 0.8895 |
| 12 | | | | | ✓ | ✓ | | | 0.8599 | 0.8628 | 0.8599 | 0.8598 |
| 13 | | | | | ✓ | ✓ | ✓ | | 0.8841 | 0.8857 | 0.8841 | 0.8844 |
| 14 | | | | | ✓ | ✓ | | ✓ | 0.9034 | 0.9063 | 0.9034 | 0.9036 |
| 15 | | | | | ✓ | ✓ | ✓ | ✓ | 0.9324 | 0.9345 | 0.9324 | 0.9327 |
Specifically, No. 1 and No. 2 use PCA-reduced spectra as inputs to the RF model, without and with prior internal standard normalisation, respectively. No. 3* and No. 4* use the original characteristic spectral line intensities and those normalised by the internal standard method, respectively, as RF inputs. Similarly, No. 5 and No. 6 use PCA-reduced spectra without and with prior internal standard normalisation as inputs to the SVM model, while No. 7* and No. 8* use the original and internal-standard-normalised characteristic spectral line intensities, respectively, as SVM inputs. Table 3 shows that, when using only spectral data for classification, the SVM model outperforms RF for every type of input, with 1D-CNN and 1D-CNN-BiLSTM performing the worst. The combination of PCA features with the SVM model achieves a classification accuracy of 0.6039, which is noticeably higher than the 0.4541 accuracy obtained by combining PCA with the RF model. When the original spectra are processed using the internal standard method prior to PCA dimensionality reduction, an improvement in classification performance is observed; for the SVM model, the accuracy increases slightly to 0.6232. Using characteristic spectral line intensities as input produces better classification results than employing PCA, markedly so for the RF model. This suggests that, for carbon steel samples where the differences between spectra are relatively subtle, the manually selected characteristic lines of the principal elements provide better performance than features derived from PCA. However, for models using characteristic spectral line intensities as input, the internal standard method has only a marginal impact on performance. 
In addition, both the 1D-CNN and 1D-CNN–BiLSTM models achieve classification accuracies below 0.4 on the test set, with BiLSTM providing only a minor improvement over the stand-alone 1D-CNN. Although both 1D-CNN and its combination with BiLSTM afford the benefits of automatic feature extraction and temporal modelling, their performance in carbon steel classification does not surpass that of conventional machine learning models.
When using a 2D-CNN to classify carbon steel based on plasma images reconstructed from event stream data, the classification accuracy reached 0.8889, representing a 39.39% relative improvement over the highest accuracy achieved using characteristic spectral line intensities as SVM inputs (0.6377). Additionally, the macro precision, macro recall, and macro F1-score were 0.8937, 0.8889, and 0.8895, respectively, all significantly higher than those achieved by models based on spectral data. Fundamentally, differences in metal properties lead to laser-induced plasmas with distinct features, making classification based on plasma images more straightforward. After horizontally concatenating the features extracted by 1D-CNN and 2D-CNN, the classification accuracy for carbon steel was 0.8599, which is lower than the accuracy of 0.8889 achieved when using the 2D-CNN alone. This suggests that simple feature concatenation does not further improve model performance and may instead introduce redundant or noisy information, thereby affecting classification effectiveness. When cross-modal features extracted by the 1D-CNN and 2D-CNN are fused using MHA for carbon steel classification, the classification accuracy reaches 0.8841, with macro precision, macro recall, and macro F1-score of 0.8857, 0.8841, and 0.8844, respectively. Compared with horizontal feature concatenation, MHA fusion raises the accuracy from 0.8599 to 0.8841, indicating that the MHA mechanism can more effectively capture correlations between different modalities and achieve better joint feature representations, thereby enhancing the model's classification performance.
Comparing the results of No. 14, No. 12, and No. 11, it is evident that introducing the BiLSTM branch into spectral feature extraction significantly improves the accuracy of carbon steel classification. After horizontally concatenating the features extracted by 1D-CNN–BiLSTM and 2D-CNN, the classification accuracy reaches 0.9034, which is clearly superior to the results obtained using 2D-CNN alone. This indicates that the 1D-CNN–BiLSTM architecture can not only effectively extract local temporal patterns but also capture long-range dependencies within spectral data, further enriching feature representation. When this spectral-temporal information is combined with spatial information from plasma images, it enables effective complementary advantages, significantly enhancing the classification performance of carbon steel. The proposed model, TSAF Net, achieves a classification accuracy of 0.9324 on the carbon steel test set, with macro precision, macro recall, and macro F1-score of 0.9345, 0.9324, and 0.9327 respectively, all of which are the highest amongst all models evaluated. This fully demonstrates the model's effectiveness. TSAF Net not only thoroughly integrates multimodal information to mine complementary features between spectral data and plasma images but also exhibits strong discriminative ability and generalisation performance, providing a more accurate and reliable solution for the carbon steel classification task.
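The percentage gains quoted for carbon steel are relative improvements over the best spectral-only baseline (SVM on characteristic line intensities, accuracy 0.6377), not differences in percentage points. The short check below reproduces them from the accuracies in Table 3; the helper name is illustrative.

```python
def relative_improvement(new, baseline):
    """Relative gain of new over baseline, in percent."""
    return (new - baseline) / baseline * 100.0


best_spectral_baseline = 0.6377   # SVM on characteristic line intensities (No. 7*, No. 8*)
cnn_2d_accuracy = 0.8889          # 2D-CNN on plasma images (No. 11)
tsaf_net_accuracy = 0.9324        # TSAF Net (No. 15)

gain_2d_cnn = round(relative_improvement(cnn_2d_accuracy, best_spectral_baseline), 2)
gain_tsaf_net = round(relative_improvement(tsaf_net_accuracy, best_spectral_baseline), 2)
# gain_2d_cnn   -> 39.39
# gain_tsaf_net -> 46.21
```

In absolute terms the same comparisons correspond to gains of about 25 and 29 percentage points, respectively.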
The confusion matrices of Models No. 12 to No. 15 (TSCNN, TSCNN-MHA, TSCNN-BiLSTM, and TSAF Net) on the carbon steel test set are shown in Fig. 8, enabling a comparison of each model's classification performance across different carbon steel categories. As shown in Fig. 8(a), for TSCNN, the proportion of misclassified samples for Q3, Q5, Q6, and Q9 each exceeds 17%, with misclassification of Q8 and Q9 being particularly prominent. Some samples from Q5 and Q6 are also incorrectly assigned to adjacent categories. The causes of this phenomenon are multifaceted. Two key factors are the inhomogeneity of the metal sample's surface composition and fluctuations in laser energy. Both factors can impact the characteristics of the generated plasma, thereby influencing the resulting spectral signals and plasma images. These effects increase the likelihood of inter-class confusion and ultimately lead to misclassification. Fig. 8(b) demonstrates that incorporating MHA leads to more balanced classification accuracy across the categories. However, the proportion of misclassified samples for Q8 and Q9 still exceeds 17%. Compared with Fig. 8(a), there is a marked reduction in misclassification for Q5 and Q6. Nevertheless, the confusion between Q8 and Q9 remains pronounced and has not been effectively alleviated. As shown in Fig. 8(c), the introduction of BiLSTM generally enhances the classification performance of TSCNN. Except for Q2, Q4, and Q8, the classification accuracy for the remaining categories increases significantly, particularly for Q5, Q6, and Q9. However, it is worth noting that Q5 is still frequently misclassified as Q4, and Q8 is often misclassified as Q9, suggesting that the similarity between certain categories continues to limit the model's discriminative capability. The results presented in Fig. 8(d) indicate a substantial improvement in the classification performance of TSAF Net. 
Apart from Q8, the proportion of correctly classified samples in all other categories exceeds 91%, and the misclassification issues associated with Q5 and Q8 are also notably reduced. These findings further demonstrate the superior performance of TSAF Net in the carbon steel classification task.
Fig. 8 Confusion matrices of several models for carbon steel classification: (a) TSCNN, (b) TSCNN-MHA, (c) TSCNN-BiLSTM, and (d) TSAF Net.
The classification metrics for each category in the carbon steel test set as achieved by TSAF Net are presented in Fig. 9(a). Overall, TSAF Net attains consistently high values for precision, recall, and F1-score across all categories. Notably, Q1 and Q5 achieve the highest scores in all metrics, indicating that these two categories are more readily and accurately identified. In contrast, the precision values for Q4, Q7, and Q9 are 0.8750, 0.8800, and 0.8750, respectively, suggesting that a considerable number of samples from other categories are misclassified as these. It is also important to note that Q8 exhibits the lowest recall and F1-score, highlighting that missed detections remain relatively frequent in this category. In general, TSAF Net exhibits balanced discriminative capability across all categories and demonstrates strong potential for practical applications.
Fig. 9 Classification performance of TSAF Net on carbon steel: (a) metrics for each category and (b) prediction probabilities for all samples.
Additionally, Fig. 9(b) shows the classification prediction probability distribution of TSAF Net for all carbon steel samples. The first 1143 samples constitute the training set, whilst the remaining 207 samples correspond to the test set. As shown in the figure, training set samples are associated with high prediction probabilities for their correct categories, primarily concentrated around 0.7 (red area), whereas probabilities for incorrect categories remain relatively low, typically around 0.2 (blue area). This indicates that TSAF Net demonstrates strong discriminative ability and high classification confidence on the training data. For the test set, however, prediction probabilities for the correct categories are moderately lower, generally clustering around 0.6, whilst probabilities for certain incorrect categories increase, suggesting a slight decline in classification confidence. Nevertheless, despite this modest reduction in performance on the test set, TSAF Net continues to effectively distinguish amongst different categories, demonstrating robust generalisation capability.
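The comparison of classification confidence between the training and test sets can be sketched in a few lines. The example assumes that the prediction probabilities come from a softmax output layer, which the text does not state explicitly; the logits, labels, and the helper `mean_true_class_probability` are invented for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_true_class_probability(logit_rows, labels):
    """Average probability that the model assigns to the correct class."""
    probs = [softmax(row)[y] for row, y in zip(logit_rows, labels)]
    return sum(probs) / len(probs)


# Invented logits for a 3-class toy problem, NOT model outputs from the paper.
train_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
train_labels = [0, 1]
test_logits = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.0]]
test_labels = [0, 1]

train_conf = mean_true_class_probability(train_logits, train_labels)
test_conf = mean_true_class_probability(test_logits, test_labels)
print(train_conf, test_conf)
```

A lower mean probability on held-out samples, as in this toy case, corresponds to the modest drop in confidence described for the test set.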
| No. | Internal standard | PCA | RF | SVM | 1D-CNN | 2D-CNN | MHA | BiLSTM | Accuracy | Macro precision | Macro recall | Macro F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ✓ | ✓ | 0.6196 | 0.5983 | 0.6196 | 0.6048 | ||||||
| 2 | ✓ | ✓ | ✓ | 0.6033 | 0.5786 | 0.6033 | 0.5874 | |||||
| 3* | ✓ | 0.5163 | 0.5106 | 0.5163 | 0.5114 | |||||||
| 4* | ✓ | ✓ | 0.5435 | 0.5371 | 0.5435 | 0.5372 | ||||||
| 5 | ✓ | ✓ | 0.6739 | 0.6723 | 0.6739 | 0.6718 | ||||||
| 6 | ✓ | ✓ | ✓ | 0.7065 | 0.7085 | 0.7065 | 0.7072 | |||||
| 7* | ✓ | 0.6196 | 0.6162 | 0.6196 | 0.6146 | |||||||
| 8* | ✓ | ✓ | 0.5978 | 0.5889 | 0.5978 | 0.5878 | ||||||
| 9 | ✓ | 0.6250 | 0.5800 | 0.6250 | 0.5814 | |||||||
| 10 | ✓ | ✓ | 0.6522 | 0.6066 | 0.6522 | 0.6147 | ||||||
| 11 | | | | | | ✓ | | | 0.8696 | 0.8779 | 0.8696 | 0.8703 |
| 12 | | | | | ✓ | ✓ | | | 0.9076 | 0.9156 | 0.9076 | 0.9084 |
| 13 | | | | | ✓ | ✓ | ✓ | | 0.9185 | 0.9193 | 0.9185 | 0.9186 |
| 14 | | | | | ✓ | ✓ | | ✓ | 0.9348 | 0.9441 | 0.9348 | 0.9342 |
| 15 | | | | | ✓ | ✓ | ✓ | ✓ | 0.9457 | 0.9488 | 0.9457 | 0.9462 |
When using only the 2D-CNN to extract features from event-stream reconstructed plasma images for copper alloy classification, the model achieved a classification accuracy of 0.8696, with the other evaluation metrics also close to 0.87. Compared to the best-performing model based on spectral features (PCA-SVM), the 2D-CNN model improved accuracy, macro precision, macro recall, and macro F1 by 23.09%, 23.91%, 23.09%, and 23.06%, respectively, in relative terms. These results indicate that event-stream reconstructed plasma images contain more informative and discriminative features, more comprehensively capturing plasma state changes induced by sample differences, and thereby significantly enhancing the model's classification performance.
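The percentage gains quoted above are relative improvements, and the arithmetic can be checked with a short Python sketch. The 2D-CNN values are those of model No. 11 in the table; the PCA-SVM values are taken from row No. 6, which is inferred to be the best spectral-only model because it has the highest accuracy among the spectral models. The helper name `relative_gain_percent` is hypothetical.

```python
def relative_gain_percent(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new / baseline - 1.0) * 100.0


# Row No. 6 of the table (assumed to be the PCA-SVM model cited in the text).
pca_svm = {
    "accuracy": 0.7065,
    "macro_precision": 0.7085,
    "macro_recall": 0.7065,
    "macro_f1": 0.7072,
}
# Row No. 11 of the table (2D-CNN on reconstructed plasma images).
cnn_2d = {
    "accuracy": 0.8696,
    "macro_precision": 0.8779,
    "macro_recall": 0.8696,
    "macro_f1": 0.8703,
}

gains = {
    name: round(relative_gain_percent(cnn_2d[name], pca_svm[name]), 2)
    for name in pca_svm
}
print(gains)
```

The same calculation applied to models No. 12 and No. 14 (0.9076 and 0.9348) gives the 3.00% improvement reported for TSCNN-BiLSTM over TSCNN.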
After horizontally concatenating the features extracted by the 1D-CNN and 2D-CNN for classification, the model's classification accuracy further increased to 0.9076, outperforming the use of either the 1D-CNN or 2D-CNN alone. This result contrasts with the carbon steel classification task, where the model with horizontally concatenated features performed worse than the standalone 2D-CNN on carbon steel samples. By using the MHA mechanism to fuse the cross-modal features extracted by 1D-CNN and 2D-CNN, the model's accuracy in the copper alloy classification task further increases to 0.9185, with all other metrics also showing significant improvement. This result indicates that, compared to simple horizontal feature concatenation, MHA can more effectively explore and leverage the correlations and complementary information between different modal features, thereby enhancing the model's discriminative ability and classification performance. The results for No. 14 demonstrate that incorporating BiLSTM can significantly enhance the model's performance in the copper alloy classification task. Compared to No. 12 and No. 13, the classification accuracy of the TSCNN-BiLSTM model increased to 0.9348, representing a 3.00% improvement over TSCNN, with all other metrics also exceeding 0.93. These findings underscore the important role of BiLSTM in modelling spectral sequence correlations and strengthening feature representation, thereby effectively improving the model's classification performance. TSAF Net attained the highest performance in the copper alloy classification task, achieving an accuracy of 0.9457 for different types of brass. The macro precision, macro recall, and macro F1-score reached 0.9488, 0.9457, and 0.9462, respectively.
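To make the difference between horizontal concatenation and attention-based fusion more concrete, the sketch below implements a generic multi-head cross-attention in plain Python. It is not the TSAF Net implementation: the direction of attention (spectral features as queries, image features as keys and values), the token counts, the feature width, the number of heads, and the random projection matrices are all assumptions chosen for illustration, and a trained model would learn the projections.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random stand-in for a learned projection matrix (illustration only)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def multi_head_cross_attention(queries, context, num_heads, rng):
    """queries: T x d (assumed spectral tokens); context: S x d (assumed image tokens)."""
    d = len(queries[0])
    assert d % num_heads == 0, "feature width must be divisible by the head count"
    d_head = d // num_heads
    w_q, w_k, w_v, w_o = (random_matrix(d, d, rng) for _ in range(4))
    q, k, v = matmul(queries, w_q), matmul(context, w_k), matmul(context, w_v)

    head_outputs = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q_h = [r[lo:hi] for r in q]
        k_h = [r[lo:hi] for r in k]
        v_h = [r[lo:hi] for r in v]
        k_h_t = [list(col) for col in zip(*k_h)]      # d_head x S
        scores = matmul(q_h, k_h_t)                   # T x S
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v_h))     # T x d_head

    # Concatenate the heads token by token, then apply the output projection.
    concat = [sum((head[t] for head in head_outputs), []) for t in range(len(queries))]
    return matmul(concat, w_o)


rng = random.Random(0)
spectral_tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
image_tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]
fused = multi_head_cross_attention(spectral_tokens, image_tokens, num_heads=2, rng=rng)
print(len(fused), len(fused[0]))
```

Unlike concatenation, each output token here is a data-dependent weighted combination of the other modality's features, which is one way to read the claim that MHA exploits correlations between the spectral and image representations.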
The classification confusion matrices of models No. 12 to No. 15 (TSCNN, TSCNN-MHA, TSCNN-BiLSTM, and TSAF Net) on the copper alloy test set are presented in Fig. 10. As shown in Fig. 10(a), TSCNN exhibits the highest number of misclassifications in classes H3 and H4, with H3 frequently misclassified as H4, and H4 particularly prone to being confused with H6. The tendency for misclassification among adjacent classes is primarily due to the deliberate introduction of laser energy fluctuation during the experiments. In addition, the inhomogeneity of the sample surface composition is another important contributing factor. Although these factors introduce additional complexity and may negatively impact classification performance, they better simulate real-world application scenarios, thereby providing a more rigorous and realistic assessment of the model's effectiveness. Fig. 10(b) demonstrates that, after incorporating the MHA mechanism, the model's ability to recognise easily confused classes is significantly improved. Specifically, the number of correctly classified H3 samples increased from 19 to 20, whilst misclassifications as H4 and H5 decreased, and the number of H4 samples misclassified as H6 was reduced from 5 to 4. These results highlight the MHA mechanism's effectiveness in enhancing feature representation and alleviating confusion amongst certain categories. Fig. 10(c) shows that, compared to TSCNN, the TSCNN-BiLSTM model achieves substantial improvements in the correct classification rates of H2, H3, H4, H5, H7, and H8. However, nearly 40% of H6 samples are still misclassified as H4, suggesting that distinguishing between H6 and H4 remains challenging. In Fig. 10(d), TSAF Net achieves the best overall classification performance, delivering more balanced results across all copper alloy categories. The correct classification rates for all classes exceed 87%, with H1, H2, H5, H7, and H8 surpassing 95%. Moreover, the confusion between H4 and H6 is notably alleviated. Overall, TSAF Net demonstrates outstanding performance and robustness in the copper alloy classification task.
Fig. 10 Confusion matrices of several models for copper alloy classification: (a) TSCNN, (b) TSCNN-MHA, (c) TSCNN-BiLSTM, and (d) TSAF Net.
As shown in Fig. 11(a), TSAF Net performs well on the copper alloy test set, achieving high precision, recall, and F1-score across all categories. The results for H1, H2, H5, H7, and H8 are especially strong, indicating that these categories are most accurately classified. In comparison, precision for H4 and H6 is slightly lower, at 0.8333 and 0.8400 respectively, suggesting a higher rate of misclassification from other categories. H3, H4, and H6 also have lower recall and F1-score, which means there are more missed detections in these classes. Overall, TSAF Net shows balanced discriminative ability across categories and has strong practical value. Fig. 11(b) shows the predicted classification probabilities for all copper alloy samples. The first 1016 samples are the training set, and the last 184 are the test set. In the training set, samples have high probabilities for correct categories, mostly around 0.7, and low probabilities, about 0.1, for incorrect categories. This indicates that the model makes confident and accurate predictions for the training data. For the test set, predicted probabilities for correct categories are slightly lower, around 0.6, and probabilities for incorrect categories increase somewhat, showing a small drop in classification confidence. Still, TSAF Net effectively distinguishes between copper alloy categories and demonstrates strong generalisation ability.
Fig. 11 Classification performance of TSAF Net on copper alloy: (a) metrics for each category and (b) prediction probabilities for all samples.
| No. | SVM | 1D-CNN | 2D-CNN | MHA | BiLSTM | Runtime, carbon steel (s) | Runtime, copper alloy (s) |
|---|---|---|---|---|---|---|---|
| 1 | ✓ | | | | | 85 | 72 |
| 2 | | ✓ | | | | 115 | 160 |
| 3 | | ✓ | | | ✓ | 194 | 195 |
| 4 | | | ✓ | | | 123 | 164 |
| 5 | | ✓ | ✓ | | | 160 | 170 |
| 6 | | ✓ | ✓ | ✓ | | 162 | 181 |
| 7 | | ✓ | ✓ | | ✓ | 213 | 198 |
| 8 | | ✓ | ✓ | ✓ | ✓ | 208 | 204 |
For the carbon steel classification task, the best-performing traditional model has a runtime of approximately 85 s, whereas TSAF Net requires around 208 s, which is roughly 2.4 times as long. Ablation experiments further indicate that, following the integration of multiple modules, the overall runtime of TSAF Net increases by less than 90 s compared to its single-module counterparts, while achieving a substantial improvement in classification performance. In the copper alloy classification task, the best-performing traditional model requires approximately 72 s for inference, compared with about 204 s for TSAF Net, which is approximately 2.8 times as long. Ablation studies likewise reveal that, upon integration of multiple modules, the additional runtime incurred by TSAF Net remains within 50 s, whilst its classification performance advantage persists. Moreover, the TSAF Net model size is approximately 50 MB, rendering it well-suited for rapid deployment in practical production and inspection settings.
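The runtime ratios can be reproduced from the runtime table with the following sketch, which uses the values listed for rows No. 1 and No. 8. Mapping row No. 1 to the traditional SVM-based model and row No. 8 to TSAF Net is inferred from the runtimes quoted in the text; the dictionary layout is an assumption made for the example.

```python
# Runtimes in seconds, read from rows No. 1 and No. 8 of the runtime table.
runtime_s = {
    "carbon steel": {"traditional": 85, "TSAF Net": 208},
    "copper alloy": {"traditional": 72, "TSAF Net": 204},
}

ratios = {}
for task, t in runtime_s.items():
    # How many times as long TSAF Net runs compared with the traditional model.
    ratios[task] = t["TSAF Net"] / t["traditional"]
    print(f"{task}: {ratios[task]:.1f}x, TSAF Net under 4 min: {t['TSAF Net'] < 240}")
```

Both ratios stay below 3, and both absolute runtimes are under 240 s.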
Overall, although TSAF Net demands greater runtime and storage space than traditional models, its total runtime consistently remains within 4 min, and the model size is well within acceptable parameters for practical application. Therefore, TSAF Net fully satisfies the requirements for rapid analysis and inspection in industrial contexts.
The TSAF Net demonstrates outstanding performance in the classification of various categories of carbon steels and copper alloys, achieving accuracies of 93.24% and 94.57%, respectively. Comparative experiments reveal that when only spectral data are employed, the optimal model attains classification accuracies below 71% for both carbon steel and copper alloy samples. This limitation is primarily due to factors including surface compositional inhomogeneity, matrix effects, and fluctuations in laser energy, which restrict the ability of the spectra to capture inter-sample distinctions. The internal standard method yields only modest improvements, with its impact remaining limited. In contrast, utilising plasma images reconstructed from event streams increases classification accuracies for both metals to above 86%. Although both spectra and event stream data are derived from plasma, event-based image reconstruction provides a more intuitive representation of the plasma characteristics associated with different samples.
TSAF Net comprises a parallel 1D-CNN and BiLSTM architecture to extract both local and global dependencies from spectral sequences. The incorporation of a multi-head attention mechanism enables effective fusion of spectral and image-based cross-modal data, thereby enhancing the representational capacity of the combined features. As a result, TSAF Net capitalises on the complementary strengths of spectral and image data, leading to notable gains in classification accuracy and robustness. The model also provides practical advantages, including low computational overhead and a compact size of approximately 50 MB, which facilitates rapid deployment in industrial settings. This approach shows significant potential for applications such as material property characterisation, ageing monitoring, and metal recycling. Future research will focus on advancing these applications to further support the development and practical implementation of LIBS technology.
This journal is © The Royal Society of Chemistry 2025