Attention-based multimodal fusion of event-reconstructed images and LIBS spectra using CNN and BiLSTM for metal classification
Abstract
Laser-induced breakdown spectroscopy (LIBS) has been widely employed for the detection and analysis of metal materials. However, most current methods, which primarily combine dimensionality reduction with machine learning, still show limited discriminative power when distinguishing metals with similar compositions. To improve the analytical accuracy of LIBS, this study introduces a dynamic vision sensor (DVS) into the LIBS system to capture the optical emissions from plasma and reconstruct plasma images using an event-frame method. By fusing spectral data and plasma images, we propose a metal classification model based on a temporal-spatial attention fusion network (TSAF Net). TSAF Net combines a one-dimensional convolutional neural network (1D-CNN) with a bidirectional long short-term memory (BiLSTM) network for spectral feature extraction, uses a 2D-CNN for image feature extraction, and incorporates a multi-head attention mechanism for deep cross-modal feature fusion. A fully connected layer then completes the final metal classification task. To better simulate on-site challenges, the experimental setup introduces disturbances such as laser energy fluctuations. The proposed TSAF Net achieves classification accuracies of 93.24% for carbon steel and 94.57% for copper alloys, along with strong macro-averaged precision, recall, and F1 scores. Compared with the best-performing conventional methods, TSAF Net increases classification accuracy by 46.21% for carbon steel and 33.86% for copper alloys. Additionally, TSAF Net exhibits high computational efficiency and maintains a compact model size. This study significantly improves the accuracy of LIBS in the identification of metallic materials and provides new insights for the further development and application of LIBS.
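The abstract describes fusing spectral-branch and image-branch features with multi-head attention before a fully connected classifier. The following is a minimal NumPy sketch of that fusion step only; it is not the authors' implementation. The token counts, feature dimension, number of heads, class count, and the use of random matrices in place of learned weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q, kv, num_heads, rng):
    # Cross-modal attention: queries from one modality, keys/values from
    # the other. q: (Lq, d), kv: (Lk, d). Random projections stand in for
    # the learned weight matrices of a trained model.
    d = q.shape[-1]
    dh = d // num_heads
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    Q, K, V = q @ Wq, kv @ Wk, kv @ Wv
    heads = []
    for h in range(num_heads):
        s = slice(h * dh, (h + 1) * dh)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(dh)   # scaled dot-product
        heads.append(softmax(scores) @ V[:, s])
    return np.concatenate(heads, axis=-1) @ Wo       # (Lq, d)

# Toy stand-ins for the two branch outputs (shapes are assumptions):
spec_feats = rng.standard_normal((32, 64))   # e.g. 1D-CNN + BiLSTM tokens
img_feats  = rng.standard_normal((49, 64))   # e.g. 2D-CNN patch features

# Spectral tokens attend to image tokens, then pooled features feed a
# linear classifier head (5 metal classes, chosen arbitrarily here).
fused  = multi_head_attention(spec_feats, img_feats, num_heads=4, rng=rng)
pooled = np.concatenate([fused.mean(0), spec_feats.mean(0)])       # (128,)
logits = pooled @ (rng.standard_normal((128, 5)) / np.sqrt(128))   # (5,)
print(fused.shape, logits.shape)
```

In a trained network the projection matrices and classifier weights would be learned end-to-end; the sketch only shows how attention lets each spectral token weight the image features before classification.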