Attention-based multimodal fusion of event-reconstructed images and LIBS spectra using CNN and BiLSTM for metal classification

Honglin Jian; Lei Deng; Jun Wang; Zikui Shen; Xilin Wang; Zhidong Jia

doi:10.1039/D5JA00238A

Honglin Jian,^a Lei Deng,^b Jun Wang,^c Zikui Shen,^d Xilin Wang*^a and Zhidong Jia^a

Author affiliations

* Corresponding authors

^a Engineering Laboratory of Power Equipment Reliability in Complicated Coastal Environments, Tsinghua University, Shenzhen, Guangdong, PR China
E-mail: jianhl23@mails.tsinghua.edu.cn, wang.xilin@sz.tsinghua.edu.cn

^b Department of Precision Instrument, Centre for Brain Inspired Computing Research, Tsinghua University, Beijing, PR China

^c State Key Laboratory of Environmental Adaptability for Industrial Products, China National Electric Apparatus Research Institute Co., Ltd, Guangzhou, Guangdong, PR China

^d School of Electric Power Engineering, South China University of Technology, Guangzhou, Guangdong, PR China

Abstract

Laser-induced breakdown spectroscopy (LIBS) has been widely employed for the detection and analysis of metal materials. However, most current methods that primarily combine dimensionality reduction with machine learning still demonstrate limited discriminative power when distinguishing between metals with similar compositions. To improve the analytical accuracy of LIBS, this study introduces a dynamic vision sensor (DVS) into the LIBS system to capture the optical emissions from plasma and reconstruct plasma images using an event frame method. By fusing spectral data and plasma images, we propose a metal classification model based on a temporal spatial attention fusion network (TSAF Net). TSAF Net employs a combination of 1D-convolutional neural network (1D-CNN) and bidirectional long short-term memory network (BiLSTM) architectures for spectral feature extraction, a 2D-CNN for image feature extraction, and incorporates a multi-head attention mechanism for deep cross-modal feature fusion. A fully connected layer then completes the final metal classification task. To better simulate on-site challenges, the experimental setup introduces disturbances such as laser energy fluctuations. The proposed TSAF Net achieves classification accuracies of 93.24% for carbon steel and 94.57% for copper alloys, along with outstanding macro precision, recall, and F1 scores. Compared with the best-performing conventional methods, TSAF Net increases classification accuracy by 46.21% for carbon steel and 33.86% for copper alloys. Additionally, TSAF Net exhibits high computational efficiency and maintains a compact model size. This study significantly improves the accuracy of LIBS in the identification of metallic materials and provides new insights for the further development and application of LIBS.
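The abstract states that plasma images are reconstructed from DVS output using an event frame method but gives no details. As a rough illustration only, the sketch below shows one common form of event-frame reconstruction: counting the events that fall within a time window at each pixel and normalising the result. The function name `events_to_frame`, the `(x, y, t, polarity)` event layout, the decision to ignore polarity, and the peak normalisation are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def events_to_frame(events, height, width, t_start, t_end):
    """Accumulate DVS events with t_start <= t < t_end into a 2D image.

    `events` is a sequence of (x, y, t, polarity) tuples (assumed layout).
    Polarity is ignored here; each event adds one count to its pixel.
    """
    frame = np.zeros((height, width), dtype=np.float32)
    if len(events) == 0:
        return frame

    ev = np.asarray(events, dtype=np.float64)
    x = ev[:, 0].astype(np.intp)
    y = ev[:, 1].astype(np.intp)
    t = ev[:, 2]

    # Keep only events inside the time window and inside the sensor area.
    in_window = (t >= t_start) & (t < t_end)
    in_bounds = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    keep = in_window & in_bounds

    # np.add.at accumulates correctly when a pixel index repeats.
    np.add.at(frame, (y[keep], x[keep]), 1.0)

    # Scale to [0, 1] so frames from different shots are comparable.
    peak = frame.max()
    if peak > 0:
        frame /= peak
    return frame


# Hypothetical events: two at pixel (x=1, y=1), one at (x=2, y=0),
# and one outside the time window.
demo_events = [(1, 1, 0.0, 1), (1, 1, 0.5, -1), (2, 0, 0.9, 1), (3, 3, 1.5, 1)]
demo_frame = events_to_frame(demo_events, height=4, width=4, t_start=0.0, t_end=1.0)
```

A frame produced this way is an ordinary 2D array, so it could be passed to a 2D-CNN image branch of the kind the abstract describes. Whether the authors count events, sum signed polarities, or weight events by time is not stated in the abstract.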

Journal of Analytical Atomic Spectrometry
