Zhicheng Cui,abc Luning Li,*b Rong Shu,abc Fan Yang,b Yuwei Chen,ad Xuesen Xu,a Jianyu Wang,abc Agnès Cousin,e Olivier Forni,e and Weiming Xu*abc
aCollege of Physics and Optoelectronic Engineering, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, P. R. China. E-mail: xuwm@mail.sitp.ac.cn
bKey Laboratory of Space Active Opto-electronics Technology, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, P. R. China. E-mail: liluning@mail.sitp.ac.cn
cUniversity of Chinese Academy of Sciences, Beijing 100049, P. R. China
dAdvanced Laser Technology Laboratory of Anhui Province, Hefei 230037, P. R. China
eInstitut de Recherche en Astrophysique et Planétologie (IRAP), Université de Toulouse, UPS, CNRS, Toulouse 31400, France
First published on 27th May 2025
As an in situ and stand-off detection technique, Laser-Induced Breakdown Spectroscopy (LIBS) has been successfully applied in Mars exploration, with LIBS instruments installed on three Mars rover payloads, i.e. ChemCam, SuperCam and MarSCoDe. Effective analysis of the Mars in situ data requires high-performance LIBS chemometrics. Deep learning methods like convolutional neural networks (CNNs) have been demonstrated to be powerful LIBS chemometric tools, but they need many labeled samples for model training. Since Mars exploration is a typical scenario where labeled samples are scarce, a natural idea is to take advantage of a large laboratory database. However, the profile discrepancies between Mars in situ spectra and laboratory spectra pose a prominent challenge for the joint use of the two-source data. To address this issue, conventional solutions focus on formulating data conversion strategies to make the different-source data more similar. Such a methodology can certainly yield positive effects, but it requires quite a few common samples and laborious design, and the achievable data similarity is anyhow limited. In order to employ deep learning for LIBS analysis, this study proposes a new scheme integrating a transfer learning technique, which focuses on "knowledge transfer" rather than conventional "data transfer". ChemCam LIBS data quantification is taken as an example to illustrate its effectiveness. Specifically, a deep CNN model was constructed and trained on 59760 LIBS spectra collected by a ChemCam laboratory duplicate; this pretrained CNN model then underwent transfer learning, i.e. its parameters were fine-tuned with 175 in situ spectra acquired from ChemCam calibration targets on Mars; finally, the CNN model after transfer learning was employed to analyze 575 other ChemCam in situ spectra from 3 Martian natural targets as a generalization performance test.
The results from the proposed transfer learning integrated scheme have been compared with those from four alternative schemes that employ deep learning alone, as well as with the results from an exquisite scheme developed by the ChemCam team. Regarding the overall quantification accuracy, our scheme noticeably surpasses the four alternative deep learning schemes and performs approximately on par with the ChemCam scheme. The results indicate that transfer learning is a promising booster for deep learning methods to accurately and efficiently analyze Mars in situ data collected by SuperCam, MarSCoDe, and future payloads.
In particular, the remote detection capability of LIBS has made it a highly favored tool for planetary exploration. As a matter of fact, the LIBS equipment has played a brilliant role in the past few Mars exploration missions, installed on scientific payloads like NASA's ChemCam (Curiosity rover)10,11 and SuperCam (Perseverance rover),12,13 as well as China's MarSCoDe (Zhurong rover in Tianwen-1 mission).14,15 All the three payloads have successfully acquired in situ LIBS data on Mars. The LIBS spectra allow scientists to decipher the geochemical types and/or chemical components of the soil and rocks in the landing areas of the three Mars rovers (as illustrated in Fig. 1). Based on the chemical analysis results, quite a few important scientific discoveries have been made, such as revealing soil diversity and hydration in Gale crater16,17 revealing igneous crater floor lithology with weak alteration signatures in Jezero crater,18–20 and revealing wind regime shift in line with the end of ice age and modern hydroclimatic conditions in Utopia Planitia.21,22
Fig. 1 A MOLA map of Mars (MOLA DEM37), with the landing sites of the three Mars rovers (Curiosity, Perseverance and Zhurong) marked out.
To extract the qualitative and quantitative chemical information from the LIBS spectra, one needs not only good hardware that can offer accurate and stable emission line data (wavelength and intensity values) but also appropriate LIBS analysis methods that can formulate high-quality chemometric models. LIBS chemometrics is crucial to analytical performance (especially quantitative analysis) since the relationship between the characteristic line intensities and the corresponding element concentration values may deviate from the theoretical linearity, due to interferential factors like the physical matrix effect,23 chemical matrix effect,24 self-absorption effect,25 and experimental condition effects.26,27 Such a nonlinear relationship limits the effectiveness of chemometric methods based on classic linear statistics, such as multivariate linear regression and partial least squares (PLS).28,29
In order to improve the analytical accuracy, LIBS researchers have tried to draw support from various machine learning approaches, such as the random forest,30,31 support vector machine,30,32 least absolute shrinkage and selection operator (LASSO),30,33 elastic net,30,34 back-propagation neural network (BPNN),35,36 etc. In particular, in the past few years, some sophisticated deep learning methods have been introduced, mainly including the convolutional neural network (CNN),37–41 deep belief network42 and recurrent neural network.43 Although deep learning-based chemometric models like CNN can achieve extraordinary prediction accuracy, it should be noted that their excellent predictions usually require a great number of labeled samples for model training.
Unfortunately, unlike laboratory measurements, in situ LIBS detection on Mars is a typical scenario where labeled samples for training are scarce. Specifically, the labeled samples herein refer to the LIBS spectra of the calibration targets onboard the payload along with the real chemical composition information of those calibration targets. For example, the labeled samples for ChemCam include the LIBS spectra collected from the 10 ChemCam calibration targets (1 titanium alloy, 1 graphite, 4 ceramic samples and 4 glass samples), along with their real composition data (measured in pre-flight laboratory experiments using several techniques, such as X-ray fluorescence, fusion inductively coupled plasma, electron microprobe, and laser ablation inductively coupled plasma mass spectrometry44,45). It is worth emphasizing that the scarcity of labeled samples essentially refers to the small number of LIBS calibration targets onboard the payload, irrespective of the total number of spectra. Even if thousands of spectra are collected on each single calibration target, the diversity of the concentration values of each chemical component would still be quite limited (e.g. ≤10 concentration values per component for ChemCam). Hence, the labeled sample dataset of ChemCam can be regarded as small, even though thousands of LIBS spectra have been collected from the ChemCam calibration targets on Mars. Notably, this labeled-sample scarcity issue is ubiquitous in planetary exploration (e.g. in Mars missions, it exists for ChemCam, SuperCam and MarSCoDe).
Apparently, the small size of the labeled-sample dataset hinders the effective training of deep learning models. To the best of our knowledge, no paper has reported the employment of a deep learning chemometric model for analyzing in situ Mars LIBS spectra (ChemCam/SuperCam/MarSCoDe). Classic chemometric methods like PLS are usually used to deal with the in situ Mars spectra, while deep learning methods only appear in studies analyzing spectra collected in laboratory simulation experiments.46,47
Considering the performance superiority of the deep learning CNN method exhibited in our previous investigations,46,48,49 we aim to utilize the CNN to analyze the in situ ChemCam spectra in this study and address the issue of labeled sample scarcity by taking advantage of the large ChemCam laboratory database.50 It is well understood that laboratory LIBS data can be quite different from Mars in situ LIBS data. Therefore, it is necessary to carry out some technical processing before utilizing the laboratory data.
The conventional solutions focus on "data transfer", aiming to make the LIBS spectral data from different sources more similar via specific data conversion techniques. Several teams have provided great studies based on various data transfer strategies.50–55 Such a methodology can certainly yield positive effects, but it requires quite a few common samples as the data transfer benchmark, designing the transfer strategy can be laborious and time-consuming, and the achievable spectral similarity is anyhow limited.
Unlike the conventional solutions, this work adopts a new scheme by introducing a transfer learning technique, as illustrated in Fig. 2(a). Transfer learning is a technical term in the machine learning field, which is nowadays a popular technique for treating the so-called small-sample-size problems.56 The fundamental idea of transfer learning is to mine the latent knowledge in an original domain and transfer it to a similar but different new domain, with the original domain and the new domain termed the "source domain" and "target domain", respectively. There are four common transfer learning patterns, namely the instance-based, feature-based, relation-based, and pretrained-model-based patterns. Fig. 2(b) demonstrates the principle of transfer learning by taking the pretrained-model-based pattern as an example. It is worth noting that, regardless of which pattern is used, the transfer learning technique focuses on "knowledge transfer" rather than "data transfer". Therefore, the transfer learning-based scheme proposed in this study is clearly distinct from the aforementioned works that focus on "data transfer".
As a matter of fact, transfer learning has been employed in a few previous LIBS studies. In 2018, Yang et al. reported an LIBS analysis study in steel metallurgy for quantification of the Cr concentration, which pioneeringly adopted transfer learning in the LIBS field.57 Since the LIBS spectra of standard samples at high temperatures are considerably difficult to acquire, they employed the LIBS spectra obtained at room temperature as the source domain dataset and utilized the feature-based transfer learning method to analyze the target domain dataset, i.e. the high-temperature spectra. After that, several LIBS studies applying transfer learning have been reported, such as the quantification of metal elements in aluminum alloys,58 the identification of rock lithology,59 the quantification of main components in coal,60,61 and the discrimination of crop production areas.62 As for Mars LIBS spectrum-oriented work, hitherto only one paper has involved transfer learning. Sun et al. adopted two transfer learning patterns (instance-based and feature-based) to realize efficient rock classification by focusing on correcting the physical matrix effect.63 Note that their dataset comprised the LIBS spectra collected from 20 self-prepared terrestrial rock samples using a regular commercial instrument.
This study, for the first time, employs a scheme combining deep learning with transfer learning to analyze in situ Mars LIBS spectra (recalling that the term "transfer learning" herein specifically refers to the machine learning technique that focuses on "knowledge transfer" instead of "data transfer"). By utilizing transfer learning, one only needs to concentrate on the LIBS chemometric models, and elaborate data conversion between the different-source LIBS spectra is unnecessary, which considerably simplifies the data processing.
The ChemCam dataset is taken as the demonstrative example. To be more specific, we have used a pretrained-model-based transfer learning (PMTL) method to quantify the concentrations of eight oxides commonly existing on the Martian surface, with the pretrained model being a deep learning CNN model. This scheme combining deep learning with transfer learning shows superior accuracy over four alternative schemes that employ deep learning alone, and its accuracy is generally as high as that of an exquisite scheme designed by the ChemCam team based on classic chemometrics.
The subsequent text is arranged as follows: section 2 elucidates the LIBS detection samples, instruments and spectral datasets; section 3 describes the deep learning CNN model and the designed transfer learning scheme; section 4 exhibits the analytical results of our scheme and the performance comparison with other schemes; section 5 provides further discussion about some noteworthy points and some future prospects; and section 6 gives a conclusion.
The source domain samples are chosen from the 408 geochemical standard samples (pressed pellets) prepared in several laboratories of the ChemCam team, with the chemical composition data of each standard sample clearly known.50 As mentioned before, this study aims to quantitatively analyze the concentrations of eight common oxides that exist in the Martian surface substances. They are SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, and K2O. Some of the 408 standard samples do not contain all the eight oxides, hence those samples are excluded from this research, and finally 332 standard samples are selected as our source domain samples.
A violin-plot diagram of the concentration distribution of the eight oxides in the 332 source domain samples is shown in Fig. 3. In addition, Table 1 displays the statistics of the oxide concentrations, including the maximum, minimum, mean and median values.
| | SiO2 | TiO2 | Al2O3 | FeOT | MgO | CaO | Na2O | K2O |
|---|---|---|---|---|---|---|---|---|
| Maximum | 97.71 | 5.81 | 31.99 | 65.85 | 29.23 | 37.22 | 25.96 | 12.05 |
| Minimum | 0.58 | 0.01 | 0.09 | 0.06 | 0.01 | 0.01 | 0.01 | 0.01 |
| Mean | 56.62 | 0.91 | 15.24 | 7.31 | 3.62 | 4.78 | 2.26 | 2.48 |
| Median | 58.42 | 0.72 | 15.81 | 6.08 | 2.79 | 1.28 | 2.05 | 2.34 |
As to the target domain samples, they include two parts: (i) ChemCam calibration targets (for in situ calibration on Mars) and (ii) natural targets detected by ChemCam such as Martian soils and rocks. For the ChemCam calibration targets, we have selected 4 glass samples and 4 ceramic samples, i.e. (1) Macusanite, (2) Norite, (3) Picrite, (4) Shergottite, (5) KGa-2med-S, (6) NAu-2lo-S, (7) NAu-2med-S, and (8) NAu-2hi-S. The real composition information about these 8 ChemCam calibration targets can be found in the ESI of ref. 50 (material ID: 1-s2.0-S0584854716303913-mmc8). The detailed concentrations of the eight oxide components are displayed in Table 2. It is noteworthy that among the aforementioned 332 source domain samples, there exist 8 duplicate samples of the ChemCam calibration targets, and the composition of each duplicate sample is identical to that of its counterpart in the 8 real ChemCam calibration targets herein.
| | SiO2 | TiO2 | Al2O3 | FeOT | MgO | CaO | Na2O | K2O |
|---|---|---|---|---|---|---|---|---|
| Macusanite | 73.745 | 0.035 | 16.35 | 0.52 | 0.0125 | 0.235 | 4.065 | 3.99 |
| Norite | 47.88 | 0.7 | 14.66 | 15.7 | 9.62 | 12.77 | 1.53 | 0.06 |
| Picrite | 43.59 | 0.44 | 12.39 | 20.49 | 11.17 | 8.95 | 3.07 | 0.1 |
| Shergottite | 48.42 | 0.43 | 10.83 | 17.46 | 6.39 | 14.29 | 1.57 | 0.11 |
| KGa-2med-S | 35.64 | 1.47 | 23.71 | 2.86 | 1.68 | 11.46 | 0.72 | 0.26 |
| NAu-2lo-S | 43.78 | 0.78 | 7.63 | 18.28 | 2.97 | 8.26 | 1.44 | 0.4 |
| NAu-2med-S | 37.48 | 0.57 | 5.72 | 17.05 | 2.05 | 12.27 | 1.11 | 0.29 |
| NAu-2hi-S | 30.9 | 0.39 | 3.69 | 15.76 | 1.14 | 16.28 | 0.67 | 0.157 |
With regard to the natural targets detected by ChemCam on Mars, it is self-evident that the real composition information is not accurately known. However, the ChemCam team has offered the oxide concentration values calculated by their well-designed LIBS chemometric models. Besides, one can also find the oxide concentration values measured using an Alpha Particle X-ray Spectrometer (APXS), which is another scientific payload on Curiosity rover.64 In this study, we have selected 3 Martian natural targets as part of the target domain samples, namely Portage (soil), Flaherty_2 (rock) and Gillespie_Lake_1 (rock), and employed the concentration values measured by the APXS as reference “real values” (not strictly real values). Hence, there are totally 11 target domain samples in the current research.
In the laboratory measurement, the LANL ChemCam laboratory testbed is in the normal atmospheric environment (the body unit at ordinary temperature and mast unit at 4 °C), while the source domain samples are placed in a vacuum chamber filled with a Mars-like atmosphere (933 Pa CO2).50 Each sample is probed at five separate locations, and each location is probed by 50 laser shots at 1.6 m distance. Note that for the target domain samples, the detection distances of the 8 ChemCam calibration targets are also 1.6 m, while those of the 3 Martian natural targets vary from 2.4 to 2.7 m. Moreover, when detecting the natural targets, the probing mode is the same as that mentioned above (i.e. five locations and 50 shots per location).
For either the ChemCam flight model or the ChemCam testbed, the LIBS spectrometer has three spectral channels, i.e. an ultraviolet channel (UV, 240.8–340.8 nm), a violet channel (VIO, 382.1–469.1 nm), and a visible and near-infrared channel (VNIR, 473.2–905.6 nm). Each LIBS spectrum has 6144 pixel data points.
It is worth emphasizing that although the ChemCam testbed and the ChemCam payload have almost identical instrument specifications, and the detection environments of the source domain samples and the target domain samples are similar, the spectra of the samples from the two domains are different, even if the samples have identical compositions. For example, the typical spectrum of the Norite sample (onboard the ChemCam payload, target domain sample) and that of the duplicate Norite sample (in the laboratory, source domain sample) have apparent discrepancies, even after intensity normalization, as shown in Fig. 4.
Herein, we have carried out one more preprocessing step upon the CCS data. Specifically, some pixel data points in certain high-noise spectral regions are masked out, i.e. the 240.811–246.635, 338.457–340.797, 382.138–387.859, 473.184–492.427, and 849–905.574 nm regions, just like the operation described in ref. 50. After the masking, there remain 5484 pixel data points in each LIBS spectrum. Hence, for either the source domain or the target domain in this study, the standard data format of each spectrum is a 5484 × 1 matrix.
For each of the 332 source domain samples, we have randomly picked out 180 spectra from its 225 spectra (the first 5 spectra at each sampling location are excluded due to the possible dust contamination on the surface). Hence, there are totally 59760 LIBS spectra in the source domain dataset, and these spectra would be used to construct and train the source domain CNN model, i.e. the so-called pretrained model.
As to the target domain samples, we randomly select 25 spectra from each of the 8 ChemCam calibration targets (in situ calibration on Sol 27 and 76) and adopt 225, 125 and 225 spectra from the 3 Martian natural targets (in situ detection on Sol 90, 130 and 133, with the first 5 spectra at each sampling location excluded as explained above). Among the 3 Martian natural targets, Flaherty_2 can only contribute 125 LIBS spectra because it has five sampling positions, while either Portage or Gillespie_Lake_1 can contribute 225 spectra since each has nine sampling positions. It is noteworthy that the in situ spectral signals of the Macusanite sample are so weak that the ChemCam team has abandoned its spectra in practical analysis.50 Therefore, in this study, we exclude the Macusanite sample too, so the in situ spectra of only 7 ChemCam calibration targets are actually employed. Hence, there are in total 750 LIBS spectra in the target domain dataset, and the relevant information about the target domain samples is displayed in Table 3, with the original data available at [https://pds-geosciences.wustl.edu/msl/msl-m-chemcam-libs-4_5-rdr-v1/mslccm_1xxx/].
| Sample | Name | Data file name | No. of spectra |
|---|---|---|---|
| Cal. Target 2 | Norite | cl5_399890138ccs_f0030530ccam01027p3.csv | 25 |
| Cal. Target 3 | Picrite | cl5_399889851ccs_f0030530ccam01027p3.csv | 25 |
| Cal. Target 4 | Shergottite | cl5_399889564ccs_f0030530ccam01027p3.csv | 25 |
| Cal. Target 6 | KGa-2med-S | cl5_399888959ccs_f0030530ccam01027p3.csv | 25 |
| Cal. Target 7 | NAu-2lo-S | cl5_399888672ccs_f0030530ccam01027p3.csv | 25 |
| Cal. Target 8 | NAu-2med-S | cl5_399888385ccs_f0030530ccam01027p3.csv | 25 |
| Cal. Target 9 | NAu-2hi-S | cl5_399888098ccs_f0030530ccam01027p3.csv | 25 |
| Martian soil | Portage | cl5_405468981ccs_f0050104ccam02089p3.csv | 225 |
| | | cl5_405469061ccs_f0050104ccam02089p3.csv | |
| | | cl5_405469136ccs_f0050104ccam02089p3.csv | |
| | | cl5_405469251ccs_f0050104ccam02089p3.csv | |
| | | cl5_405469326ccs_f0050104ccam02089p3.csv | |
| | | cl5_405469508ccs_f0050104ccam02089p3.csv | |
| | | cl5_405469623ccs_f0050104ccam02089p3.csv | |
| | | cl5_405469699ccs_f0050104ccam02089p3.csv | |
| | | cl5_405469774ccs_f0050104ccam02089p3.csv | |
| Martian rock I | Flaherty_2 | cl5_409030344ccs_f0051576ccam01130p3.csv | 125 |
| | | cl5_409030421ccs_f0051576ccam01130p3.csv | |
| | | cl5_409030489ccs_f0051576ccam01130p3.csv | |
| | | cl5_409030559ccs_f0051576ccam01130p3.csv | |
| | | cl5_409030628ccs_f0051576ccam01130p3.csv | |
| Martian rock II | Gillespie_Lake_1 | cl5_409283937ccs_f0051662ccam01132p3.csv | 225 |
| | | cl5_409284008ccs_f0051662ccam01132p3.csv | |
| | | cl5_409284076ccs_f0051662ccam01132p3.csv | |
| | | cl5_409284188ccs_f0051662ccam01132p3.csv | |
| | | cl5_409284357ccs_f0051662ccam01132p3.csv | |
| | | cl5_409284559ccs_f0051662ccam01132p3.csv | |
| | | cl5_409284670ccs_f0051662ccam01132p3.csv | |
| | | cl5_409284839ccs_f0051662ccam01132p3.csv | |
| | | cl5_409284906ccs_f0051662ccam01132p3.csv | |
The 175 spectra of the 7 ChemCam calibration targets, which possess strictly real concentration value labels, would be utilized for retraining the source domain CNN model and thereby obtaining the target domain CNN model, while the 575 spectra of the 3 Martian natural targets, which only possess reference “real concentration value” labels (APXS-measured), would be utilized for testing the performance of the target domain CNN model.
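As a quick arithmetic check, the spectrum counts quoted above are mutually consistent; the sketch below simply tallies the figures stated in the text:

```python
# Tallying the spectrum counts stated in the text.
source_spectra = 332 * 180           # 332 lab samples x 180 spectra each
cal_spectra = 7 * 25                 # 7 ChemCam calibration targets x 25 spectra
natural_spectra = 225 + 125 + 225    # Portage, Flaherty_2, Gillespie_Lake_1

assert source_spectra == 59760       # source domain (pretraining) dataset
assert cal_spectra == 175            # target domain retraining spectra
assert natural_spectra == 575        # target domain testing spectra
assert cal_spectra + natural_spectra == 750
```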
The concentration label of each spectrum records the concentration values of the eight chemical components in the corresponding sample. It can be expressed by a 1 × L vector C (L = 8 herein), with each entry standing for the concentration c of a certain component, as shown in eqn (1)

C_i = [c_i1, c_i2, …, c_iL], i = 1, 2, …, N | (1)

where i indexes the N spectra.
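For illustration, a label vector can be assembled from the Norite row of Table 2; this is a sketch (not the authors' code), assuming the oxide order SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O:

```python
# Sketch: building the 1 x L concentration label vector (L = 8 oxides)
# for one Norite spectrum, using the values listed in Table 2.
OXIDES = ["SiO2", "TiO2", "Al2O3", "FeOT", "MgO", "CaO", "Na2O", "K2O"]

norite = {"SiO2": 47.88, "TiO2": 0.7, "Al2O3": 14.66, "FeOT": 15.7,
          "MgO": 9.62, "CaO": 12.77, "Na2O": 1.53, "K2O": 0.06}

# C_i = [c_i1, ..., c_iL] as in eqn (1)
C_i = [norite[o] for o in OXIDES]
assert len(C_i) == 8
```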
Layer 1: Batch normalization layer.
Layer 2: Convolutional layer. The activation function is ReLU (Rectified Linear Unit), which can be expressed by eqn (2)
f(x) = max(0, x) | (2)
Layer 3: Pooling layer. The pooling mode is max-pooling.
Layer 4: Convolutional layer. The activation function is ReLU.
Layer 5: Pooling layer. The pooling mode is max-pooling.
Layer 6: Convolutional layer. The activation function is ReLU.
Layer 7: Convolutional layer. The activation function is ReLU.
Layer 8: Pooling layer. The pooling mode is max-pooling.
Layer 9: Convolutional layer. The activation function is ReLU.
Layer 10: Flatten layer.
Layer 11: Dense layer. The activation function is ReLU.
Layer 12: Dropout layer.
Layer 13: Dense layer. The activation function is a sigmoid function, as expressed by eqn (3)
f(x) = 1/(1 + e^(−x)) | (3)
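For reference, the two activation functions of eqn (2) and (3) can be implemented directly in NumPy:

```python
import numpy as np

def relu(x):
    # Eqn (2): f(x) = max(0, x)
    return np.maximum(0.0, x)

def sigmoid(x):
    # Eqn (3): f(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))
```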
More information about the CNN model hyperparameters is displayed in Table 4. The basic operation mode and the working mechanism of the CNN model can be found in ref. 46 and would not be expounded here.
| Layer | Hyperparameter | Value | Trainable |
|---|---|---|---|
| Batch_Normalization | Input dimension | (5484, 1, 1) | False |
| Convolution_1 | Kernel size | (5, 1) | False |
| | Stride | (2, 1) | |
| | Number of filters | 8 | |
| Max_Pooling_1 | Kernel size | (2, 1) | False |
| Convolution_2 | Kernel size | (5, 1) | False |
| | Stride | (2, 1) | |
| | Number of filters | 32 | |
| Max_Pooling_2 | Kernel size | (2, 1) | False |
| Convolution_3 | Kernel size | (5, 1) | True |
| | Stride | (2, 1) | |
| | Number of filters | 128 | |
| Convolution_4 | Kernel size | (5, 1) | True |
| | Stride | (2, 1) | |
| | Number of filters | 256 | |
| Max_Pooling_3 | Kernel size | (2, 1) | True |
| Convolution_5 | Kernel size | (5, 1) | True |
| | Stride | (2, 1) | |
| | Number of filters | 512 | |
| Flatten | — | — | True |
| Dense_1 | Input dimension | 11264 | True |
| | Output dimension | 1024 | |
| Dropout | Dropout rate | 0.2 | True |
| Dense_2 | Input dimension | 1024 | True |
| | Output dimension | 8 | |
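As a sanity check, the layer hyperparameters in Table 4 reproduce the Dense_1 input dimension, under the assumption (not stated in the text) of 'same'-padded convolutions and non-overlapping stride-2 max-pooling:

```python
import math

length = 5484  # input pixels per spectrum

def conv(n, stride=2):
    # 'same'-padded 1-D convolution: output length = ceil(n / stride)
    return math.ceil(n / stride)

def pool(n, size=2):
    # non-overlapping max-pooling: output length = floor(n / size)
    return n // size

length = conv(length)        # Convolution_1 -> 2742
length = pool(length)        # Max_Pooling_1 -> 1371
length = conv(length)        # Convolution_2 -> 686
length = pool(length)        # Max_Pooling_2 -> 343
length = conv(length)        # Convolution_3 -> 172
length = conv(length)        # Convolution_4 -> 86
length = pool(length)        # Max_Pooling_3 -> 43
length = conv(length)        # Convolution_5 -> 22
flatten_dim = length * 512   # 512 filters in Convolution_5
assert flatten_dim == 11264  # matches the Dense_1 input dimension in Table 4
```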
The training set data (50220 spectra and the corresponding CCV labels) are utilized to train the source domain CNN model, i.e. update the network weight parameters (including biases, the same below) through iterations. The optimizer of the CNN adopts the Adam algorithm, which is able to dynamically adjust the learning rate of every individual weight parameter based on adaptive moment estimation. During the iteration process, the model quantification performance on the whole validation set data (8100 spectra and the corresponding CCV labels) is employed as an efficient feedback guidance for model optimization. The model performance can be evaluated using root mean square error (RMSE), with the lower RMSE representing the better performance. The calculation of the RMSE values will be described in section 3.5.
Based on a certain set of model hyperparameters, the training-validation process yields one optimized model; changing one or more hyperparameters yields another optimized model. The testing set is used to further select the "best" model among these optimized models. To be more specific, we have selected 4 adjustable hyperparameters to tune, i.e. batch size, initial learning rate, number of epochs and output epoch interval, and for each hyperparameter we have empirically set 4 trial values. Hence, there are 4^4 = 256 possible hyperparameter combinations, and 256 optimized models can be acquired. Among these 256 optimized models, the one achieving the lowest RMSE on the testing set data (1440 spectra and the corresponding CCV labels) is denoted as the "best model" and chosen as the final source domain model. This CNN model, i.e. the so-called pretrained model, is also the starting point of the subsequent transfer learning. Information about the dataset partitioning in the training, validation and testing of the source domain CNN model is listed in Table 5.
| | No. of samples | No. of spectra | Description |
|---|---|---|---|
| Training set | 324 | 50220 | • Spectra from the 324 regular laboratory samples, random training/validation partition • To train the source domain CNN model |
| Validation set | | 8100 | • Spectra from the 324 regular laboratory samples, random training/validation partition • To optimize the source domain CNN model based on validation result feedback |
| Testing set | 8 | 1440 | • Spectra from the 8 special laboratory samples, i.e. the duplicate samples of the ChemCam calibration targets • To test the prediction performance (generalizability) of the source domain CNN model |
| Results | Construct a CNN model, complete the training and optimization, and acquire a high-performance source domain pretrained model | | |
| Performance evaluation | Compared with the real composition measured using other laboratory analytical techniques | | |
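The 4-hyperparameter grid search described above can be sketched as follows; the four trial values per hyperparameter are placeholders here, since the paper sets them empirically without listing them:

```python
from itertools import product

# Placeholder trial values: the paper states four empirical values per
# hyperparameter but does not list them, so None stands in for each.
grid = {
    "batch_size":            [None] * 4,
    "initial_learning_rate": [None] * 4,
    "num_epochs":            [None] * 4,
    "output_epoch_interval": [None] * 4,
}

# Every combination yields one optimized model; the one with the lowest
# testing-set RMSE is kept as the final source domain (pretrained) model.
combos = list(product(*grid.values()))
assert len(combos) == 4 ** 4 == 256
```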
The method for tuning the pretrained model is to freeze part of the layers in the CNN while retraining the remaining layers. Specifically, there are totally 13 layers in the CNN model herein, and we keep all the weight parameters in the first 5 layers unchanged while retraining the weight parameters in the last 8 layers, as illustrated in Fig. 5. The freezing/retraining strategy information can also be traced in the last column of Table 4 (“False” for freezing and “True” for retraining).
The retraining of the CNN model makes use of the target domain data (175 spectra), as stated below. Note that owing to the spectral similarity between the two domains, as well as the excellent performance of the pretrained model on the source domain data, such a retraining needs a markedly smaller training set and fewer iteration steps than conventional from-scratch training.
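The freezing strategy (the trainable flags of Table 4 and Fig. 5) can be expressed framework-agnostically as below; in a concrete framework such as PyTorch this would correspond to disabling gradient updates on the frozen parameters:

```python
# Freezing strategy from Table 4 / Fig. 5: the first five layers keep their
# pretrained weights; the remaining eight layers are retrained on the
# 175 target domain spectra.
LAYERS = [
    ("Batch_Normalization", False), ("Convolution_1", False),
    ("Max_Pooling_1", False), ("Convolution_2", False),
    ("Max_Pooling_2", False), ("Convolution_3", True),
    ("Convolution_4", True), ("Max_Pooling_3", True),
    ("Convolution_5", True), ("Flatten", True),
    ("Dense_1", True), ("Dropout", True), ("Dense_2", True),
]

frozen = [name for name, trainable in LAYERS if not trainable]
retrained = [name for name, trainable in LAYERS if trainable]
assert len(LAYERS) == 13 and len(frozen) == 5 and len(retrained) == 8
```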
| | No. of samples | No. of spectra | Description |
|---|---|---|---|
| Training set | 6 | 150 | • Spectra from 6 of the 7 ChemCam calibration targets, "leave-one-out" partition (each sample in turn) • To retrain the target domain CNN model |
| Testing set | 1 | 25 | • Spectra from 1 of the 7 ChemCam calibration targets, "leave-one-out" partition (each sample in turn) • To test the prediction performance (generalizability) of the target domain CNN model |
| Results | Complete the retraining and optimization of the source domain CNN model and acquire a high-performance target domain CNN model | | |
| Performance evaluation | Compared with the real composition measured using other laboratory analytical techniques | | |
As mentioned above, we have tried four alternative schemes, which utilize regular deep learning without transfer learning, to analyze the ChemCam in situ data. The first alternative scheme (called “Scheme AT1”) is to directly employ the source domain CNN model. In other words, there is no retraining process for model tuning. The second alternative scheme (called “Scheme AT2”) is to use the small amount of target domain data to train a CNN model from scratch. That is to say, the large amount of source domain data and the source domain pretrained model are not utilized at all. For Scheme AT2, the training-testing mode is the same as that of the transfer learning scheme, namely the leave-one-out strategy. The third alternative scheme (called “Scheme AT3”) is to train a CNN model using both the source domain and target domain datasets simultaneously. Specifically, in Scheme AT3, the whole dataset comprises the spectra from 331 target samples, including 324 Earth laboratory samples and 7 ChemCam calibration targets (hence realizing the simultaneous utilization of the source domain and target domain data). We adopt the “leave-one-out” strategy for the testing of the 7 ChemCam calibration targets. That is to say, there are 330 training/validation set samples (324 Earth laboratory samples plus 6 ChemCam calibration targets) and one testing set sample, with the testing set sample selected one by one from the 7 ChemCam calibration targets. Note that Scheme AT3 is special among the four alternative schemes since it has a validation set. The fourth alternative scheme (called “Scheme AT4”) is to use a conventional data transfer method. It is very similar to Scheme AT1, and the only difference is that the data transfer method is additionally employed in Scheme AT4. Specifically, the model is still the source domain CNN model, while the testing set spectra are transferred into “Earth laboratory corrected spectra” before the testing. 
The Mars-to-Earth data transfer is carried out based on the data conversion matrix provided by the ChemCam team.50
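The "leave-one-out" partition over the 7 ChemCam calibration targets, used in the retraining described above and in Schemes AT2 and AT3, can be sketched as:

```python
CAL_TARGETS = ["Norite", "Picrite", "Shergottite", "KGa-2med-S",
               "NAu-2lo-S", "NAu-2med-S", "NAu-2hi-S"]

def leave_one_out(targets):
    # Each calibration target serves in turn as the test sample,
    # while the remaining six form the (re)training set.
    for i, test in enumerate(targets):
        train = targets[:i] + targets[i + 1:]
        yield train, test

folds = list(leave_one_out(CAL_TARGETS))
assert len(folds) == 7
assert all(len(train) == 6 and test not in train for train, test in folds)
```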
The quantification performance superiority of Scheme TL over the four alternative schemes is demonstrated in section 4.2.1.
| | No. of samples | No. of spectra | Description |
|---|---|---|---|
| Training set | 7 | 175 | Spectra from all the 7 ChemCam calibration targets; used to retrain the target domain CNN model |
| Testing set | 3 | 575 | Spectra from 3 Martian natural targets; used to test the prediction performance (generalizability) of the target domain CNN model |
| Results | | | Complete the retraining and optimization of the source domain CNN model and acquire a high-performance target domain CNN model |
| Performance evaluation | | | Compared with the "reference" composition measured using the APXS technique on the Curiosity rover; compared with the "reference" composition calculated by the ChemCam team, i.e. the MOC results |
Since the Martian natural target samples do not have strictly real CCV labels, the APXS-measured concentrations are regarded as the reference "real CCV" labels for prediction performance evaluation. As mentioned above, the ChemCam team has provided their composition prediction results, called MOC. The MOC method mainly comprises two algorithms, i.e. partial least squares 1-submodel (PLS1-SM) and independent component analysis (ICA). A further introduction of the MOC method is offered in section 5.2.2. The comparison between the Scheme TL results and the MOC results is shown in section 4.2.2.
$$\mathrm{RMSE} = \sqrt{\frac{1}{NL}\sum_{n=1}^{N}\sum_{l=1}^{L}\left(\hat{c}_{n,l} - c_{l}\right)^{2}} \qquad (4)$$

where $N$ is the number of testing spectra of the sample, $L$ is the number of inspected components, $\hat{c}_{n,l}$ is the concentration of the $l$th component predicted from the $n$th spectrum, and $c_{l}$ is the real (reference) concentration of the $l$th component.
In order to inspect the model prediction performance in a more meticulous way, we calculate the component-wise RMSE in addition to the overall RMSE described above. Taking the lth component in the testing sample as an example, we can calculate its component-wise RMSE based on eqn (5)
$$\mathrm{RMSE}_{l} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\hat{c}_{n,l} - c_{l}\right)^{2}} \qquad (5)$$
Although it reflects accuracy, the RMSE indicator has a shortcoming: it is naturally lower when the real concentration value is lower, even if the error relative to that concentration is large. Therefore, in order to evaluate the accuracy more comprehensively, we also adopt the relative error (RE) as an evaluation indicator. Similar to the component-wise RMSE, the component-wise RE of the lth component can be calculated using eqn (6)
$$\mathrm{RE}_{l} = \frac{100\%}{N}\sum_{n=1}^{N}\frac{\left|\hat{c}_{n,l} - c_{l}\right|}{c_{l}} \qquad (6)$$
In addition, the overall RE (or simplified as RE) of each testing sample can be calculated by averaging the L component-wise RE values of this sample.
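Under the definitions of eqn (4)-(6), the four indicators can be computed for a single testing sample as follows (a plain-Python sketch; `preds` holds the per-spectrum predicted concentration vectors of the sample and `real` holds the reference concentration vector):

```python
import math

def component_rmse(preds, real, l):
    """Component-wise RMSE of the l-th component, eqn (5): averaging
    the squared error of component l over the N spectra of one sample."""
    n = len(preds)
    return math.sqrt(sum((p[l] - real[l]) ** 2 for p in preds) / n)

def overall_rmse(preds, real):
    """Overall RMSE, eqn (4): averaging over all N spectra and all L
    components of one testing sample."""
    n, num_components = len(preds), len(real)
    sq = sum((p[l] - real[l]) ** 2
             for p in preds for l in range(num_components))
    return math.sqrt(sq / (n * num_components))

def component_re(preds, real, l):
    """Component-wise relative error of the l-th component, eqn (6),
    in percent."""
    n = len(preds)
    return 100.0 * sum(abs(p[l] - real[l]) for p in preds) / (n * abs(real[l]))

def overall_re(preds, real):
    """Overall RE of one testing sample: the mean of its L
    component-wise RE values."""
    num_components = len(real)
    return sum(component_re(preds, real, l)
               for l in range(num_components)) / num_components
```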
As shown in Fig. 6(a), both the training RMSE and the validation RMSE show an overall trend of gradual decline, implying that there is no anomaly throughout the whole process. From Fig. 6(b), one may find that both the training RMSE and the validation RMSE reach very small values (as low as 0.80 wt% and 0.85 wt%, respectively), indicating no underfitting; meanwhile, the validation RMSE is only slightly higher than the training RMSE, indicating no obvious overfitting either.
With regard to the performance of the above optimal CNN model in the testing, the overall average RMSE upon the source domain testing set is 4.88 wt%. In a more detailed way, we have examined the RMSE of each of the 8 testing set samples, and the results are illustrated in Fig. 7(a). The maximum testing RMSE of a single sample is 9.96 wt% (KGa-2med-S), while the minimum testing RMSE of a single sample is 1.52 wt% (Shergottite). The mean and median of the 8 testing RMSE values are 4.14 wt% and 3.27 wt%, respectively. It is not difficult to find that the general error level of the 4 glass samples (left half, blue bars) is lower than that of the 4 ceramic samples (right half, green bars). The possible reason is that the ceramic samples have considerable heterogeneities at the level of the laser beam diameter of the ChemCam LIBS system.50,66
Besides the overall RMSE, we have also calculated the component-wise RMSE values for each inspected component (averaging over the 8 samples), as displayed in Fig. 7(b). The maximum testing RMSE of a single component is slightly below 7 wt%, while the minimum testing RMSE of a single component can be less than 0.4 wt%. The mean and median of the 8 testing component-wise RMSE values are 2.73 wt% and 2.21 wt%, respectively. Paying particular attention to the three key components in TAS (Total Alkali Silica) classification, we can find that the RMSE of SiO2 is below 7 wt%, and the RMSE of either Na2O or K2O is even well below the 1 wt% level, implying that the CNN model is promising for TAS classification in the future (currently only the RMSE of SiO2 has not yet reached the TAS error requirement, i.e. ≤4 wt%).
As mentioned above, in addition to the RMSE values, the overall RE and component-wise RE values are also employed as evaluation indicators of the model performance. The overall RE for each of the 8 testing set samples is listed in Table 8. For the optimal source domain CNN model, the median RE value is 56.42%; the minimum RE value is 30.64% (NAu-2med-S), while the maximum RE value is as high as 1256.23% (Macusanite). The component-wise RE for each inspected component (averaging over the 8 samples) is displayed in Table 9. The median component-wise RE value is 74.68%, the minimum component-wise RE value is 16.52% (SiO2), while the maximum component-wise RE value reaches as high as 1127.20% (MgO). The reason for the high RE value of Macusanite is that the real concentration values of TiO2, FeOT, MgO, and CaO in this sample are less than 1 wt%, so even small prediction errors can make the RE value very high. In particular, the component-wise RE of MgO in Macusanite is extraordinarily high, up to 8731.53%, an extreme outlier among all the component-wise RE values. This outlier makes the component-wise RE of MgO significantly higher than that of the other components.
| RE% | Shergottite | Picrite | Macusanite | Norite | NAu-2hi-S | NAu-2lo-S | NAu-2med-S | KGa-2med-S |
|---|---|---|---|---|---|---|---|---|
| Testing | 60.51 | 58.05 | 1256.23 | 54.79 | 51.47 | 32.44 | 30.64 | 58.66 |
| RE% | SiO2 | TiO2 | Al2O3 | FeOT | MgO | CaO | Na2O | K2O |
|---|---|---|---|---|---|---|---|---|
| Testing | 16.52 | 149.72 | 31.86 | 76.36 | 1127.20 | 73.00 | 35.11 | 93.01 |
Despite the few extremely high RE values, the above results demonstrate that the finally determined source domain CNN model can generally achieve good accuracy for the testing set (although far from ideal). The generally good performance of the source domain CNN model makes us believe that this pretrained model has potential to behave well on the target domain dataset after proper model retraining.
The testing RMSE values in the 7 testing rounds of the five schemes are displayed in Fig. 8(a). For a better illustration of the performance comparison, three background colors are used to indicate the performance ranking of the five schemes: green for the best (with bold magenta font), blue for the medium, and pink for the last. For example, in the testing round where Norite is the testing set sample, the RMSE of Scheme TL is 1.03 wt%, while the RMSE values of Schemes AT1, AT2, AT3 and AT4 are 8.51 wt%, 1.19 wt%, 6.87 wt% and 8.07 wt%, respectively; hence, Scheme TL is the best, while Scheme AT1 ranks last. In all 7 testing rounds, Scheme TL achieves the best performance among the five schemes. Moreover, the maximum RMSE of Scheme TL is below 6 wt% (5.57 wt% for KGa-2med-S), while the maximum RMSE values of Schemes AT1, AT2, AT3 and AT4 are 10.13 wt%, 8.19 wt%, 7.30 wt% and 9.78 wt%, respectively.
As in section 4.1, we have examined the component-wise RMSE values for each inspected component (averaging over the 7 testing samples). As shown in Fig. 8(b), Scheme TL achieves the lowest component-wise RMSE for all eight components except K2O (for which it ranks 3rd). Additionally, the maximum component-wise RMSE of Scheme TL is below 3.5 wt% (3.43 wt% for SiO2), while the maximum component-wise RMSE values of Schemes AT1, AT2, AT3 and AT4 are 19.15 wt%, 4.61 wt%, 10.47 wt% and 17.69 wt%, respectively.
The testing RE values in the 7 testing rounds are shown in Fig. 8(c). Scheme TL can achieve the lowest RE in 4 out of the 7 testing rounds. Moreover, for the 7 testing rounds, the mean RE value of Scheme TL is 64.35%, while the mean RE values of Scheme AT1, Scheme AT2, Scheme AT3 and Scheme AT4 are 276.36%, 64.96%, 116.41% and 63.45%, respectively. Only Scheme AT4 is slightly better than Scheme TL regarding the mean RE value.
Meanwhile, the component-wise RE values are displayed in Fig. 8(d). Scheme TL achieves the lowest component-wise RE for three components, i.e. SiO2, Al2O3 and CaO, and ranks 2nd for three other components, i.e. TiO2, FeOT and K2O. The mean component-wise RE value of Scheme TL is 64.35%, while the mean component-wise RE values of Scheme AT1, Scheme AT2, Scheme AT3 and Scheme AT4 are 102.84%, 64.96%, 116.41% and 63.45%, respectively. Only Scheme AT4 is slightly better than Scheme TL regarding the mean component-wise RE value.
Generally speaking, the testing results presented in Fig. 8 demonstrate well the accuracy superiority of the transfer learning scheme over the schemes that adopt only deep learning.
In addition to the performance regarding accuracy, we have also looked into the performance regarding training efficiency. Specifically, we record the training RMSE evolution for Scheme TL and Scheme AT2 in the process of acquiring the target domain CNN model (there is no such training process for Scheme AT1). Taking the case of testing KGa-2med-S as an example, the evolution curves of the training RMSE for the two schemes are shown in Fig. 9.
The initial training RMSE (i.e. RMSE at iteration step 1) of Scheme TL is 12.82 wt%, apparently lower than that of Scheme AT2, which is as high as 18.38 wt%. This means that the source domain pretrained CNN model really has a better “knowledge foundation” than the randomly initialized new CNN model. After 30 iterations (i.e. at iteration step 31), Scheme TL can already achieve an optimal model, with the training RMSE dropping to below 2.5 wt%, while Scheme AT2 still has a training RMSE over 3 wt%. If we further extend the iteration for Scheme AT2, the training RMSE would go on decreasing but in a very sluggish way, and an optimal model can be obtained at iteration step 151, with a training RMSE of 2.69 wt%. Even though the iteration step number of Scheme AT2 has reached five times that of Scheme TL, the training RMSE of Scheme AT2 is still higher than that of Scheme TL. This is consistent with the fact that for KGa-2med-S the testing RMSE of Scheme AT2 is higher than that of Scheme TL (as already shown in Fig. 8).
It is noteworthy that for the KGa-2med-S testing results displayed in Fig. 9, the batch sizes adopted in the two schemes are identical (both are 50). However, in most of the other testing round cases, the batch size of Scheme AT2 is larger than that of Scheme TL. Information about the two hyperparameters of the optimal model in each scheme, namely the iteration step number and batch size, is provided in Table 10.
| Target name | Iteration steps (Scheme TL) | Iteration steps (Scheme AT2) | Batch size (Scheme TL) | Batch size (Scheme AT2) |
|---|---|---|---|---|
| Norite | 51 | 151 | 100 | 50 |
| Picrite | 31 | 151 | 75 | 100 |
| Shergottite | 31 | 31 | 50 | 100 |
| KGa-2med-S | 31 | 151 | 50 | 50 |
| NAu-2lo-S | 51 | 201 | 75 | 50 |
| NAu-2med-S | 101 | 201 | 50 | 75 |
| NAu-2hi-S | 101 | 151 | 50 | 75 |
It can be seen that in all the testing cases, the iteration step number of the optimal model for Scheme AT2 is no less than the Scheme TL counterpart, and in 5 of the 7 testing cases the batch size of the optimal model for Scheme AT2 is no smaller than the Scheme TL counterpart. Considering the fact that a larger batch size in the training means a longer time within each iteration step, the results in Table 10 indicate that Scheme AT2 has a lower training efficiency than Scheme TL. Therefore, the performance of our transfer learning scheme is stronger than that of the alternative scheme in terms of both prediction accuracy and training efficiency.
Upon the premise that the APXS-measured CCVs are "real values", the 8 component-wise RMSE values (averaged over the 3 Martian samples) of the Scheme TL and MOC results are calculated, as displayed in Table 11. The mean of the 8 component-wise RMSE values of the Scheme TL results is 1.80 wt%, while that of the MOC results is 1.50 wt%. Meanwhile, the medians of the 8 component-wise RMSE values of the Scheme TL and MOC results are 1.30 wt% and 1.54 wt%, respectively. These results indicate that the quantification accuracy of our transfer learning scheme is close to that of the well-designed ChemCam scheme when predicting the CCVs of Martian natural samples.
| | SiO2 | TiO2 | Al2O3 | FeOT | MgO | CaO | Na2O | K2O |
|---|---|---|---|---|---|---|---|---|
| TL | 2.45 | 0.68 | 2.94 | 4.95 | 1.57 | 1.03 | 0.50 | 0.22 |
| MOC | 2.57 | 0.49 | 2.34 | 2.55 | 2.08 | 1.01 | 0.75 | 0.20 |
In Fig. 10, we illustrate the concentration values of two representative components in the Martian natural samples, i.e. SiO2 (Fig. 10(a)–(c)) and Na2O (Fig. 10(d)–(f)).
For SiO2, the APXS-measured concentration values of the 3 samples are 43.7 wt%, 41.9 wt% and 45.74 wt%, respectively. The mean and standard deviation values of the MOC results (μ ± σ) are 44.91 ± 2.02 wt%, 44.70 ± 1.04 wt% and 45.70 ± 2.67 wt%, respectively, while those of the Scheme TL results are 46.21 ± 1.20 wt%, 45.68 ± 0.33 wt% and 45.64 ± 0.85 wt%, respectively. Generally speaking, the results provided by the three different methods are quite close, and the Scheme TL results show a higher stability than the MOC results (indicated by the smaller standard deviations in all the 3 samples).
For Na2O, the APXS-measured concentration values of the 3 samples are 2.22 wt%, 2.15 wt% and 2.22 wt%, respectively. The statistics of the MOC results are 2.55 ± 0.52 wt%, 3.01 ± 0.05 wt% and 2.88 ± 0.45 wt%, respectively, while those of the Scheme TL results are 2.37 ± 0.38 wt%, 2.62 ± 0.17 wt% and 1.68 ± 0.35 wt%, respectively. It can be found that most Scheme TL predicted values are lower than the MOC values, and closer to the APXS values (especially obvious in Fig. 10(e)). In general, the Scheme TL results show a higher stability than the MOC results (indicated by the smaller standard deviations in 2 of the 3 samples).
For both SiO2 and Na2O, there exist some cases in which the Scheme TL and the MOC results are almost equal to each other (e.g. Fig. 10(a) Position 4, Fig. 10(b) Positions 1 and 2, and Fig. 10(d) Position 9). Moreover, in a few cases, the Scheme TL and the MOC results are also very close to the APXS results (e.g. Fig. 10(c) Positions 1 and 5 and Fig. 10(d) Position 3). Meanwhile, for the Scheme TL and the MOC results, the fluctuation trends of the predicted values with shot position are generally consistent.
In order to further provide overall statistics of the MOC and Scheme TL prediction results for all eight components, we display the mean and standard deviation information in Fig. 11, in the form of a center point with an error bar (μ ± σ). It is intuitive that in most cases the mean values of either MOC or Scheme TL are quite close to the APXS reference values, while the MOC results show larger fluctuations than the Scheme TL results (e.g. Fig. 11(c) and (f), SiO2 and MgO). In fact, the average of the σ values is 0.92 wt% for the MOC results versus 0.72 wt% for the Scheme TL results, and the median of the σ values is 0.58 wt% versus 0.39 wt%, indicating the generally higher stability of the Scheme TL results, although there are a few exceptions for certain components such as FeOT. It is worth recalling that the discussion about stability herein is based on the important homogeneity assumption mentioned above.
Although the APXS-measured concentrations may not be strictly real values, the results shown above imply that our scheme (based on deep learning and transfer learning) has great potential to offer good composition prediction on Martian natural target samples, just like the elaborate scheme designed by the ChemCam team (based on PLS1-SM and ICA).
Firstly, during the training and validation of our source domain CNN model, we have ensured that the validation set is independent of the training set, i.e. the validation data are always "unseen data" for the model. Therefore, the very small discrepancy between the training RMSE and validation RMSE can, to a certain extent, reflect the low possibility of overfitting. Meanwhile, the validation RMSE is slightly higher than the training RMSE, which is exactly the expected, reasonable behavior.
Secondly, the source domain CNN model is believed to have good generalizability because the testing set samples themselves are genuinely "challenging" samples. Recall that the testing samples are the 8 duplicate samples of the ChemCam calibration targets. The physical and chemical properties of these 8 samples make it difficult to achieve high LIBS quantification accuracy on them: the ceramic targets (NAu and KGa series) have significant heterogeneities at the level of the beam diameter, and the glass targets (Shergottite, Picrite, Norite, and Macusanite) seem to exhibit chemical matrix effects due to the fact that they are vitreous instead of mineralic, affecting their chemical bonding and optical coupling to the laser.50 In fact, two of these samples, namely KGa-2med-S and Macusanite, were totally excluded from the quantification work of the ChemCam team, and the same two samples were also excluded in ref. 34. Besides, Macusanite was also excluded in ref. 55. Considering that we have used all 8 "difficult" samples (especially the two "very difficult" samples) as testing samples, our quantitative analysis itself is a highly challenging task. Therefore, an average testing RMSE of 4.88 wt% might not indicate low performance of the source domain CNN model.
In fact, when comparing the average testing RMSE of our CNN model with those reported in three other studies,34,50,55 it can be found that the average testing RMSE values in the four studies are at the same level, about 3 wt% (note that the average is calculated only on the 6 samples that simultaneously appear in all four studies). Moreover, on two samples (i.e. Norite and Shergottite), our CNN model even achieves the lowest RMSE. Therefore, the performance of our source domain CNN model is generally on par with that achieved by other excellent groups in the international LIBS community.
In a paper which focuses on the explanation of CNN function principles,67 it has been demonstrated that the convolutional layers at different positions in the deep CNN model extract features with different levels: front layers are responsible for extracting primary level features like edges and corners; middle layers mainly extract intermediate level features, namely weighted combinations of the primary features; while backend layers concentrate on high level features, which are usually abstract and strongly correlate with the real labels in the target task.
For the deep learning CNN model, the first several layers (i.e. front layers) are responsible for extracting the explicit concrete features in the LIBS data, such as the contour edge details and the relevant position information of the characteristic peaks, while the last several layers (i.e. backend layers) are responsible for learning the implicit abstract features and establishing the mapping relationship between the abstract features and the component concentration values. On the one hand, the concrete features are highly similar for the spectra in the two domains, and hence we choose to freeze the front layers (first 5 layers herein) of the pretrained CNN model. On the other hand, the mapping relationship can be considerably distinct for the spectra in the two domains, and hence we retrain the backend layers (last 8 layers herein) of the pretrained CNN model according to target domain data. Based on such a freezing/retraining strategy, the CNN model in our PMTL scheme can fully utilize the old “versatile” knowledge in the source domain and pertinently learn the new “specific” knowledge in the target domain. Therefore, the CNN model can well adapt to the data pattern in the target domain and achieve good prediction results when analyzing the ChemCam Mars in situ spectra.
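This freezing/retraining strategy can be illustrated with a deliberately tiny model (a NumPy sketch, not the actual 13-layer CNN; the two blocks, their sizes and the learning rate are arbitrary stand-ins for the frozen first 5 layers and the retrained last 8 layers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the pretrained model: a frozen "front block" W1
# (concrete-feature extractor) and a trainable "backend block" w2
# (abstract-feature-to-concentration mapping).
W1 = rng.uniform(0.1, 1.0, size=(4, 8))  # front layers: frozen
w2 = rng.normal(size=4)                  # backend layers: retrained

def forward(x):
    h = np.maximum(W1 @ x, 0.0)  # frozen feature extraction (ReLU)
    return h @ w2                # trainable mapping to a concentration

def retrain_step(x, y, lr=0.005):
    """One retraining step on a target domain pair (x, y): only the
    backend block w2 is updated; W1 receives no gradient, mimicking
    the layer-freezing strategy."""
    global w2
    h = np.maximum(W1 @ x, 0.0)
    err = h @ w2 - y
    w2 = w2 - lr * err * h  # gradient of 0.5*err**2 w.r.t. w2
```

In a deep learning framework, the same effect is typically obtained by disabling gradient updates for the front layers and passing only the backend parameters to the optimizer.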
In most of the testing cases, Scheme AT1 ranks last among the five schemes. This is due to the fact that the CNN model in Scheme AT1 only possesses the spectral property knowledge in the source domain (ChemCam laboratory spectra) and has never learned any knowledge from the target domain (ChemCam Mars in situ spectra), while the spectral properties in the two domains could have non-ignorable discrepancies resulting from the differences in instrumental function and environmental conditions (as illustrated in Fig. 4).
Despite the relatively poor performance of Scheme AT1, it cannot be inferred that the knowledge in the source domain is useless. The general superiority of Scheme TL over Scheme AT2 can well indicate that the data patterns learned by the model from the large amount of source domain spectra play a valuable role in analyzing the target domain spectra. It is worth emphasizing that the source domain data are not only abundant in quantity but also rich in component diversity. Thanks to such diversity, Scheme TL can demonstrate a more remarkable advantage over Scheme AT2 when testing the 4 ceramic samples, compared with the cases when testing the 3 glass samples (note: the ceramic samples have more complicated matrices and higher heterogeneity than the glass samples50).
As to Scheme AT3, although it simultaneously uses the source domain and target domain data for model training, it does not show the expected performance, being merely better than Scheme AT1. The major possible reason is that the model may get "confused" when it tries to identify feature patterns from the domain-mixed spectral data, since the feature patterns in the source domain can differ from those in the target domain.
Despite being quite similar to Scheme AT1, Scheme AT4 achieves obviously better performance than Scheme AT1, just by adding a Mars-to-Earth data transfer procedure. The good performance of Scheme AT4 implies that the data transfer method can have positive effects, since it indeed reduces, to a certain extent, the dissimilarity between the spectral data properties in the two domains.
From these results, one may infer that a model which learns knowledge from both domains would be more competent than a model which only learns knowledge from a single domain.
Unlike the data transfer method, which focuses on data-driven statistical alignment, the transfer learning method focuses on model-driven knowledge transfer. In the data transfer method, if there are not enough common samples as a transfer benchmark, the data alignment would not be statistically meaningful and would hence result in a poor transfer effect. In particular, when the domain similarity is low, it may require a great many common samples, leading to considerable time and economic costs in sample preparation. By contrast, the transfer learning method only requires abundant data in the source domain for model pretraining and a few data in the target domain for model retraining. In principle, it can extract transferable features and realize knowledge transfer without any common sample. Even if the data similarity between the two domains is low, we only need to increase the number of retrained layers; in the worst case, we just need to retrain all the layers, with no need to change the CNN model architecture. Therefore, the transfer learning method is advantageous in terms of less sample preparation work.
Another noteworthy aspect is the instrument issue. In this work, the LIBS instruments in the two domains are almost identical (both built by the ChemCam team), so this issue can be neglected. However, if the instruments in the two domains are quite different (e.g. transferring between ChemCam and MarSCoDe), the superiority of transfer learning over data transfer would be more prominent. In that case, the two LIBS instruments may differ in spectral resolution, whole spectral range, response function, etc. Hence, even for the same sample, the spectrum acquired by one instrument can be greatly distinct from that acquired by the other, regarding the number of data points, the interval between adjacent data points, the central wavelength corresponding to each data point, the overall spectral profile, and so forth. In this situation, the data transfer method can hardly be effective. Although the pretrained-model-based pattern would also be less useful, we can choose other better-suited transfer learning patterns, such as the feature-based pattern, which utilizes a high-dimensional feature space regardless of the specific data format. Therefore, the transfer learning method is advantageous in terms of better flexibility and wider applicability.
The MOC method consists of two core algorithms, namely PLS1-SM and ICA, and the final prediction result of each oxide concentration is a weighted average of the PLS1-SM result and the ICA result.
(1) PLS1-SM method: To build and train the PLS1 model, 408 geochemical standard samples in the ChemCam laboratory database are employed (totally 2040 spectra). First, the 408 samples are divided into three groups according to the concentration of the oxide to be analyzed, denoted as “low”, “medium” and “high”, respectively. Then, the spectra of the samples in each group are used to train a submodel, called PLS1-SM1, PLS1-SM2 and PLS1-SM3, respectively. After that, another submodel is trained based on all the 2040 spectra, called PLS1-SM4. When predicting the oxide concentration based on an unseen LIBS spectrum, the PLS1-SM4 model is first adopted to provide a preliminary concentration value. Then, the researchers should judge which group this preliminary concentration value belongs to (low, medium, or high) and select an appropriate submodel from PLS1-SM1, PLS1-SM2 and PLS1-SM3. Finally, the concentration value predicted by the corresponding appropriate submodel is regarded as the result of the PLS1-SM method. It is noteworthy that the PLS1-SM process needs to be carried out for each oxide independently.
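The two-stage prediction logic of PLS1-SM can be sketched as follows (a schematic only; the concentration group boundaries and the submodels passed in are hypothetical placeholders, since the actual range limits are oxide-specific and set by the ChemCam team):

```python
def pls1_sm_predict(spectrum, sm_full, submodels, boundaries):
    """Two-stage PLS1-SM prediction for one oxide.

    sm_full    -- the full-range model (PLS1-SM4)
    submodels  -- range-specific models, keyed "low"/"medium"/"high"
                  (PLS1-SM1, PLS1-SM2, PLS1-SM3)
    boundaries -- (low_upper, medium_upper): hypothetical wt% limits
                  separating the three concentration groups
    """
    # Stage 1: preliminary estimate with the full-range model.
    preliminary = sm_full(spectrum)

    # Stage 2: route the spectrum to the matching range submodel.
    low_upper, medium_upper = boundaries
    if preliminary < low_upper:
        group = "low"
    elif preliminary < medium_upper:
        group = "medium"
    else:
        group = "high"
    return submodels[group](spectrum)
```

As noted above, this whole procedure has to be repeated independently for each of the eight oxides.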
(2) ICA method: To build and train the ICA model, the researchers also use the 408 samples and 2040 spectra mentioned above. For each spectrum, the ICA algorithm decomposes it into K independent signal components; the researchers set K = 8 because there are eight oxides to be analyzed. Represented by a vector, each decomposed independent component needs to be correlated with a certain oxide through manual judgement. Then, through a certain matrix operation, each independent component yields an ICA score, which is also a vector of the same dimension. Finally, each ICA score can be related to the concentration value of the corresponding oxide via a certain fitting pattern. After doing this for all eight oxides, the ICA model training is completed. When predicting the oxide concentrations for an unseen LIBS spectrum, the researchers calculate its eight ICA scores based on the trained ICA model and further calculate the corresponding concentration values of the eight oxides.
(3) Weighting method: the core is to calculate the weighted sum of the PLS1-SM result and the ICA result. For a certain oxide, the researchers need to manually set an appropriate proportion weight value, based on their preliminary estimation of the range in which the oxide concentration value falls. It is worth emphasizing that the manual setting of the weight value may require a lot of trial and error, and furthermore each oxide may have its own appropriate weight value.
Since the RMSE level of the transfer learning method and that of the MOC method are generally close, we present the advantages of the transfer learning method over the MOC method mainly from the perspective of time and labor costs.
Firstly, the MOC method requires the aforementioned data transfer as a preprocessing step, whose computational cost will increase as the number of samples and/or the number of LIBS spectra increases. Besides, this step involves considerable cost in the preparation of common samples. By contrast, the transfer learning method needs no data transfer, hence requiring less computational cost and less sample preparation work.
Secondly, the MOC method requires quite a lot of manual intervention, e.g. manual judgement is required when correlating ICA components with certain oxides and when fitting the ICA scores to concentration values, and manual trial-and-error is required when setting the proportion weight values. Furthermore, when researchers build and train the PLS1 submodels and ICA models, they need to carry out the process for each oxide separately, and the model parameters can hardly be shared. Additionally, even when there are slight changes in the LIBS data and/or the specific target task, the PLS and ICA models need to be rebuilt and retrained from scratch. This is understandable, since these conventional models do not accumulate any reusable and shareable knowledge about "features". In our transfer learning method, however, the CNN model can simultaneously analyze the concentration values of multiple oxides. Moreover, the CNN model construction is almost a once-for-all pattern, since most hyperparameters of the model do not need to be modified when we update the spectral datasets or change the oxide to be analyzed. Even for the CNN model optimization (both the source domain model and the target domain model), although a few hyperparameters need to be tuned by trial and error, this tuning process may not be so labor-intensive because there are many automatic optimization methods (e.g. Bayesian optimization, genetic algorithms, simulated annealing, etc.). Therefore, although a single PLS/ICA computation is faster and easier than a single CNN computation, the entire MOC method can be more time-consuming and labor-intensive than our CNN-based transfer learning method.
Based on the above analysis, it can be stated that although the MOC method is exquisite and the MOC result accuracy is admirable, the proposed transfer learning method is advantageous regarding the time and labor cost.
Firstly, the detection distances in the two domains are not identical. As described in section 2.2, all the spectra in the source domain (Earth laboratory dataset) are acquired at a fixed distance of 1.6 m, while the spectra in the target domain (Mars in situ dataset) are acquired at several different distances (ChemCam calibration targets, 1.6 m; Martian natural targets, varying from 2.4 to 2.7 m). It is well known that detection distance is one of the most important factors affecting the LIBS spectral profile characteristics.71 Specifically, for the CNN method, intensity normalization based on the whole-spectrum intensity sum has been demonstrated to be a practical way to improve the performance in a classification task, since it can effectively mitigate the distance effect.48 However, our trial and error in the current study indicates that a mere intensity normalization may negatively influence the accuracy of a CNN in a quantification task. Hence, a potential way to further improve the accuracy on Mars in situ spectra is to design a more powerful distance effect correction strategy.
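The whole-spectrum-sum intensity normalization mentioned above amounts to a one-line operation per spectrum (a minimal sketch):

```python
def normalize_total_intensity(spectrum):
    """Normalize a LIBS spectrum by its whole-spectrum intensity sum,
    so that spectra acquired at different stand-off distances become
    comparable in overall scale (each normalized spectrum sums to 1)."""
    total = sum(spectrum)
    if total == 0:
        raise ValueError("empty spectrum: total intensity is zero")
    return [v / total for v in spectrum]
```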
Secondly, the physical properties of the samples in the source domain and those in the target domain are different. For one thing, the samples in the laboratory are commonly pressed pellets, while the samples on Mars are in a natural state (except the calibration targets). Thus, the physical parameters (e.g. particle size and compactness) of the samples in the two domains can be quite different, leading to the physical matrix effect. For another, the samples in the laboratory usually have rather clean surfaces, while the samples on Mars are very likely to be covered by Martian dust. As described in ref. 50, the ChemCam team routinely regards the LIBS spectra from the first 5 laser shots on each sampling position as dust-affected and directly abandons those spectra in the subsequent analysis. In the process of transfer learning, the physical matrix effect might be largely corrected through the knowledge transfer from the source domain to the target domain,63 but the dust effect can hardly be corrected, since the chemical composition of the Martian dust could be drastically distinct from that of the target sample (e.g. rock and soil), and the dust-affected spectra are genuine outlier data instead of knowledge-containing data. Therefore, it would be valuable to develop a more sophisticated strategy that can exactly pick out all the dust-affected spectra for every single sample (rather than simply abandoning the first 5 spectra). As long as the spectra in the source domain and/or the target domain are appropriately filtered (i.e. the dust-affected spectra are removed), this transfer learning methodology is still expected to demonstrate excellent performance and play an important role in the analysis of Mars in situ LIBS data.
Thirdly, the LIBS detection scenario in the laboratory is not identical to that on Mars. In the laboratory, although the target samples can be placed in a chamber that well simulates the Martian atmospheric environment, the LIBS instrument (i.e. the LANL ChemCam laboratory testbed) remains in the normal terrestrial atmosphere. In the Mars field detection, however, both the target samples and the LIBS instrument are in the Martian atmospheric environment. So, if the LIBS instrument can also be placed in a Mars-like environment when conducting the laboratory experiments,72 the similarity between the source-domain and target-domain data can be further enhanced, which would help further promote the transfer learning performance.
Note that we do not intend to claim that one origin of dissimilarity is more important than another, e.g. that the distance-caused dissimilarity matters more than the environment-caused dissimilarity. Mitigating any one type of dissimilarity, and hence the overall dissimilarity, would benefit the effectiveness and/or efficiency of transfer learning.
In this study, the foundational algorithm is the CNN, and the source-domain CNN model could be further improved by adopting more sophisticated optimization techniques and/or employing more samples for pretraining.
In fact, the number of training samples is important not only for pretraining the source-domain model but also for retraining the target-domain model. Although transfer learning does not require many samples for retraining, owing to the inherent nature of this methodology, this does not mean that very few samples can ensure high-quality retraining. As can be seen, although the overall RMSE level of our Scheme TL is superior to those of the four alternative schemes and close to that of the MOC results, the accuracy of Scheme TL is still far from ideal (note the relatively high average RE level and the few extremely high RE values). Besides the aforementioned presence of “very difficult” samples in our testing set, the limited number of samples for retraining the target-domain model can also be an important reason.
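For clarity, the two accuracy metrics discussed here can be written out explicitly; the short sketch below shows the standard definitions of RMSE and per-sample relative error assumed in this discussion (the exact averaging conventions of the study may differ).

```python
import math

def rmse(predicted, reference):
    """Root-mean-square error between predicted and reference compositions."""
    n = len(predicted)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)

def relative_errors(predicted, reference):
    """Per-sample relative error |pred - ref| / ref (reference values nonzero)."""
    return [abs(p - r) / r for p, r in zip(predicted, reference)]
```

RMSE summarizes the overall deviation in absolute concentration units, while the RE values expose individual badly predicted samples even when the overall RMSE looks acceptable, which is exactly the situation described above.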
In this work, there are merely 7 samples in all for the target-domain model retraining, fewer than the retraining sample numbers in two other CNN-PMTL studies.60,61 To improve the transfer learning effectiveness, one can try to use more samples for retraining. If the small sample size is hard to change, one may consider increasing the number of spectra per sample (in our scheme, there are 25 spectra for each sample and 175 spectra in total). Attention should also be paid to overfitting prevention, by utilizing proper regularization techniques in the CNN model and meanwhile retraining as few layers as possible.
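The idea of retraining as few layers as possible amounts to freezing the early layers, which carry the source-domain knowledge, and updating only a small tail of the network. A framework-agnostic sketch of this bookkeeping is given below; the layer names and parameter counts are purely illustrative, not the actual architecture of this study.

```python
def select_trainable(model_layers, retrain):
    """Split a model's parameters into trainable and frozen groups.

    model_layers : dict mapping layer name -> parameter count
    retrain      : set of layer names to fine-tune on the target domain;
                   all other layers are frozen, preserving the knowledge
                   learned from the source-domain (laboratory) data.
    Returns (trainable_count, frozen_count).
    """
    trainable = sum(n for name, n in model_layers.items() if name in retrain)
    frozen = sum(n for name, n in model_layers.items() if name not in retrain)
    return trainable, frozen
```

Keeping the trainable parameter count small relative to the frozen count is one way to limit overfitting when only 175 target-domain spectra are available for retraining.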
It is also worth trying other transfer learning patterns, e.g. a feature-based transfer learning pattern, which may not require as many training samples as the current pretrained-model-based pattern.
Finally, it is well known that deep learning algorithms like the CNN have inherent defects such as poor interpretability of the results,73,74 since the features learned by the CNN model may be abstract. In fact, the knowledge transferred in transfer learning algorithms can also be highly obscure. Computer vision researchers have proposed a series of visualization methods, such as Grad-CAM75 and Score-CAM,76 to observe the importance of different parts of an image in image classification tasks. However, these methods are currently only applied to classification tasks, and it remains highly challenging to apply them to regression tasks like LIBS quantification. Therefore, enhancing the interpretability of the scheme proposed in this study, which involves both deep learning and transfer learning, is also a worthwhile direction for future research.
When testing the spectra from the 7 ChemCam calibration targets with the model retrained in a “leave-one-out” way, the transfer learning scheme clearly outperforms four alternative schemes that adopt mere deep learning, in terms of overall and component-wise RMSE values as well as overall and component-wise RE values. When testing the spectra from the Martian natural targets with the model retrained on all 7 ChemCam calibration targets, taking the APXS-measured results as the reference “real” values, the transfer learning scheme predictions are generally as accurate as the MOC results acquired via the exquisite PLS1-SM/ICA scheme, while showing relatively smaller fluctuations.
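The “leave-one-out” retraining protocol over the 7 calibration targets can be sketched as a simple split generator; the retraining and prediction steps themselves are omitted here, as they depend on the CNN implementation.

```python
def leave_one_out_splits(samples):
    """Yield (held_out, retrain_set) pairs for leave-one-out evaluation.

    For each calibration target, the model is retrained on the remaining
    targets and then tested on the held-out one, so every target is
    evaluated by a model that never saw its spectra during retraining.
    """
    for i, held_out in enumerate(samples):
        retrain_set = samples[:i] + samples[i + 1:]
        yield held_out, retrain_set
```

With 7 calibration targets this yields 7 retraining rounds, each using 6 targets (150 spectra under the 25-spectra-per-sample scheme) for retraining and 1 target for testing.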
As demonstrated herein, the deep learning CNN model in the transfer learning scheme can make full use of the data-pattern knowledge contained in the abundant source-domain samples, and it needs only a few new samples in the target domain and a small number of iteration steps to realize efficient retraining and achieve good prediction performance in the target domain. Besides, unlike the conventional “data transfer” strategies, the transfer learning scheme focuses on “knowledge transfer” and hence requires no effort for data conversion. The model, instead of a human analyst, is responsible for mining and transferring the latent knowledge, making the entire process less labor-intensive. In short, the proposed scheme can fully exploit the advantages of deep learning while effectively addressing the labeled-sample scarcity issue.
As the first work to employ deep learning, and meanwhile the first to employ transfer learning (“knowledge transfer” instead of “data transfer”), for analyzing Mars in situ LIBS data, this study takes only the ChemCam dataset as an example, but we have reason to believe that the proposed deep learning plus transfer learning strategy is a promising methodology for analyzing the in situ spectra collected by other Mars LIBS payloads such as SuperCam and MarSCoDe, as well as field detection data in future planetary exploration missions.