Zhicheng Cui,abc Luning Li,*b Rong Shu,abc Fan Yang,b Yuwei Chen,ad Xuesen Xu,a Jianyu Wang,abc Agnès Cousin,e Olivier Forni,e and Weiming Xu*abc
aCollege of Physics and Optoelectronic Engineering, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, P. R. China. E-mail: xuwm@mail.sitp.ac.cn
bKey Laboratory of Space Active Opto-electronics Technology, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, P. R. China. E-mail: liluning@mail.sitp.ac.cn
cUniversity of Chinese Academy of Sciences, Beijing 100049, P. R. China
dAdvanced Laser Technology Laboratory of Anhui Province, Hefei 230037, P. R. China
eInstitut de Recherche en Astrophysique et Planétologie (IRAP), Université de Toulouse, UPS, CNRS, Toulouse 31400, France
First published on 27th May 2025
As an in situ and stand-off detection technique, Laser-Induced Breakdown Spectroscopy (LIBS) has been successfully applied in Mars exploration, with LIBS instruments installed on three Mars rover payloads, i.e. ChemCam, SuperCam and MarSCoDe. Effective analysis of the Mars in situ data requires high-performance LIBS chemometrics. Deep learning methods like convolutional neural networks (CNNs) have been demonstrated to be powerful LIBS chemometric tools, but they need many labeled samples for model training. Since Mars exploration is a typical scenario where labeled samples are scarce, a natural idea is to take advantage of a large laboratory database. However, the profile discrepancies between Mars in situ spectra and laboratory spectra pose a prominent challenge for the joint use of the two-source data. To address this issue, conventional solutions focus on formulating data conversion strategies to make the different-source data more similar. Such a methodology can certainly yield positive effects, but it requires quite a few common samples and laborious design, and the achievable data similarity is anyhow limited. In order to employ deep learning for LIBS analysis, this study proposes a new scheme integrating a transfer learning technique, which focuses on "knowledge transfer" rather than conventional "data transfer". ChemCam LIBS data quantification is taken as an example to illustrate its effectiveness. Specifically, a deep CNN model was constructed and trained on 59760 LIBS spectra collected by a ChemCam laboratory duplicate; this pretrained CNN model then underwent transfer learning, i.e. its parameters were fine-tuned with 175 in situ spectra acquired from ChemCam calibration targets on Mars; finally, the CNN model after transfer learning was employed to analyze 575 other ChemCam in situ spectra from 3 Martian natural targets as a generalization performance test.
The results from the proposed transfer learning integrated scheme have been compared with those from four alternative schemes that employ deep learning alone, as well as with the results from an exquisite scheme developed by the ChemCam team. Regarding the overall quantification accuracy, our scheme noticeably surpasses the four alternative deep learning schemes and performs approximately on par with the ChemCam scheme. The results indicate that transfer learning is a promising booster for deep learning methods to accurately and efficiently analyze Mars in situ data collected by SuperCam, MarSCoDe, and future payloads.
In particular, the remote detection capability of LIBS has made it a highly favored tool for planetary exploration. As a matter of fact, the LIBS equipment has played a brilliant role in the past few Mars exploration missions, installed on scientific payloads like NASA's ChemCam (Curiosity rover)10,11 and SuperCam (Perseverance rover),12,13 as well as China's MarSCoDe (Zhurong rover in Tianwen-1 mission).14,15 All the three payloads have successfully acquired in situ LIBS data on Mars. The LIBS spectra allow scientists to decipher the geochemical types and/or chemical components of the soil and rocks in the landing areas of the three Mars rovers (as illustrated in Fig. 1). Based on the chemical analysis results, quite a few important scientific discoveries have been made, such as revealing soil diversity and hydration in Gale crater16,17 revealing igneous crater floor lithology with weak alteration signatures in Jezero crater,18–20 and revealing wind regime shift in line with the end of ice age and modern hydroclimatic conditions in Utopia Planitia.21,22
Fig. 1 A MOLA map of Mars (MOLA DEM37), with the landing sites of the three Mars rovers (Curiosity, Perseverance and Zhurong) marked out.
To extract the qualitative and quantitative chemical information from the LIBS spectra, one needs not only good hardware that can offer accurate and stable emission line data (wavelength and intensity values) but also appropriate LIBS analysis methods that can formulate high-quality chemometric models. LIBS chemometrics is crucial to analytical performance (especially quantitative analysis) since the relationship between the characteristic line intensities and the corresponding element concentration values may deviate from the theoretical linearity, due to interferential factors like the physical matrix effect,23 chemical matrix effect,24 self-absorption effect,25 and experimental condition effects.26,27 Such a nonlinear relationship limits the effectiveness of chemometric methods based on classic linear statistics, such as multivariate linear regression and partial least squares (PLS).28,29
In order to improve the analytical accuracy, LIBS researchers have tried to draw support from various machine learning approaches, such as the random forest,30,31 support vector machine,30,32 least absolute shrinkage and selection operator (LASSO),30,33 elastic net,30,34 back-propagation neural network (BPNN),35,36 etc. In particular, in the past few years, some sophisticated deep learning methods have been introduced, mainly including the convolutional neural network (CNN),37–41 deep belief network42 and recurrent neural network.43 Although deep learning-based chemometric models like CNN can achieve extraordinary prediction accuracy, it should be noted that their excellent predictions usually require a great number of labeled samples for model training.
Unfortunately, unlike laboratory measurements, in situ LIBS detection on Mars is a typical scenario where labeled samples for training are scarce. Specifically, the labeled samples herein refer to the LIBS spectra of the calibration targets onboard the payload along with the real chemical composition information of those calibration targets. For example, the labeled samples for ChemCam include the LIBS spectra collected from the 10 ChemCam calibration targets (1 titanium alloy, 1 graphite, 4 ceramic samples and 4 glass samples), along with their real composition data (measured in pre-flight laboratory experiments using several techniques, such as X-ray fluorescence, fusion inductively coupled plasma, electron microprobe, and laser ablation inductively coupled plasma mass spectrometry44,45). It is worth emphasizing that the scarcity of labeled samples essentially refers to the small number of LIBS calibration targets onboard the payload, irrespective of the total number of spectra. Even if thousands of spectra are collected on each single calibration target, the diversity of the concentration values of each chemical component would still be quite limited (e.g. ≤10 concentration values per component for ChemCam). Hence, the labeled sample dataset of ChemCam can be regarded as small, even though thousands of LIBS spectra have been collected from the ChemCam calibration targets on Mars. Notably, this labeled-sample scarcity issue is ubiquitous in planetary exploration (e.g. in Mars missions, it exists for ChemCam, SuperCam and MarSCoDe).
Apparently, the small size of the labeled-sample dataset hinders the effective training of deep learning models. To the best of our knowledge, no paper has reported the employment of a deep learning chemometric model for analyzing in situ Mars LIBS spectra (ChemCam/SuperCam/MarSCoDe). Classic chemometric methods like PLS are usually used to deal with the in situ Mars spectra, while deep learning methods only appear in studies analyzing spectra collected in laboratory simulation experiments.46,47
Considering the performance superiority of the deep learning CNN method exhibited in our previous investigations,46,48,49 we aim to utilize the CNN to analyze the in situ ChemCam spectra in this study and address the issue of labeled sample scarcity by taking advantage of the large ChemCam laboratory database.50 It is well understood that laboratory LIBS data can be quite different from Mars in situ LIBS data. Therefore, it is necessary to carry out some technical processing before utilizing the laboratory data.
The conventional solutions focus on "data transfer", aiming to make the LIBS spectral data from different sources more similar via specific data conversion techniques. Several teams have provided great studies based on various data transfer strategies.50–55 Such a methodology can certainly yield positive effects, but it requires quite a few common samples as the data transfer benchmark, designing the transfer strategy can be laborious and time-consuming, and the achievable spectral similarity is anyhow limited.
Unlike the conventional solutions, this work adopts a new scheme by introducing a transfer learning technique, as illustrated in Fig. 2(a). Transfer learning is a technical term in the machine learning field, which is nowadays a popular technique for treating the so-called small-sample-size problems.56 The fundamental idea of transfer learning is to mine the latent knowledge in an original domain and transfer it to a similar but different new domain, with the original domain and the new domain termed the "source domain" and "target domain", respectively. There are four common transfer learning patterns, namely the instance-based, feature-based, relation-based, and pretrained-model-based patterns. Fig. 2(b) demonstrates the principle of transfer learning by taking the pretrained-model-based pattern as an example. It is worth noting that, regardless of which pattern is used, the transfer learning technique focuses on "knowledge transfer" rather than "data transfer". Therefore, the transfer learning-based scheme proposed in this study is clearly distinct from the aforementioned works that focus on "data transfer".
As a matter of fact, transfer learning has been employed in a few previous LIBS studies. In 2018, Yang et al. reported an LIBS analysis study in steel metallurgy for quantification of the Cr concentration, which pioneeringly adopted transfer learning in the LIBS field.57 Since the LIBS spectra of standard samples at high temperatures are considerably difficult to acquire, they employed the LIBS spectra obtained at room temperature as the source domain dataset and utilized the feature-based transfer learning method to analyze the target domain dataset, i.e. the high-temperature spectra. After that, several LIBS studies applying transfer learning have been reported, such as the quantification of metal elements in aluminum alloys,58 the identification of rock lithology,59 the quantification of main components in coal,60,61 and the discrimination of crop production areas.62 As for Mars LIBS spectrum-oriented work, hitherto only one paper has involved transfer learning. Sun et al. adopted two transfer learning patterns (instance-based and feature-based) to realize efficient rock classification by focusing on correcting the physical matrix effect.63 Note that their dataset comprised the LIBS spectra collected from 20 self-prepared terrestrial rock samples using a regular commercial instrument.
This study, for the first time, employs a scheme combining deep learning with transfer learning to analyze in situ Mars LIBS spectra (recalling that the term "transfer learning" herein specifically refers to the machine learning technique that focuses on "knowledge transfer" instead of "data transfer"). By utilizing transfer learning, one only needs to concentrate on the LIBS chemometric models, and elaborate data conversion between the different-source LIBS spectra is unnecessary, which considerably simplifies the data processing.
The ChemCam dataset is taken as the demonstrative example. To be more specific, we have used a pretrained-model-based transfer learning (PMTL) method to quantify the concentrations of eight oxides commonly existing on the Martian surface, with the pretrained model being a deep learning CNN model. This scheme combining deep learning with transfer learning shows superior accuracy over four alternative schemes that employ deep learning alone, and its accuracy is generally as high as that of an exquisite scheme designed by the ChemCam team based on classic chemometrics.
The subsequent text is arranged as follows: section 2 elucidates the LIBS detection samples, instruments and spectral datasets; section 3 describes the deep learning CNN model and the designed transfer learning scheme; section 4 exhibits the analytical results of our scheme and the performance comparison with other schemes; section 5 provides further discussion about some noteworthy points and some future prospects; and section 6 gives a conclusion.
The source domain samples are chosen from the 408 geochemical standard samples (pressed pellets) prepared in several laboratories of the ChemCam team, with the chemical composition data of each standard sample clearly known.50 As mentioned before, this study aims to quantitatively analyze the concentrations of eight common oxides that exist in the Martian surface substances. They are SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, and K2O. Some of the 408 standard samples do not contain all the eight oxides, hence those samples are excluded from this research, and finally 332 standard samples are selected as our source domain samples.
A violin-plot diagram of the concentration distribution of the eight oxides in the 332 source domain samples is shown in Fig. 3. In addition, Table 1 displays the statistics of the oxide concentrations, including the maximum, minimum, mean and median values.
| | SiO2 | TiO2 | Al2O3 | FeOT | MgO | CaO | Na2O | K2O |
|---|---|---|---|---|---|---|---|---|
| Maximum | 97.71 | 5.81 | 31.99 | 65.85 | 29.23 | 37.22 | 25.96 | 12.05 |
| Minimum | 0.58 | 0.01 | 0.09 | 0.06 | 0.01 | 0.01 | 0.01 | 0.01 |
| Mean | 56.62 | 0.91 | 15.24 | 7.31 | 3.62 | 4.78 | 2.26 | 2.48 |
| Median | 58.42 | 0.72 | 15.81 | 6.08 | 2.79 | 1.28 | 2.05 | 2.34 |
As to the target domain samples, they include two parts: (i) ChemCam calibration targets (for in situ calibration on Mars) and (ii) natural targets detected by ChemCam such as Martian soils and rocks. For the ChemCam calibration targets, we have selected 4 glass samples and 4 ceramic samples, i.e. (1) Macusanite, (2) Norite, (3) Picrite, (4) Shergottite, (5) KGa-2med-S, (6) NAu-2lo-S, (7) NAu-2med-S, and (8) NAu-2hi-S. The real composition information about these 8 ChemCam calibration targets can be found in the ESI of ref. 50 (material ID: 1-s2.0-S0584854716303913-mmc8). The detailed concentrations of the eight oxide components are displayed in Table 2. It is noteworthy that among the aforementioned 332 source domain samples, there exist 8 duplicate samples of the ChemCam calibration targets, and the composition of each duplicate sample is identical to that of its counterpart in the 8 real ChemCam calibration targets herein.
| | SiO2 | TiO2 | Al2O3 | FeOT | MgO | CaO | Na2O | K2O |
|---|---|---|---|---|---|---|---|---|
| Macusanite | 73.745 | 0.035 | 16.35 | 0.52 | 0.0125 | 0.235 | 4.065 | 3.99 |
| Norite | 47.88 | 0.7 | 14.66 | 15.7 | 9.62 | 12.77 | 1.53 | 0.06 |
| Picrite | 43.59 | 0.44 | 12.39 | 20.49 | 11.17 | 8.95 | 3.07 | 0.1 |
| Shergottite | 48.42 | 0.43 | 10.83 | 17.46 | 6.39 | 14.29 | 1.57 | 0.11 |
| KGa-2med-S | 35.64 | 1.47 | 23.71 | 2.86 | 1.68 | 11.46 | 0.72 | 0.26 |
| NAu-2lo-S | 43.78 | 0.78 | 7.63 | 18.28 | 2.97 | 8.26 | 1.44 | 0.4 |
| NAu-2med-S | 37.48 | 0.57 | 5.72 | 17.05 | 2.05 | 12.27 | 1.11 | 0.29 |
| NAu-2hi-S | 30.9 | 0.39 | 3.69 | 15.76 | 1.14 | 16.28 | 0.67 | 0.157 |
With regard to the natural targets detected by ChemCam on Mars, it is self-evident that the real composition information is not accurately known. However, the ChemCam team has offered the oxide concentration values calculated by their well-designed LIBS chemometric models. Besides, one can also find the oxide concentration values measured using an Alpha Particle X-ray Spectrometer (APXS), which is another scientific payload on Curiosity rover.64 In this study, we have selected 3 Martian natural targets as part of the target domain samples, namely Portage (soil), Flaherty_2 (rock) and Gillespie_Lake_1 (rock), and employed the concentration values measured by the APXS as reference “real values” (not strictly real values). Hence, there are totally 11 target domain samples in the current research.
In the laboratory measurement, the LANL ChemCam laboratory testbed is in the normal atmospheric environment (the body unit at ordinary temperature and mast unit at 4 °C), while the source domain samples are placed in a vacuum chamber filled with a Mars-like atmosphere (933 Pa CO2).50 Each sample is probed at five separate locations, and each location is probed by 50 laser shots at 1.6 m distance. Note that for the target domain samples, the detection distances of the 8 ChemCam calibration targets are also 1.6 m, while those of the 3 Martian natural targets vary from 2.4 to 2.7 m. Moreover, when detecting the natural targets, the probing mode is the same as that mentioned above (i.e. five locations and 50 shots per location).
For either the ChemCam flight model or the ChemCam testbed, the LIBS spectrometer has three spectral channels, i.e. an ultraviolet channel (UV, 240.8–340.8 nm), a violet channel (VIO, 382.1–469.1 nm), and a visible and near-infrared channel (VNIR, 473.2–905.6 nm). Each LIBS spectrum has 6144 pixel data points.
It is worth emphasizing that although the ChemCam testbed and the ChemCam payload have almost identical instrument specifications, and the detection environments of the source domain samples and the target domain samples are similar, the spectra of the samples from the two domains are different, even if the samples have identical compositions. For example, the typical spectrum of the Norite sample (onboard the ChemCam payload, target domain sample) and that of the duplicate Norite sample (in the laboratory, source domain sample) have apparent discrepancies, even after intensity normalization, as shown in Fig. 4.
Herein, we have carried out one more preprocessing step upon the CCS data. Specifically, some pixel data points in certain high-noise spectral regions are masked out, i.e. the 240.811–246.635, 338.457–340.797, 382.138–387.859, 473.184–492.427, and 849–905.574 nm regions, just like the operation described in ref. 50. After the masking, there remain 5484 pixel data points in each LIBS spectrum. Hence, for either the source domain or the target domain in this study, the standard data format of each spectrum is a 5484 × 1 matrix.
For each of the 332 source domain samples, we have randomly picked out 180 spectra from its 225 spectra (the first 5 spectra at each sampling location are excluded due to the possible dust contamination on the surface). Hence, there are totally 59760 LIBS spectra in the source domain dataset, and these spectra would be used to construct and train the source domain CNN model, i.e. the so-called pretrained model.
As to the target domain samples, we randomly select 25 spectra from each of the 8 ChemCam calibration targets (in situ calibration on Sol 27 and 76) and adopt 225, 125 and 225 spectra from the 3 Martian natural targets (in situ detection on Sol 90, 130 and 133, with the first 5 spectra at each sampling location excluded as explained above). Among the 3 Martian natural targets, Flaherty_2 can only contribute 125 LIBS spectra because it has five sampling positions, while either Portage or Gillespie_Lake_1 can contribute 225 spectra since each has nine sampling positions. It is noteworthy that the in situ spectral signals of the Macusanite sample are so weak that the ChemCam team has abandoned its spectra in practical analysis.50 Therefore, in this study, we exclude the Macusanite sample too, so the in situ spectra of only 7 ChemCam calibration targets are actually employed. Hence, there are in total 750 LIBS spectra in the target domain dataset, and the relevant information about the target domain samples is displayed in Table 3, with the original data available at [https://pds-geosciences.wustl.edu/msl/msl-m-chemcam-libs-4_5-rdr-v1/mslccm_1xxx/].
| Sample | Name | Data file name | No. of spectra |
|---|---|---|---|
| Cal. Target 2 | Norite | cl5_399890138ccs_f0030530ccam01027p3.csv | 25 |
| Cal. Target 3 | Picrite | cl5_399889851ccs_f0030530ccam01027p3.csv | 25 |
| Cal. Target 4 | Shergottite | cl5_399889564ccs_f0030530ccam01027p3.csv | 25 |
| Cal. Target 6 | KGa-2med-S | cl5_399888959ccs_f0030530ccam01027p3.csv | 25 |
| Cal. Target 7 | NAu-2lo-S | cl5_399888672ccs_f0030530ccam01027p3.csv | 25 |
| Cal. Target 8 | NAu-2med-S | cl5_399888385ccs_f0030530ccam01027p3.csv | 25 |
| Cal. Target 9 | NAu-2hi-S | cl5_399888098ccs_f0030530ccam01027p3.csv | 25 |
| Martian soil | Portage | cl5_405468981ccs_f0050104ccam02089p3.csv | 225 |
| | | cl5_405469061ccs_f0050104ccam02089p3.csv | |
| | | cl5_405469136ccs_f0050104ccam02089p3.csv | |
| | | cl5_405469251ccs_f0050104ccam02089p3.csv | |
| | | cl5_405469326ccs_f0050104ccam02089p3.csv | |
| | | cl5_405469508ccs_f0050104ccam02089p3.csv | |
| | | cl5_405469623ccs_f0050104ccam02089p3.csv | |
| | | cl5_405469699ccs_f0050104ccam02089p3.csv | |
| | | cl5_405469774ccs_f0050104ccam02089p3.csv | |
| Martian rock I | Flaherty_2 | cl5_409030344ccs_f0051576ccam01130p3.csv | 125 |
| | | cl5_409030421ccs_f0051576ccam01130p3.csv | |
| | | cl5_409030489ccs_f0051576ccam01130p3.csv | |
| | | cl5_409030559ccs_f0051576ccam01130p3.csv | |
| | | cl5_409030628ccs_f0051576ccam01130p3.csv | |
| Martian rock II | Gillespie_Lake_1 | cl5_409283937ccs_f0051662ccam01132p3.csv | 225 |
| | | cl5_409284008ccs_f0051662ccam01132p3.csv | |
| | | cl5_409284076ccs_f0051662ccam01132p3.csv | |
| | | cl5_409284188ccs_f0051662ccam01132p3.csv | |
| | | cl5_409284357ccs_f0051662ccam01132p3.csv | |
| | | cl5_409284559ccs_f0051662ccam01132p3.csv | |
| | | cl5_409284670ccs_f0051662ccam01132p3.csv | |
| | | cl5_409284839ccs_f0051662ccam01132p3.csv | |
| | | cl5_409284906ccs_f0051662ccam01132p3.csv | |
The 175 spectra of the 7 ChemCam calibration targets, which possess strictly real concentration value labels, would be utilized for retraining the source domain CNN model and thereby obtaining the target domain CNN model, while the 575 spectra of the 3 Martian natural targets, which only possess reference “real concentration value” labels (APXS-measured), would be utilized for testing the performance of the target domain CNN model.
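As a quick arithmetic check, the spectrum counts quoted above are mutually consistent; the sketch below simply tallies the figures stated in the text:

```python
# Tallying the spectrum counts stated in the text.
source_spectra = 332 * 180           # 332 lab samples x 180 spectra each
cal_spectra = 7 * 25                 # 7 ChemCam calibration targets x 25 spectra
natural_spectra = 225 + 125 + 225    # Portage, Flaherty_2, Gillespie_Lake_1

assert source_spectra == 59760       # source domain (pretraining) dataset
assert cal_spectra == 175            # target domain retraining spectra
assert natural_spectra == 575        # target domain testing spectra
assert cal_spectra + natural_spectra == 750
```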
The concentration label of each spectrum records the concentration values of the eight chemical components in the corresponding sample. It can be expressed by a 1 × L vector C (L = 8 herein), with each entry standing for the concentration c of a certain component, as shown in eqn (1)

C_i = [c_i1, c_i2, …, c_iL], i = 1, 2, …, N | (1)

where i indexes the N spectra.
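For illustration, a label vector can be assembled from the Norite row of Table 2; this is a sketch (not the authors' code), assuming the oxide order SiO2, TiO2, Al2O3, FeOT, MgO, CaO, Na2O, K2O:

```python
# Sketch: building the 1 x L concentration label vector (L = 8 oxides)
# for one Norite spectrum, using the values listed in Table 2.
OXIDES = ["SiO2", "TiO2", "Al2O3", "FeOT", "MgO", "CaO", "Na2O", "K2O"]

norite = {"SiO2": 47.88, "TiO2": 0.7, "Al2O3": 14.66, "FeOT": 15.7,
          "MgO": 9.62, "CaO": 12.77, "Na2O": 1.53, "K2O": 0.06}

# C_i = [c_i1, ..., c_iL] as in eqn (1)
C_i = [norite[o] for o in OXIDES]
assert len(C_i) == 8
```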
Layer 1: Batch normalization layer.
Layer 2: Convolutional layer. The activation function is ReLU (Rectified Linear Unit), which can be expressed by eqn (2)
f(x) = max(0, x) | (2)
Layer 3: Pooling layer. The pooling mode is max-pooling.
Layer 4: Convolutional layer. The activation function is ReLU.
Layer 5: Pooling layer. The pooling mode is max-pooling.
Layer 6: Convolutional layer. The activation function is ReLU.
Layer 7: Convolutional layer. The activation function is ReLU.
Layer 8: Pooling layer. The pooling mode is max-pooling.
Layer 9: Convolutional layer. The activation function is ReLU.
Layer 10: Flatten layer.
Layer 11: Dense layer. The activation function is ReLU.
Layer 12: Dropout layer.
Layer 13: Dense layer. The activation function is a sigmoid function, as expressed by eqn (3)
f(x) = 1/(1 + e^(−x)) | (3)
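For reference, the two activation functions of eqn (2) and (3) can be implemented directly in NumPy:

```python
import numpy as np

def relu(x):
    # Eqn (2): f(x) = max(0, x)
    return np.maximum(0.0, x)

def sigmoid(x):
    # Eqn (3): f(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))
```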
More information about the CNN model hyperparameters is displayed in Table 4. The basic operation mode and the working mechanism of the CNN model can be found in ref. 46 and would not be expounded here.
| Layer | Hyperparameter | Value | Trainable |
|---|---|---|---|
| Batch_Normalization | Input dimension | (5484, 1, 1) | False |
| Convolution_1 | Kernel size | (5, 1) | False |
| | Stride | (2, 1) | |
| | Number of filters | 8 | |
| Max_Pooling_1 | Kernel size | (2, 1) | False |
| Convolution_2 | Kernel size | (5, 1) | False |
| | Stride | (2, 1) | |
| | Number of filters | 32 | |
| Max_Pooling_2 | Kernel size | (2, 1) | False |
| Convolution_3 | Kernel size | (5, 1) | True |
| | Stride | (2, 1) | |
| | Number of filters | 128 | |
| Convolution_4 | Kernel size | (5, 1) | True |
| | Stride | (2, 1) | |
| | Number of filters | 256 | |
| Max_Pooling_3 | Kernel size | (2, 1) | True |
| Convolution_5 | Kernel size | (5, 1) | True |
| | Stride | (2, 1) | |
| | Number of filters | 512 | |
| Flatten | — | — | True |
| Dense_1 | Input dimension | 11264 | True |
| | Output dimension | 1024 | |
| Dropout | Dropout rate | 0.2 | True |
| Dense_2 | Input dimension | 1024 | True |
| | Output dimension | 8 | |
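As a sanity check, the layer hyperparameters in Table 4 reproduce the Dense_1 input dimension, under the assumption (not stated in the text) of 'same'-padded convolutions and non-overlapping stride-2 max-pooling:

```python
import math

length = 5484  # input pixels per spectrum

def conv(n, stride=2):
    # 'same'-padded 1-D convolution: output length = ceil(n / stride)
    return math.ceil(n / stride)

def pool(n, size=2):
    # non-overlapping max-pooling: output length = floor(n / size)
    return n // size

length = conv(length)        # Convolution_1 -> 2742
length = pool(length)        # Max_Pooling_1 -> 1371
length = conv(length)        # Convolution_2 -> 686
length = pool(length)        # Max_Pooling_2 -> 343
length = conv(length)        # Convolution_3 -> 172
length = conv(length)        # Convolution_4 -> 86
length = pool(length)        # Max_Pooling_3 -> 43
length = conv(length)        # Convolution_5 -> 22
flatten_dim = length * 512   # 512 filters in Convolution_5
assert flatten_dim == 11264  # matches the Dense_1 input dimension in Table 4
```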
The training set data (50220 spectra and the corresponding CCV labels) are utilized to train the source domain CNN model, i.e. update the network weight parameters (including biases, the same below) through iterations. The optimizer of the CNN adopts the Adam algorithm, which is able to dynamically adjust the learning rate of every individual weight parameter based on adaptive moment estimation. During the iteration process, the model quantification performance on the whole validation set data (8100 spectra and the corresponding CCV labels) is employed as an efficient feedback guidance for model optimization. The model performance can be evaluated using root mean square error (RMSE), with the lower RMSE representing the better performance. The calculation of the RMSE values will be described in section 3.5.
Based on a certain set of model hyperparameters, the training-validation process yields one optimized model; changing one or more hyperparameters yields another optimized model. The testing set is used to further select the "best" model among these optimized models. To be more specific, we have selected 4 adjustable hyperparameters to tune, i.e. batch size, initial learning rate, number of epochs and output epoch interval, and for each hyperparameter we have empirically set 4 trial values. Hence, there are 4^4 = 256 possible hyperparameter combinations, and 256 optimized models can be acquired. Among these 256 optimized models, the one achieving the lowest RMSE on the testing set data (1440 spectra and the corresponding CCV labels) is denoted as the "best model" and chosen as the final source domain model. This CNN model, i.e. the so-called pretrained model, is also the starting point of the subsequent transfer learning. Information about the dataset partitioning in the training, validation and testing of the source domain CNN model is listed in Table 5.
| | No. of samples | No. of spectra | Description |
|---|---|---|---|
| Training set | 324 | 50220 | • Spectra from the 324 regular laboratory samples, random training/validation partition • To train the source domain CNN model |
| Validation set | | 8100 | • Spectra from the 324 regular laboratory samples, random training/validation partition • To optimize the source domain CNN model based on validation result feedback |
| Testing set | 8 | 1440 | • Spectra from the 8 special laboratory samples, i.e. the duplicate samples of the ChemCam calibration targets • To test the prediction performance (generalizability) of the source domain CNN model |
| Results | Construct a CNN model, complete the training and optimization, and acquire a high-performance source domain pretrained model | | |
| Performance evaluation | Compared with the real composition measured using other laboratory analytical techniques | | |
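The 4-hyperparameter grid search described above can be sketched as follows; the four trial values per hyperparameter are placeholders here, since the paper sets them empirically without listing them:

```python
from itertools import product

# Placeholder trial values: the paper states four empirical values per
# hyperparameter but does not list them, so None stands in for each.
grid = {
    "batch_size":            [None] * 4,
    "initial_learning_rate": [None] * 4,
    "num_epochs":            [None] * 4,
    "output_epoch_interval": [None] * 4,
}

# Every combination yields one optimized model; the one with the lowest
# testing-set RMSE is kept as the final source domain (pretrained) model.
combos = list(product(*grid.values()))
assert len(combos) == 4 ** 4 == 256
```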
The method for tuning the pretrained model is to freeze part of the layers in the CNN while retraining the remaining layers. Specifically, there are totally 13 layers in the CNN model herein, and we keep all the weight parameters in the first 5 layers unchanged while retraining the weight parameters in the last 8 layers, as illustrated in Fig. 5. The freezing/retraining strategy information can also be traced in the last column of Table 4 (“False” for freezing and “True” for retraining).
The retraining of the CNN model makes use of the target domain data (175 spectra), as stated below. Note that owing to the spectral similarity between the two domains, as well as the excellent performance of the pretrained model on the source domain data, such a retraining needs a markedly smaller training set and fewer iteration steps than conventional from-scratch training.
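The freezing strategy (the trainable flags of Table 4 and Fig. 5) can be expressed framework-agnostically as below; in a concrete framework such as PyTorch this would correspond to disabling gradient updates on the frozen parameters:

```python
# Freezing strategy from Table 4 / Fig. 5: the first five layers keep their
# pretrained weights; the remaining eight layers are retrained on the
# 175 target domain spectra.
LAYERS = [
    ("Batch_Normalization", False), ("Convolution_1", False),
    ("Max_Pooling_1", False), ("Convolution_2", False),
    ("Max_Pooling_2", False), ("Convolution_3", True),
    ("Convolution_4", True), ("Max_Pooling_3", True),
    ("Convolution_5", True), ("Flatten", True),
    ("Dense_1", True), ("Dropout", True), ("Dense_2", True),
]

frozen = [name for name, trainable in LAYERS if not trainable]
retrained = [name for name, trainable in LAYERS if trainable]
assert len(LAYERS) == 13 and len(frozen) == 5 and len(retrained) == 8
```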
| | No. of samples | No. of spectra | Description |
|---|---|---|---|
| Training set | 6 | 150 | • Spectra from 6 of the 7 ChemCam calibration targets, "leave-one-out" partition (each sample in turn) • To retrain the target domain CNN model |
| Testing set | 1 | 25 | • Spectra from 1 of the 7 ChemCam calibration targets, "leave-one-out" partition (each sample in turn) • To test the prediction performance (generalizability) of the target domain CNN model |
| Results | Complete the retraining and optimization of the source domain CNN model and acquire a high-performance target domain CNN model | | |
| Performance evaluation | Compared with the real composition measured using other laboratory analytical techniques | | |
As mentioned above, we have tried four alternative schemes, which utilize regular deep learning without transfer learning, to analyze the ChemCam in situ data. The first alternative scheme (called “Scheme AT1”) is to directly employ the source domain CNN model. In other words, there is no retraining process for model tuning. The second alternative scheme (called “Scheme AT2”) is to use the small amount of target domain data to train a CNN model from scratch. That is to say, the large amount of source domain data and the source domain pretrained model are not utilized at all. For Scheme AT2, the training-testing mode is the same as that of the transfer learning scheme, namely the leave-one-out strategy. The third alternative scheme (called “Scheme AT3”) is to train a CNN model using both the source domain and target domain datasets simultaneously. Specifically, in Scheme AT3, the whole dataset comprises the spectra from 331 target samples, including 324 Earth laboratory samples and 7 ChemCam calibration targets (hence realizing the simultaneous utilization of the source domain and target domain data). We adopt the “leave-one-out” strategy for the testing of the 7 ChemCam calibration targets. That is to say, there are 330 training/validation set samples (324 Earth laboratory samples plus 6 ChemCam calibration targets) and one testing set sample, with the testing set sample selected one by one from the 7 ChemCam calibration targets. Note that Scheme AT3 is special among the four alternative schemes since it has a validation set. The fourth alternative scheme (called “Scheme AT4”) is to use a conventional data transfer method. It is very similar to Scheme AT1, and the only difference is that the data transfer method is additionally employed in Scheme AT4. Specifically, the model is still the source domain CNN model, while the testing set spectra are transferred into “Earth laboratory corrected spectra” before the testing. 
The Mars-to-Earth data transfer is carried out based on the data conversion matrix provided by the ChemCam team.50
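The "leave-one-out" partition over the 7 ChemCam calibration targets, used in the retraining described above and in Schemes AT2 and AT3, can be sketched as:

```python
CAL_TARGETS = ["Norite", "Picrite", "Shergottite", "KGa-2med-S",
               "NAu-2lo-S", "NAu-2med-S", "NAu-2hi-S"]

def leave_one_out(targets):
    # Each calibration target serves in turn as the test sample,
    # while the remaining six form the (re)training set.
    for i, test in enumerate(targets):
        train = targets[:i] + targets[i + 1:]
        yield train, test

folds = list(leave_one_out(CAL_TARGETS))
assert len(folds) == 7
assert all(len(train) == 6 and test not in train for train, test in folds)
```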
The quantification performance superiority of Scheme TL over the four alternative schemes is demonstrated in section 4.2.1.
| | No. of samples | No. of spectra | Description |
|---|---|---|---|
| Training set | 7 | 175 | Spectra from all the 7 ChemCam calibration targets; used to retrain the target domain CNN model |
| Testing set | 3 | 575 | Spectra from 3 Martian natural targets; used to test the prediction performance (generalizability) of the target domain CNN model |
| Results | | | Complete the retraining and optimization of the source domain CNN model and acquire a high-performance target domain CNN model |
| Performance evaluation | | | Compared with the "reference" composition measured using the APXS technique on the Curiosity rover; compared with the "reference" composition calculated by the ChemCam team, i.e. the MOC results |
Since the Martian natural target samples do not have strictly real CCV labels, the APXS-measured concentrations are regarded as the reference "real CCV" labels for prediction performance evaluation. As mentioned above, the ChemCam team has provided their composition prediction results, called MOC. The MOC method mainly comprises two algorithms, i.e. partial least squares 1-submodel (PLS1-SM) and independent component analysis (ICA). A further introduction of the MOC method is offered in section 5.2.2. The comparison between the Scheme TL results and the MOC results is shown in section 4.2.2.
$$\mathrm{RMSE} = \sqrt{\frac{1}{NL}\sum_{n=1}^{N}\sum_{l=1}^{L}\left(\hat{c}_{n,l} - c_{l}\right)^{2}} \qquad (4)$$

where $N$ is the number of testing spectra of the sample, $L$ is the number of inspected components, $\hat{c}_{n,l}$ is the concentration of the $l$th component predicted from the $n$th spectrum, and $c_{l}$ is the real (reference) concentration of the $l$th component.
In order to inspect the model prediction performance in a more meticulous way, we calculate the component-wise RMSE in addition to the overall RMSE described above. Taking the lth component in the testing sample as an example, we can calculate its component-wise RMSE based on eqn (5)
$$\mathrm{RMSE}_{l} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\hat{c}_{n,l} - c_{l}\right)^{2}} \qquad (5)$$
Although it reflects accuracy, the RMSE indicator has a shortcoming: it is naturally lower when the real concentration value is lower, even if the error relative to that concentration is large. Therefore, in order to evaluate the accuracy more comprehensively, we also adopt the relative error (RE) as an evaluation indicator. Similar to the component-wise RMSE, the component-wise RE of the lth component can be calculated using eqn (6)
$$\mathrm{RE}_{l} = \frac{100\%}{N}\sum_{n=1}^{N}\frac{\left|\hat{c}_{n,l} - c_{l}\right|}{c_{l}} \qquad (6)$$
In addition, the overall RE (or simplified as RE) of each testing sample can be calculated by averaging the L component-wise RE values of this sample.
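Under the definitions of eqn (4)-(6), the four indicators can be computed for a single testing sample as follows (a plain-Python sketch; `preds` holds the per-spectrum predicted concentration vectors of the sample and `real` holds the reference concentration vector):

```python
import math

def component_rmse(preds, real, l):
    """Component-wise RMSE of the l-th component, eqn (5): averaging
    the squared error of component l over the N spectra of one sample."""
    n = len(preds)
    return math.sqrt(sum((p[l] - real[l]) ** 2 for p in preds) / n)

def overall_rmse(preds, real):
    """Overall RMSE, eqn (4): averaging over all N spectra and all L
    components of one testing sample."""
    n, num_components = len(preds), len(real)
    sq = sum((p[l] - real[l]) ** 2
             for p in preds for l in range(num_components))
    return math.sqrt(sq / (n * num_components))

def component_re(preds, real, l):
    """Component-wise relative error of the l-th component, eqn (6),
    in percent."""
    n = len(preds)
    return 100.0 * sum(abs(p[l] - real[l]) for p in preds) / (n * abs(real[l]))

def overall_re(preds, real):
    """Overall RE of one testing sample: the mean of its L
    component-wise RE values."""
    num_components = len(real)
    return sum(component_re(preds, real, l)
               for l in range(num_components)) / num_components
```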
As shown in Fig. 6(a), both the training RMSE and the validation RMSE show an overall trend of gradual decline, implying that there is no anomaly throughout the whole process. From Fig. 6(b), one may find that both the training RMSE and the validation RMSE reach very small values (as low as 0.80 wt% and 0.85 wt%, respectively), indicating no underfitting; meanwhile, the validation RMSE is only slightly higher than the training RMSE, indicating no obvious overfitting either.
With regard to the performance of the above optimal CNN model in the testing, the overall average RMSE upon the source domain testing set is 4.88 wt%. In a more detailed way, we have examined the RMSE of each of the 8 testing set samples, and the results are illustrated in Fig. 7(a). The maximum testing RMSE of a single sample is 9.96 wt% (KGa-2med-S), while the minimum testing RMSE of a single sample is 1.52 wt% (Shergottite). The mean and median of the 8 testing RMSE values are 4.14 wt% and 3.27 wt%, respectively. It is not difficult to find that the general error level of the 4 glass samples (left half, blue bars) is lower than that of the 4 ceramic samples (right half, green bars). The possible reason is that the ceramic samples have considerable heterogeneities at the level of the laser beam diameter of the ChemCam LIBS system.50,66
Besides the overall RMSE, we have also calculated the component-wise RMSE values for each inspected component (averaging over the 8 samples), as displayed in Fig. 7(b). The maximum testing RMSE of a single component is slightly below 7 wt%, while the minimum testing RMSE of a single component can be less than 0.4 wt%. The mean and median of the 8 testing component-wise RMSE values are 2.73 wt% and 2.21 wt%, respectively. Paying particular attention to the three key components in TAS (Total Alkali Silica) classification, we can find that the RMSE of SiO2 is below 7 wt%, and the RMSE of either Na2O or K2O is even well below the 1 wt% level, implying that the CNN model is promising for TAS classification in the future (currently only the RMSE of SiO2 has not yet reached the TAS error requirement, i.e. ≤4 wt%).
As mentioned above, in addition to the RMSE values, the overall RE and component-wise RE values are also employed as evaluation indicators of the model performance. The overall RE for each of the 8 testing set samples is listed in Table 8. For the optimal source domain CNN model, the median RE value is 56.42%; the minimum RE value is 30.64% (NAu-2med-S), while the maximum RE value is as high as 1256.23% (Macusanite). The component-wise RE for each inspected component (averaging over the 8 samples) is displayed in Table 9. The median component-wise RE value is 74.68%, the minimum component-wise RE value is 16.52% (SiO2), while the maximum component-wise RE value reaches as high as 1127.20% (MgO). The reason for the high RE value of Macusanite is that the real concentration values of TiO2, FeOT, MgO, and CaO in this sample are less than 1 wt%, so even small prediction errors can make the RE value very high. In particular, the component-wise RE of MgO in Macusanite is extraordinarily high, up to 8731.53%, an extreme outlier among all the component-wise RE values. This outlier makes the component-wise RE of MgO significantly higher than that of the other components.
| RE% | Shergottite | Picrite | Macusanite | Norite | NAu-2hi-S | NAu-2lo-S | NAu-2med-S | KGa-2med-S |
|---|---|---|---|---|---|---|---|---|
| Testing | 60.51 | 58.05 | 1256.23 | 54.79 | 51.47 | 32.44 | 30.64 | 58.66 |
| RE% | SiO2 | TiO2 | Al2O3 | FeOT | MgO | CaO | Na2O | K2O |
|---|---|---|---|---|---|---|---|---|
| Testing | 16.52 | 149.72 | 31.86 | 76.36 | 1127.20 | 73.00 | 35.11 | 93.01 |
Despite the few extremely high RE values, the above results demonstrate that the finally determined source domain CNN model can generally achieve good accuracy for the testing set (although far from ideal). The generally good performance of the source domain CNN model makes us believe that this pretrained model has potential to behave well on the target domain dataset after proper model retraining.
The testing RMSE values in the 7 testing rounds of the five schemes are displayed in Fig. 8(a). For a better illustration of the performance comparison, three background colors are used to indicate the performance ranking of the five schemes: green for the best (with bold magenta font), blue for the medium, and pink for the last. For example, in the testing round where Norite is the testing set sample, the RMSE of Scheme TL is 1.03 wt%, while the RMSE values of Schemes AT1, AT2, AT3 and AT4 are 8.51 wt%, 1.19 wt%, 6.87 wt% and 8.07 wt%, respectively; hence, Scheme TL is the best, while Scheme AT1 ranks last. In all 7 testing rounds, Scheme TL achieves the best performance among the five schemes. Moreover, the maximum RMSE of Scheme TL is below 6 wt% (5.57 wt% for KGa-2med-S), while the maximum RMSE values of Schemes AT1, AT2, AT3 and AT4 are 10.13 wt%, 8.19 wt%, 7.30 wt% and 9.78 wt%, respectively.
As in section 4.1, we have examined the component-wise RMSE values for each inspected component (averaging over the 7 testing samples). As shown in Fig. 8(b), Scheme TL achieves the lowest component-wise RMSE for all eight components except K2O (for which it ranks 3rd). Additionally, the maximum component-wise RMSE of Scheme TL is below 3.5 wt% (3.43 wt% for SiO2), while the maximum component-wise RMSE values of Schemes AT1, AT2, AT3 and AT4 are 19.15 wt%, 4.61 wt%, 10.47 wt% and 17.69 wt%, respectively.
The testing RE values in the 7 testing rounds are shown in Fig. 8(c). Scheme TL can achieve the lowest RE in 4 out of the 7 testing rounds. Moreover, for the 7 testing rounds, the mean RE value of Scheme TL is 64.35%, while the mean RE values of Scheme AT1, Scheme AT2, Scheme AT3 and Scheme AT4 are 276.36%, 64.96%, 116.41% and 63.45%, respectively. Only Scheme AT4 is slightly better than Scheme TL regarding the mean RE value.
Meanwhile, the component-wise RE values are displayed in Fig. 8(d). Scheme TL achieves the lowest component-wise RE for three components, i.e. SiO2, Al2O3 and CaO, and ranks 2nd for three other components, i.e. TiO2, FeOT and K2O. The mean component-wise RE value of Scheme TL is 64.35%, while the mean component-wise RE values of Scheme AT1, Scheme AT2, Scheme AT3 and Scheme AT4 are 102.84%, 64.96%, 116.41% and 63.45%, respectively. Only Scheme AT4 is slightly better than Scheme TL regarding the mean component-wise RE value.
Generally speaking, the testing results presented in Fig. 8 demonstrate well the accuracy superiority of the transfer learning scheme over the schemes that adopt only deep learning.
In addition to the performance regarding accuracy, we have also looked into the performance regarding training efficiency. Specifically, we record the training RMSE evolution for Scheme TL and Scheme AT2 in the process of acquiring the target domain CNN model (there is no such training process for Scheme AT1). Taking the case of testing KGa-2med-S as an example, the evolution curves of the training RMSE for the two schemes are shown in Fig. 9.
The initial training RMSE (i.e. RMSE at iteration step 1) of Scheme TL is 12.82 wt%, apparently lower than that of Scheme AT2, which is as high as 18.38 wt%. This means that the source domain pretrained CNN model really has a better “knowledge foundation” than the randomly initialized new CNN model. After 30 iterations (i.e. at iteration step 31), Scheme TL can already achieve an optimal model, with the training RMSE dropping to below 2.5 wt%, while Scheme AT2 still has a training RMSE over 3 wt%. If we further extend the iteration for Scheme AT2, the training RMSE would go on decreasing but in a very sluggish way, and an optimal model can be obtained at iteration step 151, with a training RMSE of 2.69 wt%. Even though the iteration step number of Scheme AT2 has reached five times that of Scheme TL, the training RMSE of Scheme AT2 is still higher than that of Scheme TL. This is consistent with the fact that for KGa-2med-S the testing RMSE of Scheme AT2 is higher than that of Scheme TL (as already shown in Fig. 8).
It is noteworthy that for the KGa-2med-S testing results displayed in Fig. 9, the batch sizes adopted in the two schemes are identical (both are 50). However, in most of the other testing round cases, the batch size of Scheme AT2 is larger than that of Scheme TL. Information about the two hyperparameters of the optimal model in each scheme, namely the iteration step number and batch size, is provided in Table 10.
| Target name | Iteration steps (Scheme TL) | Iteration steps (Scheme AT2) | Batch size (Scheme TL) | Batch size (Scheme AT2) |
|---|---|---|---|---|
| Norite | 51 | 151 | 100 | 50 |
| Picrite | 31 | 151 | 75 | 100 |
| Shergottite | 31 | 31 | 50 | 100 |
| KGa-2med-S | 31 | 151 | 50 | 50 |
| NAu-2lo-S | 51 | 201 | 75 | 50 |
| NAu-2med-S | 101 | 201 | 50 | 75 |
| NAu-2hi-S | 101 | 151 | 50 | 75 |
It can be seen that in all the testing cases, the iteration step number of the optimal model for Scheme AT2 is no less than the Scheme TL counterpart, and in 5 of the 7 testing cases the batch size of the optimal model for Scheme AT2 is no smaller than the Scheme TL counterpart. Considering the fact that a larger batch size in the training means a longer time within each iteration step, the results in Table 10 indicate that Scheme AT2 has a lower training efficiency than Scheme TL. Therefore, the performance of our transfer learning scheme is stronger than that of the alternative scheme in terms of both prediction accuracy and training efficiency.
Upon the premise that the APXS-measured CCVs are "real values", the 8 component-wise RMSE values (averaged over the 3 Martian samples) of the Scheme TL and MOC results are calculated, as displayed in Table 11. The mean of the 8 component-wise RMSE values of the Scheme TL results is 1.80 wt%, while that of the MOC results is 1.50 wt%. Meanwhile, the medians of the 8 component-wise RMSE values of the Scheme TL and MOC results are 1.30 wt% and 1.54 wt%, respectively. These results indicate that the quantification accuracy of our transfer learning scheme is close to that of the well-designed ChemCam scheme when predicting the CCVs of Martian natural samples.
| | SiO2 | TiO2 | Al2O3 | FeOT | MgO | CaO | Na2O | K2O |
|---|---|---|---|---|---|---|---|---|
| TL | 2.45 | 0.68 | 2.94 | 4.95 | 1.57 | 1.03 | 0.50 | 0.22 |
| MOC | 2.57 | 0.49 | 2.34 | 2.55 | 2.08 | 1.01 | 0.75 | 0.20 |
In Fig. 10, we illustrate the concentration values of two representative components in the Martian natural samples, i.e. SiO2 (Fig. 10(a)–(c)) and Na2O (Fig. 10(d)–(f)).
For SiO2, the APXS-measured concentration values of the 3 samples are 43.7 wt%, 41.9 wt% and 45.74 wt%, respectively. The mean and standard deviation values of the MOC results (μ ± σ) are 44.91 ± 2.02 wt%, 44.70 ± 1.04 wt% and 45.70 ± 2.67 wt%, respectively, while those of the Scheme TL results are 46.21 ± 1.20 wt%, 45.68 ± 0.33 wt% and 45.64 ± 0.85 wt%, respectively. Generally speaking, the results provided by the three different methods are quite close, and the Scheme TL results show a higher stability than the MOC results (indicated by the smaller standard deviations in all the 3 samples).
For Na2O, the APXS-measured concentration values of the 3 samples are 2.22 wt%, 2.15 wt% and 2.22 wt%, respectively. The statistics of the MOC results are 2.55 ± 0.52 wt%, 3.01 ± 0.05 wt% and 2.88 ± 0.45 wt%, respectively, while those of the Scheme TL results are 2.37 ± 0.38 wt%, 2.62 ± 0.17 wt% and 1.68 ± 0.35 wt%, respectively. It can be found that most Scheme TL predicted values are lower than the MOC values, and closer to the APXS values (especially obvious in Fig. 10(e)). In general, the Scheme TL results show a higher stability than the MOC results (indicated by the smaller standard deviations in 2 of the 3 samples).
For both SiO2 and Na2O, there exist some cases in which the Scheme TL and the MOC results are almost equal to each other (e.g. Fig. 10(a) Position 4, Fig. 10(b) Positions 1 and 2, and Fig. 10(d) Position 9). Moreover, in a few cases, the Scheme TL and the MOC results are also very close to the APXS results (e.g. Fig. 10(c) Positions 1 and 5 and Fig. 10(d) Position 3). Meanwhile, for the Scheme TL and the MOC results, the fluctuation trends of the predicted values with shot position are generally consistent.
In order to further provide overall statistics of the MOC and Scheme TL prediction results for all eight components, we display the mean and standard deviation information in Fig. 11, in the form of a center point with an error bar (μ ± σ). It is intuitive that in most cases the mean values of either MOC or Scheme TL are quite close to the APXS reference values, while the MOC results show larger fluctuations than the Scheme TL results (e.g. Fig. 11(c) and (f), SiO2 and MgO). In fact, the average of the σ values is 0.92 wt% for the MOC results versus 0.72 wt% for the Scheme TL results, and the median of the σ values is 0.58 wt% versus 0.39 wt%, indicating the generally higher stability of the Scheme TL results, although there are a few exceptions for certain components such as FeOT. It is worth recalling that the discussion about stability herein is based on the important homogeneity assumption mentioned above.
Although the APXS-measured concentrations may not be strictly real values, the results shown above imply that our scheme (based on deep learning and transfer learning) has great potential to offer good composition prediction on Martian natural target samples, just like the elaborate scheme designed by the ChemCam team (based on PLS1-SM and ICA).
Firstly, during the training and validation of our source domain CNN model, we have ensured that the validation set is independent of the training set, i.e. the validation data are always "unseen data" for the model. Therefore, the very small discrepancy between the training RMSE and validation RMSE can, to a certain extent, reflect the low possibility of overfitting. Meanwhile, the validation RMSE is slightly higher than the training RMSE, which is exactly the expected, reasonable behavior.
Secondly, the source domain CNN model is believed to have good generalizability because the testing set samples themselves are genuinely "challenging" samples. Recall that the testing samples are the 8 duplicate samples of the ChemCam calibration targets. The physical and chemical properties of these 8 samples make it difficult to achieve high LIBS quantification accuracy on them: the ceramic targets (NAu and KGa series) have significant heterogeneities at the level of the beam diameter, and the glass targets (Shergottite, Picrite, Norite, and Macusanite) seem to exhibit chemical matrix effects due to the fact that they are vitreous instead of mineralic, affecting their chemical bonding and optical coupling to the laser.50 In fact, two of these samples, namely KGa-2med-S and Macusanite, were totally excluded from the quantification work of the ChemCam team, and the same two samples were also excluded in ref. 34. Besides, Macusanite was also excluded in ref. 55. Considering that we have used all 8 "difficult" samples (especially the two "very difficult" samples) as testing samples, our quantitative analysis itself is a highly challenging task. Therefore, an average testing RMSE of 4.88 wt% might not indicate low performance of the source domain CNN model.
In fact, when comparing the average testing RMSE of our CNN model with those reported in three other studies,34,50,55 it can be found that the average testing RMSE values in the four studies are at the same level, about 3 wt% (note that the average is calculated only on the 6 samples that simultaneously appear in all four studies). Moreover, on two samples (i.e. Norite and Shergottite), our CNN model even achieves the lowest RMSE. Therefore, the performance of our source domain CNN model is generally on par with that achieved by other excellent groups in the international LIBS community.
In a paper which focuses on the explanation of CNN function principles,67 it has been demonstrated that the convolutional layers at different positions in the deep CNN model extract features with different levels: front layers are responsible for extracting primary level features like edges and corners; middle layers mainly extract intermediate level features, namely weighted combinations of the primary features; while backend layers concentrate on high level features, which are usually abstract and strongly correlate with the real labels in the target task.
For the deep learning CNN model, the first several layers (i.e. front layers) are responsible for extracting the explicit concrete features in the LIBS data, such as the contour edge details and the relevant position information of the characteristic peaks, while the last several layers (i.e. backend layers) are responsible for learning the implicit abstract features and establishing the mapping relationship between the abstract features and the component concentration values. On the one hand, the concrete features are highly similar for the spectra in the two domains, and hence we choose to freeze the front layers (first 5 layers herein) of the pretrained CNN model. On the other hand, the mapping relationship can be considerably distinct for the spectra in the two domains, and hence we retrain the backend layers (last 8 layers herein) of the pretrained CNN model according to target domain data. Based on such a freezing/retraining strategy, the CNN model in our PMTL scheme can fully utilize the old “versatile” knowledge in the source domain and pertinently learn the new “specific” knowledge in the target domain. Therefore, the CNN model can well adapt to the data pattern in the target domain and achieve good prediction results when analyzing the ChemCam Mars in situ spectra.
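This freezing/retraining strategy can be illustrated with a deliberately tiny model (a NumPy sketch, not the actual 13-layer CNN; the two blocks, their sizes and the learning rate are arbitrary stand-ins for the frozen first 5 layers and the retrained last 8 layers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the pretrained model: a frozen "front block" W1
# (concrete-feature extractor) and a trainable "backend block" w2
# (abstract-feature-to-concentration mapping).
W1 = rng.uniform(0.1, 1.0, size=(4, 8))  # front layers: frozen
w2 = rng.normal(size=4)                  # backend layers: retrained

def forward(x):
    h = np.maximum(W1 @ x, 0.0)  # frozen feature extraction (ReLU)
    return h @ w2                # trainable mapping to a concentration

def retrain_step(x, y, lr=0.005):
    """One retraining step on a target domain pair (x, y): only the
    backend block w2 is updated; W1 receives no gradient, mimicking
    the layer-freezing strategy."""
    global w2
    h = np.maximum(W1 @ x, 0.0)
    err = h @ w2 - y
    w2 = w2 - lr * err * h  # gradient of 0.5*err**2 w.r.t. w2
```

In a deep learning framework, the same effect is typically obtained by disabling gradient updates for the front layers and passing only the backend parameters to the optimizer.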
In most of the testing cases, Scheme AT1 ranks last among the five schemes. This is due to the fact that the CNN model in Scheme AT1 only possesses the spectral property knowledge in the source domain (ChemCam laboratory spectra) and has never learned any knowledge from the target domain (ChemCam Mars in situ spectra), while the spectral properties in the two domains could have non-ignorable discrepancies resulting from the differences in instrumental function and environmental conditions (as illustrated in Fig. 4).
Despite the relatively poor performance of Scheme AT1, it cannot be inferred that the knowledge in the source domain is useless. The general superiority of Scheme TL over Scheme AT2 can well indicate that the data patterns learned by the model from the large amount of source domain spectra play a valuable role in analyzing the target domain spectra. It is worth emphasizing that the source domain data are not only abundant in quantity but also rich in component diversity. Thanks to such diversity, Scheme TL can demonstrate a more remarkable advantage over Scheme AT2 when testing the 4 ceramic samples, compared with the cases when testing the 3 glass samples (note: the ceramic samples have more complicated matrices and higher heterogeneity than the glass samples50).
As to Scheme AT3, although it simultaneously uses the source domain and target domain data for model training, it does not show the expected performance, being merely better than Scheme AT1. The major possible reason is that the model may get "confused" when it tries to identify feature patterns from the domain-mixed spectral data, since the feature patterns in the source domain can differ from those in the target domain.
Despite being quite similar to Scheme AT1, Scheme AT4 achieves obviously better performance than Scheme AT1, just by adding a Mars-to-Earth data transfer procedure. The good performance of Scheme AT4 implies that the data transfer method can have positive effects, since it indeed reduces, to a certain extent, the dissimilarity between the spectral data properties in the two domains.
From these results, one may infer that a model which learns knowledge from both domains would be more competent than a model which only learns knowledge from a single domain.
Unlike the data transfer method, which focuses on data-driven statistical alignment, the transfer learning method focuses on model-driven knowledge transfer. In the data transfer method, if there are not enough common samples as a transfer benchmark, the data alignment would not be statistically meaningful and would hence result in a poor transfer effect. In particular, when the domain similarity is low, it may require a great many common samples, leading to considerable time and economic costs in sample preparation. By contrast, the transfer learning method only requires abundant data in the source domain for model pretraining and a few data in the target domain for model retraining. In principle, it can extract transferable features and realize knowledge transfer without any common sample. Even if the data similarity between the two domains is low, we only need to increase the number of retrained layers; in the worst case, we just need to retrain all the layers, with no need to change the CNN model architecture. Therefore, the transfer learning method is advantageous in terms of less sample preparation work.
Another noteworthy aspect is the instrument issue. In this work, the LIBS instruments in the two domains are almost identical (both built by the ChemCam team), so this issue can be neglected. However, if the instruments in the two domains are quite different (e.g. transferring between ChemCam and MarSCoDe), the superiority of transfer learning over data transfer would be more prominent. In that case, the two LIBS instruments may differ in spectral resolution, whole spectral range, response function, etc. Hence, even for the same sample, the spectrum acquired by one instrument can be greatly distinct from that acquired by the other, regarding the number of data points, the interval between adjacent data points, the central wavelength corresponding to each data point, the overall spectral profile, and so forth. In this situation, the data transfer method can hardly be effective. Although the pretrained-model-based pattern would also be less useful, we can choose other better-suited transfer learning patterns, such as the feature-based pattern, which utilizes a high-dimensional feature space regardless of the specific data format. Therefore, the transfer learning method is advantageous in terms of better flexibility and wider applicability.
The MOC method consists of two core algorithms, namely PLS1-SM and ICA, and the final prediction result of each oxide concentration is a weighted average of the PLS1-SM result and the ICA result.
(1) PLS1-SM method: To build and train the PLS1 model, 408 geochemical standard samples in the ChemCam laboratory database are employed (totally 2040 spectra). First, the 408 samples are divided into three groups according to the concentration of the oxide to be analyzed, denoted as “low”, “medium” and “high”, respectively. Then, the spectra of the samples in each group are used to train a submodel, called PLS1-SM1, PLS1-SM2 and PLS1-SM3, respectively. After that, another submodel is trained based on all the 2040 spectra, called PLS1-SM4. When predicting the oxide concentration based on an unseen LIBS spectrum, the PLS1-SM4 model is first adopted to provide a preliminary concentration value. Then, the researchers should judge which group this preliminary concentration value belongs to (low, medium, or high) and select an appropriate submodel from PLS1-SM1, PLS1-SM2 and PLS1-SM3. Finally, the concentration value predicted by the corresponding appropriate submodel is regarded as the result of the PLS1-SM method. It is noteworthy that the PLS1-SM process needs to be carried out for each oxide independently.
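The two-stage prediction logic of PLS1-SM can be sketched as follows (a schematic only; the concentration group boundaries and the submodels passed in are hypothetical placeholders, since the actual range limits are oxide-specific and set by the ChemCam team):

```python
def pls1_sm_predict(spectrum, sm_full, submodels, boundaries):
    """Two-stage PLS1-SM prediction for one oxide.

    sm_full    -- the full-range model (PLS1-SM4)
    submodels  -- range-specific models, keyed "low"/"medium"/"high"
                  (PLS1-SM1, PLS1-SM2, PLS1-SM3)
    boundaries -- (low_upper, medium_upper): hypothetical wt% limits
                  separating the three concentration groups
    """
    # Stage 1: preliminary estimate with the full-range model.
    preliminary = sm_full(spectrum)

    # Stage 2: route the spectrum to the matching range submodel.
    low_upper, medium_upper = boundaries
    if preliminary < low_upper:
        group = "low"
    elif preliminary < medium_upper:
        group = "medium"
    else:
        group = "high"
    return submodels[group](spectrum)
```

As noted above, this whole procedure has to be repeated independently for each of the eight oxides.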
(2) ICA method: To build and train the ICA model, the researchers also use the 408 samples and 2040 spectra mentioned above. For each spectrum, the ICA algorithm decomposes it into K independent signal components; the researchers set K = 8 because there are eight oxides to be analyzed. Represented by a vector, each decomposed independent component needs to be correlated with a certain oxide through manual judgement. Then, through a certain matrix operation, each independent component yields an ICA score, which is also a vector of the same dimension. Finally, each ICA score can be related to the concentration value of the corresponding oxide via a certain fitting pattern. After doing this for all eight oxides, the ICA model training is completed. When predicting the oxide concentrations for an unseen LIBS spectrum, the researchers calculate its eight ICA scores based on the trained ICA model and further calculate the corresponding concentration values of the eight oxides.
(3) Weighting method: the core is to calculate the weighted sum of the PLS1-SM result and the ICA result. For a certain oxide, the researchers need to manually set an appropriate proportion weight value, based on their preliminary estimation of the range in which the oxide concentration value falls. It is worth emphasizing that the manual setting of the weight value may require a lot of trial and error, and furthermore each oxide may have its own appropriate weight value.
Since the RMSE level of the transfer learning method and that of the MOC method are generally close, we present the advantages of the transfer learning method over the MOC method mainly from the perspective of time and labor costs.
Firstly, the MOC method requires the aforementioned data transfer as a preprocessing step, whose computational cost will increase as the number of samples and/or the number of LIBS spectra increases. Besides, this step involves considerable cost in the preparation of common samples. By contrast, the transfer learning method needs no data transfer, hence requiring less computational cost and less sample preparation work.
Secondly, the MOC method requires quite a lot of manual intervention, e.g. manual judgement is required when correlating ICA components with certain oxides and when fitting the ICA scores to concentration values, and manual trial-and-error is required when setting the proportion weight values. Furthermore, when researchers build and train the PLS1 submodels and ICA models, they need to carry out the process for each oxide separately, and the model parameters can hardly be shared. Additionally, even when there are slight changes in the LIBS data and/or the specific target task, the PLS and ICA models need to be rebuilt and retrained from scratch. This is understandable, since these conventional models do not accumulate any reusable and shareable knowledge about "features". In our transfer learning method, however, the CNN model can simultaneously analyze the concentration values of multiple oxides. Moreover, the CNN model construction is almost a once-for-all pattern, since most hyperparameters of the model do not need to be modified when we update the spectral datasets or change the oxide to be analyzed. Even for the CNN model optimization (both the source domain model and the target domain model), although a few hyperparameters need to be tuned by trial and error, this tuning process may not be so labor-intensive because there are many automatic optimization methods (e.g. Bayesian optimization, genetic algorithms, simulated annealing, etc.). Therefore, although a single PLS/ICA computation is faster and easier than a single CNN computation, the entire MOC method can be more time-consuming and labor-intensive than our CNN-based transfer learning method.
Based on the above analysis, it can be stated that although the MOC method is exquisite and the MOC result accuracy is admirable, the proposed transfer learning method is advantageous regarding the time and labor cost.
Firstly, the detection distances in the two domains are not identical. As described in section 2.2, all the spectra in the source domain (Earth laboratory dataset) are acquired at a fixed distance of 1.6 m, while the spectra in the target domain (Mars in situ dataset) are acquired at several different distances (ChemCam calibration targets, 1.6 m; Martian natural targets, varying from 2.4 to 2.7 m). It is well known that detection distance is one of the most important factors affecting the LIBS spectral profile characteristics.71 Specifically, for the CNN method, intensity normalization based on the whole-spectrum intensity sum has been demonstrated to be a practical way to improve the performance in a classification task, since it can effectively mitigate the distance effect.48 However, our trial and error in the current study indicates that a mere intensity normalization may negatively influence the accuracy of a CNN in a quantification task. Hence, a potential way to further improve the accuracy on Mars in situ spectra is to design a more powerful distance effect correction strategy.
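The whole-spectrum-sum intensity normalization mentioned above amounts to a one-line operation per spectrum (a minimal sketch):

```python
def normalize_total_intensity(spectrum):
    """Normalize a LIBS spectrum by its whole-spectrum intensity sum,
    so that spectra acquired at different stand-off distances become
    comparable in overall scale (each normalized spectrum sums to 1)."""
    total = sum(spectrum)
    if total == 0:
        raise ValueError("empty spectrum: total intensity is zero")
    return [v / total for v in spectrum]
```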
Secondly, the physical properties of the samples in the source domain and those in the target domain are different. For one thing, the samples in the laboratory are commonly pressed pellets, while the samples on Mars are in a natural state (except the calibration targets). Thus, the physical parameters (e.g. particle size and compactness) of the samples in the two domains can be quite different, leading to the physical matrix effect. For another, the samples in the laboratory usually have rather clean surfaces, while the samples on Mars are very likely to be covered by Martian dust. As described in ref. 50, the ChemCam team routinely regards the LIBS spectra from the first 5 laser shots on each sampling position as dust-affected and directly abandons those spectra in the subsequent analysis. In the process of transfer learning, the physical matrix effect might be largely corrected through the knowledge transfer from the source domain to the target domain,63 but the dust effect can hardly be corrected, since the chemical composition of the Martian dust could be drastically distinct from that of the target sample (e.g. rock and soil), and the dust-affected spectra are genuine outlier data instead of knowledge-containing data. Therefore, it would be valuable to develop a more sophisticated strategy that can exactly pick out all the dust-affected spectra for every single sample (rather than simply abandoning the first 5 spectra). As long as the spectra in the source domain and/or the target domain are appropriately filtered (i.e. the dust-affected spectra are removed), this transfer learning methodology is still expected to demonstrate excellent performance and play an important role in the analysis of Mars in situ LIBS data.
Thirdly, the LIBS detection scenario in the laboratory is not identical to that on Mars. In the laboratory, although the target samples can be placed in a chamber that well simulates the Martian atmospheric environment, the LIBS instrument (i.e. the LANL ChemCam laboratory testbed) remains in the normal terrestrial atmosphere. In the Mars field detection, however, both the target samples and the LIBS instrument are in the Martian atmospheric environment. So, if the LIBS instrument can also be placed in a Mars-like environment when conducting the laboratory experiments,72 the similarity between the source-domain and target-domain data can be further enhanced, which would help further promote the transfer learning performance.
Note that we do not intend to claim that one origin of dissimilarity is more important than another, e.g. that the distance-caused dissimilarity matters more than the environment-caused dissimilarity. Mitigating any one type of dissimilarity, and hence the overall dissimilarity, would benefit the effectiveness and/or efficiency of transfer learning.
In this study, the foundational algorithm is the CNN, and the source-domain CNN model could be further improved by adopting more sophisticated optimization techniques and/or employing more samples for pretraining.
In fact, the number of training samples is important not only for pretraining the source-domain model but also for retraining the target-domain model. Although transfer learning does not require many samples for retraining, owing to the inherent nature of this methodology, this does not mean that very few samples can ensure high-quality retraining. As can be seen, although the overall RMSE level of our Scheme TL is superior to those of the four alternative schemes and close to that of the MOC results, the accuracy of Scheme TL is still far from ideal (note the relatively high average RE level and the few extremely high RE values). Besides the aforementioned presence of “very difficult” samples in our testing set, the limited number of samples for retraining the target-domain model can also be an important reason.
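For clarity, the two accuracy metrics discussed here can be written out explicitly; the short sketch below shows the standard definitions of RMSE and per-sample relative error assumed in this discussion (the exact averaging conventions of the study may differ).

```python
import math

def rmse(predicted, reference):
    """Root-mean-square error between predicted and reference compositions."""
    n = len(predicted)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)

def relative_errors(predicted, reference):
    """Per-sample relative error |pred - ref| / ref (reference values nonzero)."""
    return [abs(p - r) / r for p, r in zip(predicted, reference)]
```

RMSE summarizes the overall deviation in absolute concentration units, while the RE values expose individual badly predicted samples even when the overall RMSE looks acceptable, which is exactly the situation described above.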
In this work, there are merely 7 samples in all for the target-domain model retraining, fewer than the retraining sample numbers in two other CNN-PMTL studies.60,61 To improve the transfer learning effectiveness, one can try to use more samples for retraining. If the small sample size is hard to change, one may consider increasing the number of spectra per sample (in our scheme, there are 25 spectra for each sample and 175 spectra in total). Attention should also be paid to overfitting prevention, by utilizing proper regularization techniques in the CNN model and meanwhile retraining as few layers as possible.
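The idea of retraining as few layers as possible amounts to freezing the early layers, which carry the source-domain knowledge, and updating only a small tail of the network. A framework-agnostic sketch of this bookkeeping is given below; the layer names and parameter counts are purely illustrative, not the actual architecture of this study.

```python
def select_trainable(model_layers, retrain):
    """Split a model's parameters into trainable and frozen groups.

    model_layers : dict mapping layer name -> parameter count
    retrain      : set of layer names to fine-tune on the target domain;
                   all other layers are frozen, preserving the knowledge
                   learned from the source-domain (laboratory) data.
    Returns (trainable_count, frozen_count).
    """
    trainable = sum(n for name, n in model_layers.items() if name in retrain)
    frozen = sum(n for name, n in model_layers.items() if name not in retrain)
    return trainable, frozen
```

Keeping the trainable parameter count small relative to the frozen count is one way to limit overfitting when only 175 target-domain spectra are available for retraining.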
It is also worth trying other transfer learning patterns, e.g. a feature-based transfer learning pattern, which may not require as many training samples as the current pretrained-model-based pattern.
Finally, it is well known that deep learning algorithms like the CNN have inherent defects such as poor interpretability of the results,73,74 since the features learned by the CNN model may be abstract. In fact, the knowledge transferred in transfer learning algorithms can also be highly obscure. Computer vision researchers have proposed a series of visualization methods, such as Grad-CAM75 and Score-CAM,76 to observe the importance of different parts of an image in image classification tasks. However, these methods are currently only applied to classification tasks, and it remains highly challenging to apply them to regression tasks like LIBS quantification. Therefore, enhancing the interpretability of the scheme proposed in this study, which involves both deep learning and transfer learning, is also a worthwhile direction for future research.
When testing the spectra from the 7 ChemCam calibration targets with the model retrained in a “leave-one-out” way, the transfer learning scheme clearly outperforms four alternative schemes that adopt mere deep learning, in terms of overall and component-wise RMSE values as well as overall and component-wise RE values. When testing the spectra from the Martian natural targets with the model retrained on all 7 ChemCam calibration targets, taking the APXS-measured results as the reference “real” values, the transfer learning scheme predictions are generally as accurate as the MOC results acquired via the exquisite PLS1-SM/ICA scheme, while showing relatively smaller fluctuations.
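The “leave-one-out” retraining protocol over the 7 calibration targets can be sketched as a simple split generator; the retraining and prediction steps themselves are omitted here, as they depend on the CNN implementation.

```python
def leave_one_out_splits(samples):
    """Yield (held_out, retrain_set) pairs for leave-one-out evaluation.

    For each calibration target, the model is retrained on the remaining
    targets and then tested on the held-out one, so every target is
    evaluated by a model that never saw its spectra during retraining.
    """
    for i, held_out in enumerate(samples):
        retrain_set = samples[:i] + samples[i + 1:]
        yield held_out, retrain_set
```

With 7 calibration targets this yields 7 retraining rounds, each using 6 targets (150 spectra under the 25-spectra-per-sample scheme) for retraining and 1 target for testing.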
As demonstrated herein, the deep learning CNN model in the transfer learning scheme can make full use of the data-pattern knowledge contained in the abundant source-domain samples, and it needs only a few new samples in the target domain and a small number of iteration steps to realize efficient retraining and achieve good prediction performance in the target domain. Besides, unlike the conventional “data transfer” strategies, the transfer learning scheme focuses on “knowledge transfer” and hence requires no effort for data conversion. The model, instead of a human analyst, is responsible for mining and transferring the latent knowledge, making the entire process less labor-intensive. In short, the proposed scheme can fully exploit the advantages of deep learning while effectively addressing the labeled-sample scarcity issue.
As the first work to employ deep learning, and meanwhile the first to employ transfer learning (“knowledge transfer” instead of “data transfer”), for analyzing Mars in situ LIBS data, this study takes only the ChemCam dataset as an example, but we have reason to believe that the proposed deep learning plus transfer learning strategy is a promising methodology for analyzing the in situ spectra collected by other Mars LIBS payloads such as SuperCam and MarSCoDe, as well as field detection data in future planetary exploration missions.