Open Access Article
André L. M. de Souza (a), Maite Aramendía (*a), Esperanza García-Ruiz (a), Flávio V. Nakadi (a), Javier Resano (b) and Martín Resano (*a)
(a) Department of Analytical Chemistry, Aragon Institute of Engineering Research (I3A), University of Zaragoza, Pedro Cerbuna 12, 50009 Zaragoza, Spain. E-mail: mresano@unizar.es; maiteam@unizar.es
(b) University of Zaragoza, Department of Computer Science and Systems Engineering, Aragon Institute of Engineering Research (I3A), Zaragoza, 50018, Spain
First published on 23rd April 2026
This work evaluates different strategies for data processing, aiming at achieving isotopic information via high-resolution continuum-source graphite-furnace molecular absorption. For this purpose, two different molecules are investigated: CaF and CaCl. In the first case, only the measurement of 44Ca and 40Ca is pursued, whereas in the second case, isotopic variations affect both elements present in the molecule (44Ca and 40Ca, but also 37Cl and 35Cl). Thus, two different approaches are proposed. For Ca isotopic analysis through the monitoring of CaF, the effects of selecting the number of detection pixels and the number of molecular spectra, as well as of using a regression approach for temporal data, are discussed. Overall, using three detector pixels and using this regression approach tend to produce the best results (0.5–1.0% RSD) for isotopic analysis via HR CS GFMAS in those situations in which the signal can be derived from two separate peaks. On the other hand, to perform simultaneous Ca and Cl isotopic analysis by monitoring CaCl, a machine-learning strategy is proposed. The performance of such a model is promising for isotopic abundances of at least 10% (median absolute percentage error of 1.21%), while the error escalates when one of the isotopes shows a lower abundance. To detect such underperforming situations in real-world settings, it is recommended to monitor the prediction uncertainty to set thresholds and flag results with poor reliability.
Monitoring molecular spectra with sufficient resolution also offers new opportunities to access isotopic information, because the isotopic shifts in molecular spectra are significantly larger than those in their atomic counterparts. This approach was first explored for LIBS, giving rise to laser ablation molecular isotopic spectrometry (LAMIS),9–11 and was later also evaluated for HR CS GFMAS, which, among other aspects, may offer superior sensitivity, although with much more limited multi-element potential.
Gaining access to a new type of information (isotopic) is of obvious interest for techniques traditionally restricted to elemental analysis, apart from the very few elements (e.g., B or Li) for which the isotopic atomic transitions are sufficiently resolved.12,13 Nevertheless, only a few papers making use of isotopic analysis via HR CS GFMAS have been published to date.
Our research group reported on the first work on this topic in 2015, demonstrating the separate monitoring of Al35Cl and Al37Cl transitions,14 thus enabling Cl isotopic monitoring. Two distinct peaks for these species were observed under optimized conditions, and precision values of around 2% RSD for Cl content at the mg L−1 level were reported. Furthermore, the potential of using isotope dilution for mitigating chemical interferences when determining Cl in mineral water was demonstrated.
In a later work, a similar strategy was applied to obtain Br isotopic information. After evaluating various molecule-forming agents, Ca was selected, enabling the selective monitoring of Ca79Br and Ca81Br transitions.15 The precision obtained was around 2.5% RSD for 10 mg L−1 Br aqueous solutions. It was shown that the method can also be applied directly to solid samples, where the use of isotope dilution permitted the direct determination of Br in PVC and tomato leaves reference materials, despite the chemical interferences detected.
Isotopic analysis of B, based on measurements of the 10BH and 11BH transitions around 433.1 nm and 437.1 nm, was then demonstrated by Abad et al.16 Excellent precision was achieved (with expanded uncertainties in the 0.015–0.044% range) by building a spectral library with different isotope ratios for partial least squares regression (PLSR), but only for concentrations of at least 1 g L−1. Still, this indicates that more advanced chemometric tools can help improve precision in HR CS GFMAS isotopic analysis, as is common in LAMIS,8,10,11 where PLSR, for instance, is routinely used. It is reasonable to assume that such tools are particularly important for maximizing the information obtained from complex, line-rich spectra, all the more so because other strategies commonly applied in LAMIS to improve precision, such as the accumulation of spectra, can hardly be applied to HR CS GFMAS.
The selective monitoring of 40CaF and 44CaF was reported by Zanatta et al.17 This work, rather than measuring isotopic ratios, focused on determining 40Ca and 44Ca isotopes in urine samples, after Ca separation by precipitation with ammonium oxalate to remove Cl interference. It can be mentioned that the A2Π–X2Σ+ band of CaF isotopologues has been investigated in detail recently by means of laser-induced fluorescence spectroscopy.18
Bazo et al. expanded the field by reporting data on the four stable isotopes of Sr (by measuring 88SrF, 87SrF, 86SrF, and 84SrF transitions), rather than only two as in previous works.19 To overcome spectral overlap, a deconvolution approach was used. The reported imprecision for isotope ratios was relatively high (6–11% RSD), but the approach demonstrated potential for monitoring 84Sr as a tracer in tap water.
Aramendía et al. explored the formation of BF with a double purpose: (i) to enable the determination of B using a milder temperature program, thus improving the potential for measuring this refractory element in a graphite furnace; (ii) to attain isotopic information. In this regard, it was possible to calculate 11B/10B ratios by separately monitoring the 11BF and 10BF molecular absorption transitions, with RSD values improving for higher B masses, but only down to 3–4% under the best conditions.20
Finally, Abad et al. monitored 14NO and 15NO transitions and developed an isotope-dilution-based method for determining total nitrate + nitrite in natural waters.21 The transitions of the isotopologues overlap partially, so non-linear multivariate analysis (PLSR) was required for spectral deconvolution. The expanded uncertainty reached 2–4% RSD.
Therefore, the number of articles published on this topic over the last ten years remains rather limited. This could be, among other reasons, because the optical community is not accustomed to the concept of isotopic analysis, its potential, and the methods for maximizing the information it can yield. But perhaps most importantly, there remains a lack of understanding of fundamental aspects, including the primary sources of uncertainty.
In this regard, it may be difficult to disentangle some of these potential sources of uncertainty (e.g., possible temporal variations superimposed on noise). Nevertheless, from a practical point of view, it is at least possible to investigate the optimal data processing approaches required to improve data quality. Furthermore, as discussed above, the RSD values reported via HR CS GFMAS in the vast majority of cases are of a few %. This may suffice for some applications (e.g., isotope dilution or tracer experiments), but not for others (e.g., monitoring natural variations), and leaves room for improvement.
This work further explores new approaches to data treatment in order to obtain results of the highest analytical quality. For this purpose, two different molecules are investigated via HR CS GFMAS, namely CaF and CaCl. Such molecules are selected because they correspond to two very different situations.
CaF offers a relatively simple isotopic system, in which the only possible isotopic variations stem from one of the elements composing the molecule (Ca). The best strategy for the selection of the most representative portion of the 3D signals generated via HR CS GFMAS will be investigated. Special attention will be paid to the translation of the regression approach, originally developed for the treatment of transient isotopic signals with multicollector inductively coupled plasma mass spectrometry (MC-ICP-MS),22,23 to HR CS GFMAS isotopic analysis. It should be noted that, since the goal differs from that explored by Zanatta et al.17, which was a tracer experiment, different transitions will be monitored.
CaCl, on the other hand, has been selected to represent a more complex situation in which both elements show a potential isotopic variation. The resulting spectra are significantly more complex in terms of lines, and a novel approach based on machine learning is tested to predict the isotopic compositions of both Ca and Cl simultaneously.
Finally, we consider it necessary to clarify that this is a proof-of-concept study focused on achieving the best analytical performance from model solutions, unlike previous works where analysis of real samples was demonstrated, as discussed before.
For CaF molecule studies, calcium standards with different isotopic compositions were used. A 1000 mg L−1 Ca standard solution (Merck, Germany) was used for the natural stock (40Ca abundance of 96.9%). A 1000 mg L−1 44Ca-enriched solution (certified 44Ca atomic abundance of 99.2%) was prepared from the CaCO3 salt (Neonest AB, Sweden) dissolved in a 1% HNO3 (v v−1) solution. As the fluorinating agent, a NaF (Merck, Germany) 5% (m v−1) solution was prepared.
For CaCl studies, in addition to the Ca standards previously described, a 1000 mg L−1 Cl standard solution (Merck, Germany) was used as a reference for natural abundance (35Cl, 75.8%; 37Cl, 24.2%). A 35Cl-enriched NaCl salt (CortecNet, France) with a certified abundance of 99.1% was dissolved in a 1% HNO3 (v v−1) solution to a final Cl concentration of approximately 200 mg L−1. Finally, a 164.6 mg L−1 37Cl-enriched standard (ERM, Belgium) with a certified 37Cl atomic abundance of 98.1% was also measured.
Fig. 1 Wavelength- and time-resolved signal obtained by HR CS GFMAS using the conditions shown in Table 1 when monitoring a solution containing approximately 5 µg of 40Ca and 5 µg of 44Ca in the presence of an excess of F (0.5 mg of NaF). 2D projections on the Z-axis are also displayed to show the wavelength-integrated and time-integrated signals. The absorbance was normalized to a maximum value of 2 to keep all the values on the same scale.
For each measurement, 10 µL of a Ca solution of different concentrations that varied from 200 to 1000 mg L−1 (2 to 10 µg of total calcium, respectively) was deposited into the furnace, together with 10 µL of NaF 5% (m v−1) solution as a fluorinating agent. The calcium isotopic compositions also varied: 44Ca/40Ca ratios of 4 : 1, 1 : 1, and 1 : 4 were measured. The measurements were done using the liquid autosampler, and no chemical modifier was used. The furnace program used was adapted from prior work17 and is presented in Table 1. For each calcium concentration and isotopic composition, 15 measurements were carried out. Additionally, 10 measurements of a blank solution containing only the fluorinating agent were performed. The final spectrum for each measurement was corrected by the average of the 10 blank spectra.
| Electronic transition | B2Σ–X2Σ |
|---|---|
| Wavelengths | 515.350 nm (central pixel 101); 515.190 nm (40CaF); 515.408 nm (44CaF) |
| Fluorinating agent | 10 µL of NaF 5% m v−1 |
| Aqueous sample volume | 10 µL |
| Ca mass introduced | 2–10 µg |
| 44Ca/40Ca isotopic ratio | 4 : 1, 1 : 1 and 1 : 4 |
| Measurement time | 6 s |

Temperature program

| Step | Temperature (°C) | Ramp (°C s−1) | Hold (s) | Ar gas flow (L min−1) |
|---|---|---|---|---|
| Drying | 90 | 3 | 20 | 2.0 |
| Drying | 110 | 5 | 20 | 2.0 |
| Pyrolysis | 800 | 300 | 10 | 2.0 |
| Gas adaption | 800 | 0 | 5 | 0 |
| Vaporization | 2200 | 3000 | 6 | 0 |
| Cleaning | 2400 | 1500 | 4 | 2.0 |
To observe the spectral variations when both Ca and Cl are mixed, the 44Ca/40Ca and 37Cl/35Cl isotopic compositions were both varied from 0 : 1 to 1 : 0, in increments of 10%, resulting in 121 different combinations. The solution for each combination was analyzed 5 times, and a 1% HNO3 (v v−1) blank solution was measured 10 times. The IBC-m background correction method was also used for this molecule. These measurements were used to build the machine learning model discussed in Section 3.3 and were obtained in four different full-day sessions. The experiments for additional validation with unknowns were conducted on a separate day, four weeks after the others.
A 2D (wavelength vs. absorbance) spectrum for each measurement was produced, selecting the temporal spectrum that provides the highest absorbance value (or else, the highest spectrum plus the ones measured before and after; or the highest spectrum plus the two measured before and the two after), always corrected by the blank spectrum. Finally, all spectra were normalized to their maximum absorbance to reduce variability across different measurement days.
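The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `select_peak_spectra`, the `half_window` parameter, and the order of operations (blank correction first, then selection, then normalization to the maximum of the selected window) are hypothetical choices, and how the selected spectra are subsequently combined is deliberately left open because the text does not specify it.

```python
def select_peak_spectra(spectra, blank, half_window=2):
    """Return blank-corrected, max-normalized spectra around the peak.

    spectra: temporal spectra, each a list of absorbances (one per pixel).
    blank: average blank spectrum (one absorbance per pixel).
    half_window: 0, 1 or 2 keeps 1, 3 or 5 spectra around the maximum
                 (hypothetical parameter name).
    """
    # Blank-correct every temporal spectrum pixel by pixel.
    corrected = [[a - b for a, b in zip(s, blank)] for s in spectra]
    # Locate the temporal spectrum that contains the highest absorbance.
    peak = max(range(len(corrected)), key=lambda i: max(corrected[i]))
    # Keep the peak spectrum plus its neighbours, clipped at the edges.
    lo = max(0, peak - half_window)
    hi = min(len(corrected), peak + half_window + 1)
    window = corrected[lo:hi]
    # Normalize to the maximum absorbance found in the selected window.
    top = max(max(s) for s in window)
    return [[a / top for a in s] for s in window]
```

With `half_window=0` only the most intense spectrum is kept; values of 1 and 2 reproduce the three- and five-spectrum variants.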
Table 2 displays the instrumental measurement conditions, which were adapted from a previous work.25
| Electronic transition | A2Π–X2Σ |
|---|---|
| Wavelength | 604.970 nm (central pixel 101) |
| Standard amounts | Ca: 10 µL, 180 mg L−1; Cl: 10 µL, 180 mg L−1 |
| Isotopic compositions monitored | 44Ca/40Ca: 0 : 1.0, 0.1 : 0.9, 0.2 : 0.8, 0.3 : 0.7, 0.4 : 0.6, 0.5 : 0.5, 0.6 : 0.4, 0.7 : 0.3, 0.8 : 0.2, 0.9 : 0.1, 1.0 : 0; 37Cl/35Cl: 0 : 1.0, 0.1 : 0.9, 0.2 : 0.8, 0.3 : 0.7, 0.4 : 0.6, 0.5 : 0.5, 0.6 : 0.4, 0.7 : 0.3, 0.8 : 0.2, 0.9 : 0.1, 1.0 : 0 |
| Measurement time | 5 s |

Temperature program

| Step | Temperature (°C) | Ramp (°C s−1) | Hold (s) | Ar gas flow (L min−1) |
|---|---|---|---|---|
| Drying | 90 | 30 | 20 | 2.0 |
| Drying | 110 | 30 | 20 | 2.0 |
| Pyrolysis | 700 | 50 | 5 | 2.0 |
| Gas adaption | 700 | 0 | 5 | 0 |
| Vaporization | 2200 | 3000 | 5 | 0 |
| Cleaning | 2600 | 100 | 4 | 2.0 |
Solutions with 44Ca/40Ca ratios of 4 : 1, 1 : 1, and 1 : 4 were measured using an ICP-MS/MS to assess the real 44Ca/40Ca ratio. Solutions were diluted to a final concentration of 200 µg L−1 and analyzed under the following parameters: 16 L min−1 plasma Ar gas flow, 1.1 L min−1 nebulizer Ar gas flow, 1.01 L min−1 auxiliary Ar gas flow, 1.2 mL min−1 reaction gas flow (NH3), and 1600 W RF power. The nuclides monitored were 40Ca+ and 44Ca+, with both quadrupoles (Q1 and Q3) set to transmit m/z values of 39.9626 and 43.9555, respectively. The dwell time per isotope was 50 ms. Instrumental mass bias was corrected using the natural Ca standard as a reference. Also, a 1% HNO3 (v v−1) blank solution was measured.
Five replicates were carried out for each isotopic composition, and the uncertainties obtained (expressed as RSD) were: 44Ca/40Ca 4 : 1 (0.58%), 44Ca/40Ca 1 : 1 (0.29%), and 44Ca/40Ca 1 : 4 (0.57%).
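As an illustration of a standard-based mass bias correction of the kind mentioned above, a minimal sketch is given below. It assumes a simple multiplicative correction factor (reference ratio of the natural standard divided by the ratio measured for that standard); the function names, the natural abundances used (IUPAC representative values of roughly 96.941% for 40Ca and 2.086% for 44Ca), and the example readings are illustrative assumptions, not the exact procedure or data of this work.

```python
def external_mass_bias_factor(reference_ratio_std, measured_ratio_std):
    """Correction factor from a standard of known isotopic composition."""
    return reference_ratio_std / measured_ratio_std


def correct_ratio(measured_ratio_sample, factor):
    """Apply the standard-derived factor to a sample ratio."""
    return measured_ratio_sample * factor


# Assumed natural 44Ca/40Ca ratio (IUPAC representative abundances).
NATURAL_44_40 = 2.086 / 96.941

# Hypothetical reading: the natural standard is measured 23% too low.
measured_std = NATURAL_44_40 * (1 - 0.23)
factor = external_mass_bias_factor(NATURAL_44_40, measured_std)

# A hypothetical sample reading of 0.77 affected by the same bias
# is corrected back to a ratio close to 1.
corrected_sample = correct_ratio(0.77, factor)
```

The sketch assumes that the bias is the same for the standard and the samples, which is the usual premise of this type of external correction.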
Under these conditions, there are different CaF transitions that can be monitored and that respond selectively to different Ca isotopes, because the presence of F, being monoisotopic, does not affect the observed shift. The transition (B2Σ–X2Σ; ν′ = 2, ν″ = 1), which shows its main lines around 515.3 nm,24 was selected for the study because it offers a clean spectrum with low noise. For simplicity, only the two main calcium isotopes 40Ca and 44Ca (for which a spike is used) are considered, since the abundance of the other Ca isotopes will be too low to show a significant signal for the solutions measured.
Fig. 1 shows an example of the type of signal that is obtained under these conditions. As is typical of HR CS GFMAS, it is a 3D signal, where each one of the detector pixels (X-axis) monitors a particular wavelength (with a resolution of roughly 3.25 pm in this spectral range). The signal is obviously transient in nature (Y-axis), and the absorbance (Z-axis) of the whole spectrum is recorded approximately every 0.073 s. 2D signals (normalized absorbance vs. wavelength; normalized absorbance vs. time) are also projected onto the sides of the figure, allowing for a better appreciation of the effects of time and wavelength on the signal.
As shown in this figure, there are actually two doublets that respond selectively to 40CaF and 44CaF. Their sensitivities are very similar, so the results will focus on the first one from now on. The shift between these 40CaF and 44CaF signals is rather large (218 pm, corresponding to 67 detector pixels). The shift predicted for this transition, using the theoretical equations discussed in detail elsewhere,14,26 is 203.8 pm.
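The correspondence between the observed shift and the number of detector pixels follows directly from the wavelengths listed in Table 1 and the pixel resolution quoted above (about 3.25 pm per pixel), as this short worked calculation shows:

$$
\Delta\lambda = 515.408\ \mathrm{nm} - 515.190\ \mathrm{nm} = 0.218\ \mathrm{nm} = 218\ \mathrm{pm},
\qquad
\frac{218\ \mathrm{pm}}{3.25\ \mathrm{pm\ pixel^{-1}}} \approx 67\ \mathrm{pixels}
$$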
Unlike the situation for elemental analysis, there is no clear protocol to obtain isotopic information of the best quality from these signals. However, as is known from other high-precision isotopic techniques that use transient signals (e.g., when a laser ablation (LA) device is coupled to MC-ICP-MS), selecting the entire signal can degrade precision.
This situation can also be inferred for HR CS GFMAS signals, as shown in Fig. 2 and 3. Fig. 2 shows the effect of the number of detector pixels selected (thus, the wavelength range covered). As shown in the figure, for both 40CaF (Fig. 2a) and 44CaF (Fig. 2b), signal imprecision (evaluated as the RSD of 15 measurements) increases sharply for pixels with lower absorption, as expected and previously described.27 This has a direct consequence for the measurement of the 44Ca/40Ca ratio (see Fig. 2c), as the best precision (together with a ratio close to the expected value for this experiment, which is 1) is obtained only by ratioing the four central pixels. Obviously, adding more pixels increases noise due to the lower absorbance signal, which detracts from data quality. Thus, in this regard, the situation is similar to when trace elemental analysis is targeted via HR CS GFMAS/AAS, and the optimal number of pixels falls within the typically recommended 3–5 value.27,28
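The effect of the number of pixels on the ratio precision can be explored with a calculation along the following lines. This is a minimal sketch: the function names, the restriction to an odd number of pixels centered on each peak, and the use of the sample standard deviation are assumptions made for illustration.

```python
from statistics import mean, stdev


def summed_signal(spectrum, center, n_pixels):
    """Sum the absorbance of n_pixels (odd) centered on a peak pixel."""
    half = n_pixels // 2
    return sum(spectrum[center - half:center + half + 1])


def ratio_rsd(replicates, center_44, center_40, n_pixels=3):
    """Mean 44Ca/40Ca signal ratio and its RSD (%) over replicates.

    replicates: time-integrated spectra (one list of absorbances per
    replicate measurement); center_44 and center_40 are the pixel
    indices of the 44CaF and 40CaF peak maxima.
    """
    ratios = [
        summed_signal(s, center_44, n_pixels)
        / summed_signal(s, center_40, n_pixels)
        for s in replicates
    ]
    avg = mean(ratios)
    return avg, 100 * stdev(ratios) / avg
```

Calling `ratio_rsd` with `n_pixels` set to 1, 3, and 5 on the same set of replicates gives a direct comparison of how the selected spectral window affects the RSD.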
Fig. 2 Time-integrated signals obtained by HR CS GFMAS using the conditions shown in Table 1 when monitoring a solution containing approximately 5 µg of 40Ca and 5 µg of 44Ca in the presence of an excess of F (0.5 mg of NaF). (A) Absorbance signal of the first 40CaF peak (around 515.2 nm) and RSD values obtained at each detector pixel (n = 15); (B) absorbance signal of the first 44CaF peak (around 515.4 nm) and RSD values obtained at each detector pixel (n = 15); (C) signal ratios using each pair of detector pixels from A and B, with the corresponding RSD values. Error bars represent the standard deviation.

Fig. 3 Wavelength-integrated signals (1 detector pixel) obtained by HR CS GFMAS using the conditions shown in Table 1 when monitoring a solution containing approximately 5 µg of 40Ca and 5 µg of 44Ca in the presence of an excess of F (0.5 mg of NaF). (A) Absorbance temporal profile of the first 40CaF peak (around 515.2 nm) and RSD values for each spectrum collected (n = 15); (B) absorbance temporal profile of the first 44CaF peak (around 515.4 nm) and RSD values for each spectrum collected (n = 15); (C) signal ratios using each spectrum from A and B, with the corresponding RSD values. Error bars represent the standard deviation.
However, the influence of the other dimension (time) has not been evaluated as systematically in HR CS GFMAS/AAS measurements, neither for elemental analysis, where the whole signal is usually time-integrated, nor, even less so, for isotopic analysis. Still, the influence on the quality of the results is very similar to that shown for the detector pixels. As shown in Fig. 3, the signals for both 40CaF (Fig. 3a) and 44CaF (Fig. 3b) exhibit significantly higher RSD values at the start and tail of the signals, and this imprecision ultimately affects the measurement of the 44Ca/40Ca ratio (see Fig. 3c). Good precision is only achieved when just the central part of the signal is considered, owing to its better signal-to-noise ratio.
Now, defining the number of spectra to consider is not as straightforward as for the detector pixels. It is much more difficult to generalize when discussing the transient nature of the HR CS GFMAS signals because the contribution of the tailing to the total signal varies for different molecules. Very volatile ones will hardly exhibit any tailing, whereas refractory ones will. It is therefore not simple to establish a rule, and it is a priori desirable to find a way to use all of the spectra, to avoid subjectivity, while at the same time avoiding the extra uncertainty induced by those signal portions of lower absorbance.
This situation has already been explored for MC-ICP-MS when using sample introduction systems that produce transient signals, as discussed in the introduction. The elegant solution proposed was to leverage the properties of linear regression.22,23 If the signal of one isotope is plotted against that of the other for all the spectral points and a least-squares linear regression is fitted, the slope of the line corresponds to the estimated ratio. In such a regression, not all points contribute equally to the resulting equation, as it is well known that higher values carry greater weight, so the low-absorbance portions of the signal have little influence on the result. This approach may, therefore, also be recommendable in HR CS GFMAS.
Fig. 4 illustrates how this method works in practice and demonstrates that a high-quality regression model is obtained for the 40CaF and 44CaF signals, thereby offering a new approach to data processing.
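A minimal sketch of the regression approach is shown below. It assumes an ordinary least-squares fit with a free intercept, with the 44CaF absorbance regressed on the 40CaF absorbance at each time point; the function name and this particular choice of fit are illustrative assumptions rather than a description of the exact implementation used here.

```python
def regression_ratio(signal_40, signal_44):
    """Estimate the 44Ca/40Ca ratio as the slope of signal_44 vs signal_40.

    Both arguments are absorbance values recorded at the same time
    points of the transient signal. Returns (slope, intercept).
    """
    n = len(signal_40)
    mean_x = sum(signal_40) / n
    mean_y = sum(signal_44) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in signal_40)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(signal_40, signal_44))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Because the points far from the mean (the high-absorbance points near the peak maximum) dominate `sxx` and `sxy`, the start and tail of the transient contribute little to the slope, which is the property exploited by this approach.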
Fig. 4 (A) Wavelength-integrated signals (1 detector pixel) obtained by HR CS GFMAS using the conditions shown in Table 1 when monitoring a solution containing approximately 5 µg of 40Ca and 5 µg of 44Ca in the presence of an excess of F (0.5 mg of NaF), showing the absorbance temporal profile of the first 40CaF peak (around 515.2 nm) and the first 44CaF peak (around 515.4 nm). The central 10 peak points are highlighted. (B) Example of the regression approach using all the points acquired in A, in which the 44Ca/40Ca ratio corresponds to the estimated slope of the regression line.

Fig. 5 Precision, expressed as RSD%, obtained by HR CS GFMAS for CaCO3 solutions prepared with different 44Ca/40Ca ratios, with different overall Ca contents, and processing the data using either 1, 3, or 5 detector pixels and a different procedure to evaluate the peak temporal profile (either the regression approach, as shown in Fig. 4 and indicated as slope, or a number of peak points of 1, 3, 5 or 10). (A) 44Ca/40Ca ratio of approx. 1; (B) 44Ca/40Ca ratio of approx. 0.25; (C) 44Ca/40Ca ratio of approx. 4.
As can be seen in Fig. 5a, the best precision is attained when the ratio is 1 : 1, as expected. For such a ratio, precision values of 0.5% to 1.0% RSD are typically achieved, regardless of concentration. The approach selected for data processing seems less relevant in this most favorable case. Still, some of the two-factor ANOVAs carried out (factors: number of detector pixels and mode of processing the transient signal) indicated statistically significant differences (mode of processing the transient signal: for 400 mg L−1 and 600 mg L−1; number of detector pixels: for 600, 800, and 1000 mg L−1). Overall, the best precision values are obtained with 3 detector pixels (average ± standard deviation, 0.59 ± 0.06%) and with the regression approach (0.60 ± 0.57%). However, regarding the way of processing the transient signal, the results are only clearly worse when the maximum value is used (0.82 ± 0.78%).
When the ratio is not so favorable, precision degrades, as could be anticipated. In general, the measurement of an isotope found at a lower level can be potentially more affected by noise variations, particularly since the definition of the baseline would be primarily influenced by the highest signals. In these situations (see Fig. 5b and c), the role of concentration becomes very clear, and precision degrades significantly at Ca contents of 200 and 400 mg L−1, hinting at low signal-to-noise ratios for the less abundant isotope. The potential effect of non-linearity for high signals does not seem to affect precision. However, this effect may affect the accuracy of the ratio, as will be discussed later.
In particular, for the 44CaF/40CaF ratio of approx. 0.25, ANOVAs found significant differences related to the number of detector pixels in all cases, and also regarding the mode of processing the signals for 400 and 1000 mg L−1. In practice, the difference between selecting 3 or 5 detector pixels per peak is minimal (1.09 ± 0.55% for 3 pixels; 1.13 ± 0.60% for 5 pixels), but again, using only one pixel (2.24 ± 0.66%) is not recommended. The difference between using the regression approach or 3 or 5 peak points is practically negligible (1.42 ± 1.24% for regression; 1.43 ± 1.27% for 3 points; 1.40 ± 1.25% for 5 points), but using either the maximum peak point (1.56 ± 1.37%) or the 10 peak points that characterize the whole signal peak (1.63 ± 1.40%) potentially increases the uncertainty.
On the other hand, for the 44CaF/40CaF ratio close to 4, the trend is even clearer. ANOVAs found statistically significant differences in all cases except for one (the way of processing the signals for 600 mg L−1). The best precision is obtained with either 1 (1.04 ± 0.49%) or 3 detector pixels (1.12 ± 0.40%), and it degrades for 5 (1.45 ± 0.76%). Concerning signal processing, the regression approach performs best in the vast majority of cases (1.02 ± 0.90%), and precision degrades as the number of peak points increases, reaching its worst value when 10 peak points are considered (1.50 ± 1.19%).
Overall, based on these precision values, the best conditions (or very close to them) are always achieved with 3 detector pixels and the regression approach, providing values around 1% RSD or better, and around 0.5% in the most favorable situations (ratio close to 1). While these values may still not be sufficiently good to monitor Ca natural variations, they represent an improvement over previous results reported for isotopic analysis via HR CS GFMAS (see introduction), which were most often around 2–3% RSD, or even higher.
Finally, while the primary goal of this work is to establish the optimal way to process the data to reduce uncertainty, it is obviously important to provide ratios as accurately as possible. To further evaluate this, the solutions prepared as described in Section 2 were analyzed by ICP-MS to determine their actual 44Ca/40Ca ratio, which served as the reference. The ratio of the calculated to the reference value, for the same dataset used in Fig. 5, is shown in Fig. 6.
Fig. 6 Experimental ratio 44Ca/40Ca obtained by HR CS GFMAS divided by the reference 44Ca/40Ca ratio (measured by ICP-MS) for solutions prepared with different 44Ca/40Ca ratios, with different overall Ca contents, and processing the data using either 1, 3, or 5 detector pixels and a different procedure to evaluate the peak temporal profile (either the regression approach, as shown in Fig. 4, or a number of peak points of 1, 3, 5 or 10). (A) 44Ca/40Ca ratio of approx. 1; (B) 44Ca/40Ca ratio of approx. 0.25; (C) 44Ca/40Ca ratio of approx. 4.
Again, it is evident that the total concentration significantly affects the result. Nevertheless, it is possible to achieve a value most often within 2% of the expected one for a 44Ca/40Ca ratio of approximately 1 or 0.25, and that is without any correction approaches, as required for other techniques. For instance, when using ICP-MS, the instrumental mass bias for 44Ca/40Ca is very high (−23% in our working conditions) and was corrected for using a standard for which the natural abundance was assumed.
On the other hand, the results for a 44Ca/40Ca ratio of approximately 4 are less accurate and show a decreasing trend with increasing concentration, approaching the correct value at higher concentrations. This trend (lower ratio with higher concentration) is the inverse of that observed for a 44Ca/40Ca ratio of 0.25 and can be at least partially explained by the restricted linear range, which is always a factor to consider when estimating ratios via absorption techniques. Again, for 44Ca/40Ca ratios close to 4, it is obvious that a lower bias is obtained using the regression approach, and the bias (in the same way as the imprecision) increases as more peak points are included in the signal.
Thus, overall, the combination of 3 detector pixels and the regression approach is proposed as the most promising strategy for deploying HR CS GFMAS for isotopic analysis in those situations in which the signal can be derived from two separate peaks.
Fig. 7 Time-integrated and 2D wavelength-resolved spectra of CaCl with different isotopic compositions obtained by HR CS GFMAS, normalized to the highest spectrum value, using conditions shown in Table 2. The total mass for both Ca and Cl in each spectrum is 1.8 µg. The first five peaks of each pure spectrum, obtained with a value close to 100% for each Ca and Cl isotope present, are identified as 1 to 5.
This situation is clearly much more complex than the CaF investigated before, with so many peaks that can appear at different spectral wavelengths. Therefore, in this case, more powerful data processing approaches are required to extract the isotopic information.
The use of machine learning was hence investigated. An artificial neural network (ANN) was selected. ANNs are an excellent choice for regression problems because they can model complex, nonlinear relationships between inputs and continuous outputs. In addition, their implementation is highly portable across different machine learning environments and frameworks, enabling models to be trained, deployed, and reproduced consistently on diverse platforms.
To test the validity of the model developed, combinations with different abundances of isotopes 40Ca and 44Ca, together with isotopes 35Cl and 37Cl were prepared and measured, as described in Section 2.4. In total, 121 isotopic combinations were measured in quintuplicate on four different days. Due to data variability, each spectrum was blank-corrected and normalized to the maximum peak height. Of the different spectra obtained from each measurement (one approximately every 0.073 seconds), only the ones giving rise to the maximum signal (1, 3, or 5 spectra) were considered to minimize the influence of the noise. The differences observed when varying the number of spectra selected are minimal, but the results appear slightly better with 5 spectra; thus, only the results for this scenario will be reported.
Given the small size of the training dataset, a small model has also been selected to prevent overfitting. The model was developed using the MLPRegressor from the sklearn.neural_network library with the following hyperparameters:
(1) Preprocessing: each spectrum was blank-corrected and normalized to the maximum peak height.
(2) ANN architecture: inputs: 605; hidden layers: [64, 32, 32, 16]; solver = 'adam'; learning_rate = 'constant'; learning_rate_init = 0.005; alpha = 10 × 10−3; early_stopping = True; activation = 'relu'; maximum number of iterations = 10 000. Other hyperparameters used the default library values.
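The hyperparameters listed above map onto scikit-learn's `MLPRegressor` roughly as sketched below. The dictionary name `HYPERPARAMS`, the helper `build_model`, and the use of a different `random_state` per ensemble member are assumptions made for illustration; only the hyperparameter values themselves come from the list above.

```python
# Hyperparameters as listed in the text (alpha = 10 x 10^-3 = 0.01).
HYPERPARAMS = {
    "hidden_layer_sizes": (64, 32, 32, 16),
    "activation": "relu",
    "solver": "adam",
    "alpha": 0.01,
    "learning_rate": "constant",
    "learning_rate_init": 0.005,
    "max_iter": 10000,
    "early_stopping": True,
}


def build_model(seed):
    """Create one untrained network (hypothetical helper).

    scikit-learn is imported here so that the configuration above can
    be inspected even where the library is not installed.
    """
    from sklearn.neural_network import MLPRegressor

    # A distinct seed per member is an assumption about how the
    # ensemble members are made to differ from one another.
    return MLPRegressor(random_state=seed, **HYPERPARAMS)
```

A model built this way would be fitted with `model.fit(X, y)`, where each row of `X` holds the 605 preprocessed inputs and `y` holds the two target abundances (Ca and Cl), since `MLPRegressor` supports multi-output regression.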
To further reduce overfitting and improve accuracy and robustness, an ensemble of 50 ANNs was used. With this approach, the ensemble combines multiple perspectives, leading to more accurate and reliable predictions than any single model. In addition, the divergences among the models' predictions can be used to estimate the uncertainty of the generated output. In a regression problem, uncertainty can be measured by examining the distribution of predictions produced by the ensemble of models: the mean represents the final prediction, whereas the standard deviation across the ensemble quantifies predictive uncertainty, with a larger deviation indicating lower confidence in the prediction. Including uncertainty estimates increases trust and transparency in machine learning models. If, for a given input, very similar predictions are obtained across all models, this indicates that uncertainty is very low and, hence, the predictions are robust and reliable; conversely, a high uncertainty denotes that the predictions are not truly reliable.
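The way an ensemble yields both a prediction and an uncertainty estimate can be sketched in a few lines. The function name, the use of the sample standard deviation, and the `threshold` argument used to flag unreliable results are illustrative assumptions; a suitable threshold would have to be chosen from the data.

```python
from statistics import mean, stdev


def ensemble_predict(member_predictions, threshold):
    """Combine the predictions of the ensemble members for one input.

    member_predictions: one predicted abundance per trained network.
    threshold: hypothetical uncertainty limit above which the result
               is flagged as unreliable.
    Returns (prediction, uncertainty, flagged).
    """
    prediction = mean(member_predictions)   # final ensemble prediction
    uncertainty = stdev(member_predictions)  # spread across members
    return prediction, uncertainty, uncertainty > threshold
```

For example, five members predicting 0.50, 0.52, 0.48, 0.51, and 0.49 agree closely, so the spread is small and the result would not be flagged for a threshold of 0.05.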
A cross-validation approach has been followed for the training process. By repeatedly training and evaluating the model on different data splits, cross-validation reduces dependence on a single train–test split and helps detect overfitting. It also makes better use of limited data and yields performance estimates that are less biased and more representative of the model's performance on unseen data. Since we had 121 different experiments, we trained 121 ensembles of 50 ANNs, each trained on 120 of the 121 experiments and tested on the unseen experiment.
This is a demanding scenario, given that each input data point corresponds to a distinct combination of isotope abundances, and therefore, when one of them is removed, the ANN model has no information about that region of input values. The aim of this approach is to assess whether the model is capable of generalizing the information learned during training and applying it to make accurate predictions in unknown regions. Neural networks are generally effective at learning patterns that enable them to generalize well to unseen data when such data follow similar distributions and underlying rules to those in the training set. However, their performance can degrade when new data are highly noisy or governed by mechanisms different from those learned during training. With our experimental setup, we will be able to identify which regions of the analyzed input data exhibit regular behavior and are therefore well predicted by our models, and which exhibit anomalies, thereby degrading the quality of our predictions.
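The leave-one-experiment-out scheme described above can be sketched as a simple generator of train/test splits. The function name and the integer experiment identifiers are assumptions made for illustration; in practice, every replicate of the held-out experiment would be excluded from training together.

```python
def leave_one_experiment_out(experiment_ids):
    """Yield (train_ids, test_id) pairs, holding out one experiment at a time."""
    ids = list(experiment_ids)
    for held_out in ids:
        # Train on every experiment except the one being tested.
        train = [e for e in ids if e != held_out]
        yield train, held_out


# 121 isotopic combinations, labelled 0..120 purely for illustration.
splits = list(leave_one_experiment_out(range(121)))
```

Each of the 121 splits would then be used to train one ensemble on the 120 remaining experiments and to evaluate it on the single held-out combination.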
Since each experiment has been repeated five times, there are five predictions characterized by their mean value and standard deviation. These values can be easily combined by calculating their mean value, but slightly better results are obtained if the uncertainty represented by the standard deviation is incorporated to compute a weighted mean. Starting from the 5 measurements, each one with its mean (µ) and its standard deviation (σ), the weighted mean (µfinal) and the combined standard deviation (σfinal) are computed using the following equations:
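One common way of combining replicate values according to their uncertainties is inverse-variance weighting, where each replicate receives a weight of 1/σ². The sketch below assumes this scheme; it is not confirmed to be the exact formulation used in this work, and the function name is hypothetical.

```python
import math


def inverse_variance_combine(means, sigmas):
    """Combine replicate predictions by inverse-variance weighting (assumed scheme).

    mu_final    = sum(mu_i / sigma_i**2) / sum(1 / sigma_i**2)
    sigma_final = sqrt(1 / sum(1 / sigma_i**2))
    """
    if not means or len(means) != len(sigmas):
        raise ValueError("means and sigmas must be non-empty and of equal length")
    if any(s <= 0 for s in sigmas):
        raise ValueError("all standard deviations must be positive")
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mu_final = sum(w * m for w, m in zip(weights, means)) / total
    sigma_final = math.sqrt(1.0 / total)
    return mu_final, sigma_final


# Synthetic example: five replicates with equal uncertainty reduce to the
# plain mean, with sigma_final = sigma / sqrt(5).
mu_eq, sigma_eq = inverse_variance_combine(
    [0.10, 0.12, 0.11, 0.13, 0.09], [0.01] * 5
)

# A replicate with a large uncertainty barely shifts the combined value.
mu_w, sigma_w = inverse_variance_combine([0.10, 0.50], [0.01, 1.0])
```

Under this weighting, replicates on which the ensemble disagrees contribute little to the final value, which is consistent with the slight improvement over the plain mean mentioned above.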
Table 3 summarizes several relevant performance metrics for our MLP (multilayer perceptron) model. The regression model demonstrates strong overall performance, with a very high coefficient of determination (R² = 0.9974), indicating that it explains nearly all of the variance in the target variable. The low mean absolute error (MAE = 0.0092) indicates that, on average, the model's predictions are very close to the observed values in absolute terms. While the mean absolute percentage error (MAPE = 22.7%) appears relatively high, the much lower median absolute percentage error (MdAPE = 1.21%) indicates that this elevated MAPE is likely driven by a small number of observations with large relative errors, rather than by poor performance across most of the data. Overall, these metrics indicate that the model is accurate in most cases, with a few outliers skewing the percentage-based error measure. If the predictions for Ca and Cl are analyzed separately, the quality of the results is very similar for all metrics except MAPE (MAPE_Ca = 12.1%, MAPE_Cl = 33.2%). The higher MAPE for Cl compared to Ca, despite similar MdAPE values, indicates that a small number of Cl outliers have much larger relative errors, which disproportionately inflate the mean percentage error without affecting the typical (median) performance.
| Metric | Ca | Cl | Mean |
|---|---|---|---|
| Mean absolute error (MAE) | 0.0090 | 0.0095 | 0.0092 |
| Mean absolute percentage error (MAPE) | 12.1% | 33.2% | 22.7% |
| Median absolute percentage error (MdAPE) | 1.20% | 1.22% | 1.21% |
| R² (coefficient of determination) | 0.9973 | 0.9974 | 0.9974 |
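To make the difference between these metrics concrete, the sketch below computes them with the Python standard library on a small synthetic data set (not the measured data). The function names are illustrative, and refusing a true value of zero is a choice made here because the percentage error is undefined in that case.

```python
from statistics import mean, median


def absolute_percentage_errors(y_true, y_pred):
    """Per-sample absolute percentage errors, in percent."""
    if any(t == 0 for t in y_true):
        raise ValueError("percentage error is undefined for a true value of zero")
    return [abs(t - p) / abs(t) * 100.0 for t, p in zip(y_true, y_pred)]


def mae(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mape(y_true, y_pred):
    return mean(absolute_percentage_errors(y_true, y_pred))


def mdape(y_true, y_pred):
    return median(absolute_percentage_errors(y_true, y_pred))


def r2(y_true, y_pred):
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Synthetic abundances: four well-predicted values plus one low-abundance
# value whose small absolute error is a very large relative error.
true_vals = [0.50, 0.40, 0.60, 0.30, 0.01]
pred_vals = [0.505, 0.404, 0.594, 0.303, 0.02]

results = {
    "MAE": mae(true_vals, pred_vals),
    "MAPE": mape(true_vals, pred_vals),
    "MdAPE": mdape(true_vals, pred_vals),
    "R2": r2(true_vals, pred_vals),
}
```

In this toy example a single low-abundance point raises the MAPE to roughly 20% while the MdAPE stays near 1% and the MAE and R² are barely affected, mirroring the pattern seen in Table 3.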
Fig. 8 presents the final results, comparing the expected value with the model prediction. As the small MAE indicated, there is generally good agreement between the expected and predicted values. However, the prediction quality is not homogeneous. Extreme values, those with a low-abundance isotope and therefore appearing at the four edges of Fig. 8a, are much more difficult to predict accurately. As a rule, when an isotope's abundance is 10% or lower, predictive capability declines significantly. This is certainly not unexpected, as the contribution of minor isotopes to the overall signal becomes less evident and can be masked by other sources of uncertainty (e.g., blank contribution, noise level). These outliers are the reason for the high MAPE values.
If a model works well most of the time but there are situations in which the error is very large, it is essential that the model itself be able to detect these situations and warn when its predictions are not reliable. As explained before, our model includes 50 different MLPs, and each experiment has been carried out five times. Hence, each prediction is constructed from 5 × 50 individual predictions, which allows us to calculate the standard deviation across both the models and the replicates and to use it to detect predictions with high uncertainty. In fact, there is a high correlation (r² = 0.882) between the coefficient of variation (CV, defined as the ratio of the standard deviation to the mean) and the MAPE. This can be observed in Fig. 8b, which presents both metrics for each prediction.
This correlation is useful, since it could serve to set limits, such that values exceeding a certain CV can be considered unreliable and investigated in greater detail and/or remeasured. For example, if a CV threshold of 0.05 is used to select only the most reliable predictions, 100 of the 121 predictions are retained, and their MAPE is reduced from the original 22.7% to only 2.5%. This demonstrates the usefulness of enriching predictions with measures of uncertainty, such as the standard deviation or the CV, although in a real-world deployment, this threshold must be determined through an independent validation set to avoid over-optimistic results.
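The thresholding step can be sketched as follows. The function name and example values are hypothetical, the default of 0.05 simply mirrors the example threshold mentioned above, and treating a zero mean as infinitely uncertain is an assumption made to avoid a division by zero.

```python
def flag_unreliable(predictions, cv_threshold=0.05):
    """Return one boolean per prediction; True marks a result to re-examine.

    `predictions` is a sequence of (mean, standard deviation) pairs.
    """
    flags = []
    for mu, sigma in predictions:
        # CV = standard deviation / mean; a zero mean is treated as
        # maximally uncertain (assumption).
        cv = sigma / abs(mu) if mu != 0 else float("inf")
        flags.append(cv > cv_threshold)
    return flags


# Synthetic examples: a confident mid-range prediction (CV = 0.01) and an
# uncertain low-abundance prediction (CV = 0.2).
flags = flag_unreliable([(0.50, 0.005), (0.05, 0.01)])
```

Here only the second prediction is flagged; such results would be the candidates for closer inspection or remeasurement.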
To further test the model once developed, additional samples measured on a different day were evaluated. These samples have never been seen by the model, providing more information about its performance with absolute unknowns.
The results are shown in Fig. 9a, and the same conclusions can be drawn. The MAE is again very small (0.0098 on average), but the MAPE is high (35.0% on average) due to the occurrence of outliers, especially in the lower-right corner. Since there are few measurements, the weight of these outliers is greater. However, these anomalous situations can be identified by the strong correlation between MAPE and CV, as shown in Fig. 9b.
Overall, these results demonstrate the potential of machine learning strategies to enable the simultaneous calculation of the isotopic abundances of the two elements comprising the molecule using high-resolution CS GFMAS. This feature is reported for the first time with this technique and is uncommon even when using other techniques more dedicated to isotopic analysis, although it has been reported before for LAMIS (e.g., Mao et al. reported the simultaneous isotopic measurements of hydrogen and oxygen with LAMIS).29 Nevertheless, the methodology shows some limitations, particularly when low-abundance isotopes are present, such that the accuracy of predictions for ratios above 8 or below 0.12 (for both elements) is significantly reduced. In these situations, it is particularly important to have well-calibrated uncertainty estimates, which allow us to identify when we can trust model predictions and when the margin of error is likely to be very high.
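To relate the ratio limits quoted above to isotopic abundances, the short sketch below converts an isotope ratio into the abundance of the minor isotope. It assumes a two-isotope system in which only the two monitored isotopes contribute, and the function name is hypothetical.

```python
def minor_abundance_from_ratio(ratio):
    """Fractional abundance of the less abundant isotope for a given ratio.

    Assumes only two isotopes contribute, so that the abundances are
    ratio / (1 + ratio) and 1 / (1 + ratio).
    """
    if ratio <= 0:
        raise ValueError("the isotope ratio must be positive")
    fraction = ratio / (1.0 + ratio)
    return min(fraction, 1.0 - fraction)


upper_limit = minor_abundance_from_ratio(8)     # 1/9, about 0.111
lower_limit = minor_abundance_from_ratio(0.12)  # 0.12/1.12, about 0.107
```

Under this assumption, both ratio limits correspond to a minor-isotope abundance of roughly 11%, close to the 10% level below which predictive capability was observed to decline.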
On the other hand, it is also shown that much more complex situations (CaCl: various peaks that appear at different wavelengths as a result of the varying isotopic composition of both Ca and Cl) can also be tackled by making use of machine learning, enabling the dual isotopic analysis of both elements giving rise to the target molecule, albeit with limitations in accuracy when one of the isotopes shows a low abundance. It is also worth noting that computing the prediction uncertainty can help flag potentially inaccurate results, thereby increasing the model's robustness.
It is also necessary to state that, while the previous literature demonstrates that isotope analysis via HR CS GFMAS is applicable to a variety of samples (biological, environmental, polymeric), this aspect was not investigated in the current work and, thus, should be further tested.
Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d6ja00062b.
This journal is © The Royal Society of Chemistry 2026