Open Access Article
Homa Saeidfirozeh*a, Ashwin Kumar Myakalwarb, Pavlína Šeborováa, Ján Žabkaa, Bernd Abelac, Petr Kubelíka and Martin Ferusa
aJ. Heyrovský Institute of Physical Chemistry, Czech Academy of Sciences, Dolejškova 3, CZ 18223 Prague 8, Czech Republic. E-mail: homa.saeidfirouzeh@jh-inst.cas.cz
bDepartment of Physics, Faculty of Science and Technology (IcfaiTech), ICFAI Foundation Higher Education, Dontanpally, Hyderabad 501203, India
cFaculty of Chemistry and Mineralogy, Institute of Chemical Technology, Leipzig University, Linnéstraße 3, 04103 Leipzig, Germany
First published on 26th November 2025
The growing interest in sustainable space exploration has brought in situ resource utilization (ISRU) to the forefront of planetary science. This study presents an integrated approach to autonomous mineral identification for space mining by combining Laser-Induced Breakdown Spectroscopy (LIBS) with supervised machine learning (ML). A dataset of over 400 high-resolution LIBS spectra representing 25 mineral classes was collected under simulated low-pressure conditions to replicate extraterrestrial environments. The raw spectra were preprocessed using wavelet-based denoising to reduce random noise, baseline correction to remove the background continuum, and spectral normalization to account for intensity variations. To simplify the data and enhance classification performance, three feature selection methods were applied: Principal Component Analysis (PCA), which identifies directions of maximum variance to reduce data dimensionality; variance thresholding, which removes spectral features with negligible variability across samples; and random forest-based feature selection (RF-FS), which ranks wavelengths by their importance for classification. Several classification algorithms were evaluated, with test accuracies reaching up to 89.3%. The best results were achieved using random forest and logistic regression models trained on features selected by RF-FS, showing strong generalization to previously unseen samples. This work demonstrates the potential of LIBS-ML integration for fast, robust, and accurate mineral classification, including reliable identification of dominant phases in mineral mixtures in planetary environments. The approach also provides interpretability and classifier confidence estimation, supporting adaptive autonomous mineral identification for future robotic exploration missions.
By the late 19th century, the idea of utilizing resources from beyond Earth began to appear in speculative literature. One notable early example is Garrett P. Serviss's Edison's Conquest of Mars (1898), which describes the extraction of gold from asteroids, foreshadowing modern concepts of asteroid mining. In the 20th century, forward thinkers began imagining how we might utilize resources from beyond our planet. Konstantin Tsiolkovsky, a pioneering rocket scientist, laid the groundwork for space travel and dreamed of humans expanding into space, supported by materials found on other worlds.3 Later, in the mid-1900s, innovators like Arthur C. Clarke played a key role in popularizing these concepts, introducing them to the public through science fiction and futuristic ideas, inspiring many to imagine mining the Moon and asteroids,4 and laying the foundation for today's serious discussions about space resource utilization (SRU).5 These concepts evolved into institutional strategies emphasizing the importance of in situ resource utilization (ISRU), reducing reliance on Earth-supplied materials, and enabling affordable long-term space missions.6
Defining “space resources” has become increasingly critical: a material qualifies if it is present in a useful concentration, extractable with foreseeable technology, and serves practical space operations or markets. Recent efforts are underway to adapt terrestrial mineral classification standards, such as the Lunar Ore Reserves Standard (LORS-101),7 explicitly designed to categorize extraterrestrial deposits by feasibility and utility. Over recent decades, space resource mapping through remote sensing and sample analysis has progressed significantly.5 Agencies have identified promising concentrations of elements such as Fe, Ti, and Si, and extensive deposits of water ice in the Moon's polar regions, considered to be important for future propellant production.8–10 Despite promising orbital and remote sensing data, the actual composition, spatial distribution, and accessibility of these extraterrestrial resources remain uncertain. Reliable in situ measurements are therefore crucial for validating resource models and developing effective extraction strategies.11 Laser-Induced Breakdown Spectroscopy (LIBS) has emerged as a powerful tool for real-time geochemical analysis in space exploration.12 LIBS enables direct analysis of unpreprocessed, unpolished surfaces with almost any geometry, making it highly suitable for elemental characterization in extraterrestrial environments. While not directly involved in material extraction, LIBS plays a critical role in resource prospecting and compositional mapping, foundational steps toward the realization of extraterrestrial mining. Unlike traditional methods, LIBS operates effectively in low gravity and vacuum, providing real-time elemental composition analysis without the need for extensive sample preparation or complex instrumentation, making it a suitable analytical technique for different planetary missions.13–15 Despite these advantages, interpreting LIBS spectra under field conditions is challenging due to spectral complexity. Here, recent advances in Machine Learning (ML) offer a transformative approach. ML algorithms trained on known spectral signatures enable efficient classification of minerals from noisy or novel spectra, significantly improving speed and accuracy in situ. On Earth, LIBS has proven its versatility in analyzing diverse mineral and ore samples, including pyrite (FeS2), hematite (Fe2O3), molybdenite (MoS2), and chalcopyrite (CuFeS2), many containing economically valuable metals like copper, iron, zinc, and tungsten. This study features a mineral campaign including copper ores (azurite and malachite), iron ores (hematite and magnetite), and rarer materials like bauxite (aluminum ore) and wolframite (tungsten ore), simulating the diversity expected in extraterrestrial mining environments.
However, LIBS alone cannot efficiently handle the vast and complex spectral datasets generated during extraterrestrial mining. This challenge aligns with the broader scientific priorities highlighted by the Mars Sample Return initiative, which emphasizes the necessity for precise, rapid, and robust analytical techniques to fully exploit returned planetary samples.6,16 In this study, we focus on the ultraviolet (UV) spectral region at pressures of 10 and 10−2 mbar, closely simulating the low-pressure conditions encountered in space. ML provides a transformative solution, enabling the accurate classification and analysis of diverse mineral samples in real time. Unlike deep learning methods that require large datasets, ML algorithms such as Random Forest (RF), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Logistic Regression (LR) excel in the small-data environments typical of space missions. These approaches effectively manage sparse, imbalanced data and noisy spectra, making them invaluable for on-the-fly decision-making in space resource extraction. Moreover, ML approaches are not limited to mineral classification; recent studies have successfully applied neural networks to predict plasma parameters directly from LIBS spectra, such as plasma temperature estimation using synthetic ChemCam-based simulations,17 and rapid detection of trace elements like xenon in complex plasma mixtures relevant for geochemical and planetary analyses.18
This study presents a robust LIBS-ML integration methodology that bridges Earth-based experiments with extraterrestrial resource exploration, addressing challenges such as matrix effects, spectral noise, and small-dataset variability. A key distinguishing feature of this work is its focus on mineral identification using LIBS spectra collected under planetary-like low-pressure conditions, together with a careful evaluation of performance on complex mineral mixtures, which closely reflects the real-world scenarios of future space resource utilization.
The paper is organized as follows. The Materials and Methods section details the experimental setup and describes the data processing and ML pipeline; the Results and Discussion section presents and interprets the classification results; the subsequent section discusses strategies for handling novel data and improving autonomous decision-making in remote applications; and the final section summarizes the key findings and future outlook.
| Material name | Metal/Material mined | Chemical formula | Mineral classification | Geological occurrence | Spectroscopic signatures | Ref. |
|---|---|---|---|---|---|---|
| Ores | ||||||
| Azurite | Copper | Cu3(CO3)2(OH)2 | Carbonate | Oxidized zones of copper deposits | Distinct Cu peaks at 324.8 nm and 327.4 nm | — |
| Bauxite | Aluminum | Al(OH)3 | Hydroxide | Lateritic soils in tropical regions | Broad Al peaks around 394.4 nm and 396.1 nm | — |
| Bismuth | Bismuth | Bi | Native element | Hydrothermal veins | Bi spectral line at 306.7 nm | — |
| Cassiterite | Tin | SnO2 | Oxide | Igneous and metamorphic rocks | Sn lines at 189.9 nm and 317.5 nm | — |
| Chalcopyrite | Copper | CuFeS2 | Sulfide | Hydrothermal veins, igneous rocks | Cu peaks at 324.8 nm and 327.4 nm | 19 |
| Chalcocite | Copper | Cu2S | Sulfide | Supergene enrichment zones | Cu peak at 324.8 nm | — |
| Chromite | Chromium | FeCr2O4 | Oxide | Ultramafic rocks | Cr peaks around 425.4 nm | 20 |
| Kyanite | Aluminum | Al2SiO5 | Silicate | Metamorphic rocks | Al peak at 396.1 nm | — |
| Galena | Lead | PbS | Sulfide | Hydrothermal veins | Pb peak at 220.3 nm | — |
| Goethite | Iron | FeO(OH) | Hydroxide | Secondary mineral in iron deposits | Fe peaks at 259.9 nm and 271.9 nm | 21 |
| Grossular | Aluminum | Ca3Al2(SiO4)3 | Silicate | Metamorphic rocks | Al peaks at 394.4 nm and 396.1 nm | — |
| Hematite | Iron | Fe2O3 | Oxide | Sedimentary and metamorphic rocks | Fe peaks at 259.9 nm and 372.0 nm | 21 |
| Magnetite | Iron | Fe3O4 | Oxide | Igneous and metamorphic rocks | Fe peak at 516.7 nm | 22 |
| Malachite | Copper | Cu2CO3(OH)2 | Carbonate | Oxidized zones of copper deposits | Cu peaks at 324.8 nm and 327.4 nm | 23 |
| Molybdenite | Molybdenum | MoS2 | Sulfide | Hydrothermal veins | Mo peaks at 390.3 nm and 386.4 nm | — |
| Pyrite | Sulfur | FeS2 | Sulfide | Sedimentary and hydrothermal deposits | Fe peaks at 259.9 nm and 371.9 nm | 24 |
| Sphalerite | Zinc | ZnS | Sulfide | Hydrothermal veins | Zn peak at 213.8 nm | — |
| Stibnite | Antimony | Sb2S3 | Sulfide | Hydrothermal veins | Strong Sb lines in the UV range | — |
| Wolframite | Tungsten | (Fe,Mn)WO4 | Tungstate | Hydrothermal veins | W peaks at 207.9 nm and 255.2 nm | — |
| Zircon | Zirconium | ZrSiO4 | Silicate | Igneous and metamorphic rocks | Zr peaks at 343.8 nm and 349.6 nm | 25 |
| Rock-forming minerals | ||||||
| Olivine | Magnesium, iron | (Mg,Fe)2SiO4 | Silicate | Ultramafic rocks (peridotites, basalts) | Fe peaks at 516.7 nm, Mg lines in UV. | 26 |
| Gypsum | Calcium | CaSO4·2H2O | Sulfate | Sedimentary deposits, evaporites | Ca peaks at 393.3 nm, 396.8 nm | 27 |
| Feldspar | Aluminum, K, Na | (K,Na,Ca)AlSi3O8 | Silicate | Igneous/metamorphic rocks | Al peaks at 394.4 nm, 396.1 nm | 21 |
| Serpentine | Magnesium | (Mg,Fe)3Si2O5(OH)4 | Silicate | Metamorphic rocks (alteration of peridotite) | Mg peaks in UV, Fe peaks at 259.9 nm | 21 |
| Dolomite | Magnesium, calcium | CaMg(CO3)2 | Carbonate | Sedimentary rocks, hydrothermal veins | Ca peaks at 393.3 nm, Mg peaks in UV. | 28 |
Fig. 1 shows the surface of the bauxite sample as an example. The experiment was performed on this surface to enhance laser absorption, improve plasma formation, and reduce reflectivity and matrix effects.30 The dark lines visible in the marked area indicate the effect of laser ablation, where material removal and surface modification have occurred due to the interaction of the laser with the sample. To capture a representative analysis, multiple laser spots were applied across a broader area, capturing different microstructures within the sample matrix.
Laser ablation was performed with an Nd:YAG laser (neodymium-doped yttrium-aluminum garnet), which delivers pulses with a wavelength of 1064 nm, a duration of 6 ns, and an energy of 450 mJ at a repetition rate of 10 Hz. A CaF2 lens with a focal length of 10 cm was used to focus the laser beam onto the samples, which were attached to a moving stage inside a vacuum chamber. The measurements were carried out under two pressure conditions: 10 mbar and 10−2 mbar. Emission spectra of the laser-ablation plasma were recorded using a high-resolution Butterfly Echelle spectrograph equipped with an Andor ICCD camera. The spectrograph operates in the UV region (192–433 nm), offering spectral resolutions of 13–31 pm (resolving power of 14 000). It was set to trigger 50 ns after the laser pulse and to collect the signal for 1 µs; the final spectra were obtained by accumulating 20 laser shots. Before data collection, the Butterfly spectrograph was wavelength-calibrated using a Hg lamp.
As the first step, careful filtering criteria were applied to improve LIBS data quality. These criteria focused on analyzing the standard deviation and noise distribution to exclude spectra with excessive noise or poorly defined peaks, ensuring that only high-quality spectra entered the dataset. Of the 437 spectra across 25 ore and mineral families, 417 fulfilled these criteria, a retention rate of 95.4%, confirming that most of the data were preserved for analysis; the remaining 4.5% of spectra were discarded due to low quality, demonstrating the effectiveness of the filtering process. Additional details on the preprocessing steps, including examples of excluded spectra with high noise or anomalously intense peaks, are provided in the SI.
Then, we analyzed the noise distribution across all spectra by calculating the standard deviation of the first 50 intensity points of each spectrum. As shown in Fig. 2, the normalized histogram of noise levels follows a Gaussian distribution, with most spectra exhibiting noise levels between 1.1 and 2.3. The histogram was normalized to a probability density, ensuring comparability across datasets and enabling the overlay of a fitted normal distribution. The mean noise level was µ = 1.64, with a standard deviation of σ = 0.17, indicating that the noise distribution is tightly clustered around the mean. Spectra with noise levels below 1.06 or above 2.32 were rare, demonstrating the uniform quality of the dataset. A threshold of 1.49, derived as the 20th percentile of the noise levels, was used to identify low-noise spectra for subsequent analysis, ensuring robust preprocessing and consistent data quality.
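As an illustration, this filtering step can be sketched in a few lines of Python; the array shapes and variable names below are stand-ins, since the original analysis code is not reproduced in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data: the real matrix holds 437 spectra of ~47 000 points each;
# a smaller array keeps this sketch fast.
spectra = rng.normal(loc=0.0, scale=1.6, size=(437, 2000))

def noise_level(spectrum: np.ndarray, n_points: int = 50) -> float:
    """Noise estimate: standard deviation of the first n_points of the spectrum."""
    return float(np.std(spectrum[:n_points]))

noise = np.array([noise_level(s) for s in spectra])
threshold = np.percentile(noise, 20)   # 20th percentile (~1.49 for the real data)
low_noise = spectra[noise <= threshold]
print(f"mean = {noise.mean():.2f}, sigma = {noise.std():.2f}, threshold = {threshold:.2f}")
```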
Fig. 2 Histogram of the noise-level distribution across spectra; the grey dashed line represents the fitted normal distribution.
As the next step after noise analysis, we implemented wavelet denoising to improve spectral data quality by reducing noise while maintaining important signal features.32 This process involves three key steps. First, the signal x(t) is decomposed into wavelet coefficients ci,j and wavelet basis functions ψi,j(t) using the Daubechies-4 (db4) wavelet:
| x(t) = Σi,j ci,jψi,j(t) | (1) |
Next, a soft thresholding function is applied to the wavelet coefficients to suppress noise while maintaining the significant components of the signal. The soft thresholding function used is defined as:33
| ĉi,j = sgn(ci,j) max(|ci,j| − λ, 0) | (2) |
where λ is the threshold level.
Finally, the denoised signal x̂(t) is reconstructed by applying the inverse wavelet transform to the thresholded coefficients ĉi,j:
| x̂(t) = Σi,j ĉi,jψi,j(t) | (3) |
This denoising approach was implemented using the PyWavelets library,34 with soft thresholding explicitly applied through the pywt.threshold function. The Daubechies-4 wavelet was selected for its optimal balance between resolution and smoothness, making it well-suited for LIBS spectral data.32 This process enhanced spectral quality by approximately 1.5 times, underscoring its effectiveness in improving data reliability and enabling more accurate downstream analyses (see the example in the SI).
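A minimal sketch of this procedure with PyWavelets is shown below; the decomposition level and the universal threshold are common defaults assumed here, not values reported in the study:

```python
import numpy as np
import pywt

def wavelet_denoise(signal: np.ndarray, wavelet: str = "db4", level: int = 4) -> np.ndarray:
    """Soft-threshold wavelet denoising following eqns (1)-(3)."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Robust noise-scale estimate from the finest detail level (median absolute deviation).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    lam = sigma * np.sqrt(2.0 * np.log(len(signal)))          # universal threshold
    # Keep the approximation, soft-threshold every detail level via pywt.threshold.
    thresholded = [coeffs[0]] + [pywt.threshold(c, lam, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(thresholded, wavelet)[: len(signal)]  # inverse transform
```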
For baseline correction, the process uses wavelet decomposition to separate the spectrum y(λ) into a low-frequency baseline Aj(λ) and high-frequency details Di(λ), isolating the baseline by setting Di(λ) = 0. The baseline is adjusted to ensure Aj(λ) ≤ y(λ), preventing overcorrection. The corrected spectrum is then calculated as ycorrected(λ) = y(λ) − Aj(λ). This approach effectively removes the baseline while preserving the spectral peaks, ensuring no negative values or artificial elevation in the corrected spectrum.
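The same library supports the baseline step. The sketch below zeroes the detail coefficients to recover a slowly varying baseline and clips it so that Aj(λ) ≤ y(λ); the decomposition level is an assumed tuning parameter, chosen so the baseline varies more slowly than the emission lines:

```python
import numpy as np
import pywt

def wavelet_baseline_correct(y: np.ndarray, wavelet: str = "db4", level: int = 8) -> np.ndarray:
    """Subtract a low-frequency baseline A_j(lambda) estimated by zeroing the
    detail coefficients D_i(lambda) and reconstructing the approximation."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    coeffs = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]   # set D_i = 0
    baseline = pywt.waverec(coeffs, wavelet)[: len(y)]
    baseline = np.minimum(baseline, y)      # enforce A_j(lambda) <= y(lambda)
    return y - baseline                     # y_corrected, non-negative by construction
```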
Then, each spectrum xi was normalized using:
| zij = (xij − μj)/σj | (4) |
where xij is the intensity of spectrum i at feature (wavelength) j, μj is the mean, and σj is the standard deviation of feature j across all samples.
It is worth noting that across all tested minerals, the spectra collected at different pressures showed strong correlations, with only minor intensity variations and no significant peak shifts or formation of new lines. Therefore, standard normalization and preprocessing are sufficient to combine or compare data at both pressures for mineral classification.
For a classification problem with K classes, stratified sampling aims to maintain the same class probability distribution p(y = k) in both the training and test sets. For hyperparameter optimization and model selection, we further applied k-fold cross-validation within each training set. In this procedure, the training data are partitioned into k equally sized folds; each fold in turn serves as a temporary validation set while the remaining k − 1 folds are used for training. This process is repeated k times, so that every sample is used for validation exactly once. The model's performance is then averaged across all k folds, yielding a robust and unbiased estimate of generalization accuracy. In this study, we used k = 5.
Fig. 3 shows the results of repeated random stratified splits: each row is an iteration, each column a sample, and blue cells mark test set assignments. This demonstrates that every sample is included in the test set throughout the cross-validation process.
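In scikit-learn terms, this splitting and cross-validation scheme can be sketched as follows; the synthetic data generated here merely stand in for the real LIBS matrix:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for the 417 x 47 693 LIBS matrix (here reduced to 200 features).
X, y = make_classification(n_samples=417, n_features=200, n_informative=50,
                           n_classes=25, n_clusters_per_class=1, random_state=0)

# Stratified split: the class distribution p(y = k) is preserved in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 5-fold cross-validation within the training set, as used for model selection.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X_train, y_train, cv=cv)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```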
To address the issue of class imbalance in our dataset, we applied the Synthetic Minority Oversampling Technique (SMOTE),36 which generates synthetic minority class samples by interpolating between existing minority instances:
| xnew = xi + δ(xnn − xi) | (5) |
where xnn is one of the k nearest minority-class neighbors of xi and δ is a random scalar drawn uniformly from [0, 1].
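With the imbalanced-learn package, the oversampling step reduces to a single call. This sketch reuses X_train and y_train from the splitting example above; the k_neighbors value is the library default, not a setting reported in the paper:

```python
from collections import Counter
from imblearn.over_sampling import SMOTE

# Oversample only the training split, so no synthetic spectra leak into the test set.
smote = SMOTE(k_neighbors=5, random_state=0)         # k_neighbors: library default
X_train_bal, y_train_bal = smote.fit_resample(X_train, y_train)
print(Counter(y_train), "->", Counter(y_train_bal))  # all classes equally populated
```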
To improve robustness, we performed repeated random splits combined with k-fold cross-validation, which provides a more reliable estimate of model performance. In k-fold cross-validation, the data are partitioned into k equal-sized subsets D1, …, Dk. For each fold j, the model is trained on D \ Dj and tested on Dj, cycling through all k folds. The average performance is calculated as:
| CV score = (1/k) Σj=1…k Aj | (6) |
where Aj denotes the performance (e.g., accuracy) obtained on fold j.
For PCA, the principal directions wk are the eigenvectors of the covariance matrix C of the standardized spectra, and the associated eigenvalues λk quantify the variance explained by each component:
| Cwk = λkwk | (7) |
Each spectrum is then projected onto these directions to obtain its principal-component scores:
| zk = Xwk | (8) |
Fig. 4A shows the distribution of explained variance across the principal components extracted from the LIBS data. The bars on the left axis indicate the individual contribution rate, that is, the fraction of total variance explained by each principal component, corresponding to the eigenvalues introduced above. The red curve displays the accumulated contribution rate as more components are included. As observed, the cumulative explained variance increases steeply, with the curve approaching 1.0 (100%) after relatively few components. This demonstrates that most of the spectral information in the LIBS data can be efficiently captured with a limited number of principal components. Rather than retaining all components, we focus on those that collectively account for at least 95% of the total variance, as this threshold effectively preserves the essential structure and chemical information present in LIBS spectra.31 To highlight the class differentiation achieved by these principal components, Fig. 4B projects LIBS spectra from five representative mineral classes onto the first two components (PC1 and PC2). The 95% confidence ellipses highlight distinct clustering, indicating that the variance captured in Fig. 4A translates into meaningful spectral separation. While the ellipses represent the main class clusters, the presence of points outside these boundaries is consistent with expected measurement variability and spectral complexity. Together, these panels confirm that PCA not only efficiently reduces dimensionality but also preserves class-specific information critical for accurate classification.
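Retaining the components that explain 95% of the variance is a one-liner in scikit-learn. This sketch, continuing from the examples above, standardizes the training matrix first, as required by eqn (4):

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize (eqn (4)) on the training data only, then keep the smallest number
# of components that explains at least 95% of the total variance.
scaler = StandardScaler().fit(X_train_bal)
pca = PCA(n_components=0.95, svd_solver="full")    # float target = variance fraction
Z_train = pca.fit_transform(scaler.transform(X_train_bal))   # scores z_k = X w_k
Z_test = pca.transform(scaler.transform(X_test))
print(pca.n_components_, "components retain",
      f"{pca.explained_variance_ratio_.sum():.1%} of the variance")
```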
Variance thresholding (VT) retains only those spectral features whose variance across samples exceeds a fixed threshold τ:
| Var(xj) = (1/n) Σi=1…n (xij − x̄j)² ≥ τ | (9) |
Random forest-based feature selection (RF-FS) instead ranks each wavelength j by its mean decrease in Gini impurity, accumulated over all nodes t that split on feature j across the M trees of the forest:
| I(xj) = (1/M) Σm=1…M Σt∈Tm: v(t)=j Δi(t) | (10) |
where Δi(t) is the impurity decrease at node t and v(t) denotes the feature used to split node t.
A significant advantage of our RF-FS approach is the interpretability it offers, as it identifies the most important spectral features used for classification. Fig. 5 provides a detailed validation of the RF-FS method used to identify the most informative wavelengths in LIBS spectral data. Fig. 5A illustrates the excellent match between the wavelengths selected by RF-FS and the known atomic emission lines from the NIST database,38 demonstrating that the model prioritizes physically meaningful spectral features.
Fig. 5B shows the selected wavelengths overlaid on the average LIBS spectrum from all samples and mineral families. This illustrates how the RF-FS method effectively focuses on the prominent spectral peaks, which help distinguish between different minerals. Fig. 5C presents a zoomed spectrum of the averaged bauxite sample; as indicated in Table 1, bauxite is rich in Al, and the wavelength selected by the model at 394.40 nm corresponds to one of the most prominent Al emission lines. Meanwhile, the second line at 396.15 nm is also visible; both lines are key spectral markers widely used for aluminum detection in spectroscopic analysis.39 Moreover, panel D zooms in on gypsum, highlighting the important 396.83 nm Ca emission line identified by the model, which has been shown to reliably correlate with calcium concentration variations under different experimental conditions in remote LIBS analysis.40 Panel E showcases magnetite with the key Fe line at 358.11 nm.38 Moreover, as our spectral window is limited to the UV region, some of the most intense sodium (Na) and potassium (K) emission lines, such as the prominent Na doublet at 589 nm, fall outside the measured range. Nevertheless, the RF-FS method consistently selected alternative Na emission features present within the UV window, as detailed in the SI. The RF-FS method, applied across all tested classifiers, identifies the most important spectral features aligned with known elemental lines, thereby improving classification accuracy while making the models more interpretable.
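A compact way to realize RF-FS, sketched under the same assumptions as the earlier examples; the number of retained wavelengths (200) is illustrative, not the paper's setting:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Fit a forest on the (balanced) training data and rank wavelengths by their
# mean decrease in Gini impurity, i.e. eqn (10).
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train_bal, y_train_bal)
top = np.argsort(rf.feature_importances_)[::-1][:200]   # keep the 200 best wavelengths
X_train_fs, X_test_fs = X_train_bal[:, top], X_test[:, top]
# In the real pipeline, wavelengths[top] would be matched against NIST emission lines.
```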
The four classifiers can be summarized as follows. RF predicts by majority vote over its m decision trees T1, …, Tm:
| ŷ = mode({T1(x), T2(x),…,Tm(x)}) | (11) |
SVM assigns a class according to the sign of a maximum-margin decision function, shown here for the binary case (multi-class problems are handled by one-vs-rest decomposition):
| f(x) = sgn(wᵀx + b) | (12) |
KNN predicts the majority class among the labels y(1), …, y(k) of the k nearest training samples:
| ŷ = mode({y(1), y(2),…,y(k)}) | (13) |
and LR models the class posterior with the softmax function over linear scores:
| P(y = k|x) = exp(wkᵀx)/Σj exp(wjᵀx) | (14) |
These methods were chosen for their effectiveness and diversity in handling classification tasks. A detailed analysis and comparison of classifier performance are presented in the following section.
Model performance was quantified with several complementary metrics. Accuracy is the overall fraction of correct predictions:
| Accuracy = (1/N) Σi=1…N 1(ŷi = yi) | (15) |
where 1(·) is the indicator function. Balanced accuracy averages the per-class recall over all K classes, preventing the largest classes from dominating the score:
| Balanced accuracy = (1/K) Σk=1…K Recallk | (16) |
See the SI for details.
Precision, recall, and the F1 score are defined from the true positive (TP), false positive (FP), and false negative (FN) counts as:
| Precision = TP/(TP + FP) | (17) |
| Recall = TP/(TP + FN) | (18) |
| F1 = 2 × (Precision × Recall)/(Precision + Recall) | (19) |
These metrics were computed for each classifier and feature selection combination to provide a robust and multi-faceted evaluation of model performance. The results of these analyses, including detailed grid search optimization, classifier comparisons, and performance summaries, are presented in the following section.
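Computed with scikit-learn, the metrics of eqns (15)–(19) look as follows; the macro averaging and the illustrative RF classifier are assumptions of this sketch, which continues the earlier examples:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             precision_recall_fscore_support)

clf = RandomForestClassifier(random_state=0).fit(X_train_fs, y_train_bal)
y_pred = clf.predict(X_test_fs)
acc = accuracy_score(y_test, y_pred)                               # eqn (15)
bal = balanced_accuracy_score(y_test, y_pred)                      # eqn (16)
prec, rec, f1, _ = precision_recall_fscore_support(                # eqns (17)-(19)
    y_test, y_pred, average="macro", zero_division=0)
print(f"acc={acc:.3f} bal={bal:.3f} P={prec:.3f} R={rec:.3f} F1={f1:.3f}")
```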
Each spectrum comprises roughly 47 000 variables, corresponding to data points across the wavelength range of 187.19 to 425.85 nm. The dataset consists of 417 spectra across 25 different classes, forming a large data matrix of 417 rows (one for each spectrum) and 47 693 columns (each representing the intensity at a specific wavelength). Handling such high-dimensional data is computationally challenging, which is why efficient dimensionality reduction, together with careful preprocessing and noise reduction, is crucial for removing redundant information and enhancing the effectiveness of the analysis. In this matrix, each row captures the intensity profile of a single spectrum, while each column corresponds to the intensity at a particular wavelength. This organized format was then used as input for the ML classifiers, which learned to recognize patterns and differences between the classes. By using intensity values directly as features, the models could extract meaningful information from the LIBS spectral data, leading to reliable and accurate classification. As mentioned earlier, we employed SMOTE to address class imbalance.
Fig. 6 shows how SMOTE affects the RF model when using PCA for feature selection. Fig. 6A and B show the importance of principal components before and after applying SMOTE, highlighting how balancing the dataset shifts the relevance of different components. This indicates that SMOTE changes the underlying data structure, influencing which features are most informative for classification. Fig. 6C compares the distribution of the first principal component before and after applying SMOTE. Each dot in the violin plot represents the PC1 value for a single sample; the width of the violin at any point reflects how many samples have similar PC1 values (i.e., wider areas indicate higher sample density). In this context, skewness refers to the asymmetry of the distribution, with longer tails indicating more samples with extreme values. Before SMOTE, the distribution is wider, with noticeable extreme values (long tails) on both sides, indicating that the data are more spread out and skewed. This skewness reflects the imbalance and variability in the original dataset. After SMOTE, the distribution becomes narrower and more symmetric, with fewer extreme values and reduced skewness. This change suggests that SMOTE has effectively balanced the dataset by mitigating extreme variability and bias toward outlying values, leading to a more representative and stable feature distribution. Although the Kolmogorov–Smirnov test47 showed no statistically significant difference (p = 0.150), the observational evidence supports the positive impact of SMOTE on data balance. Finally, Fig. 6D presents a comparison of performance metrics, demonstrating clear improvements after applying SMOTE.

The next step was to fine-tune the hyperparameters of each classifier to achieve the best possible predictive performance. This was done using a comprehensive grid search, a well-established method that systematically tests different model settings. The results are shown in Fig. 7, which displays grid search heatmaps for all four classifiers paired with PCA-based feature selection. Similar grid searches were performed for RF-based and VT feature selection; however, only the PCA results are presented here for clarity.
Fig. 7A shows how RF accuracy varies depending on the number of trees used in the ensemble (n_estimators) and the fraction of features considered at each decision point (max_features). The model performs best, often with test accuracy above 0.95, when both the number of trees and the feature proportion are set high, suggesting that a larger, more diverse ensemble leads to stronger and more reliable classification. This finding is in agreement with Sheng et al., who reported near-perfect classification accuracy for iron ore samples by optimizing these parameters in their RF models.48 Moreover, this improvement can be explained by the theoretical generalization error bound of RF introduced by Breiman:42
| PE* ≤ ρ̄(1 − s²)/s² | (20) |
where PE* is the generalization error of the forest, ρ̄ is the mean correlation between individual trees, and s is the strength (average margin) of the trees. Larger, more diverse ensembles lower ρ̄ without sacrificing s, tightening this bound.
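A grid search of this kind can be reproduced with scikit-learn's GridSearchCV; the parameter ranges below are illustrative choices, while the 5-fold cross-validation matches the protocol described earlier:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [100, 300, 500],        # ensemble size
              "max_features": [0.1, 0.3, 0.5, 1.0]}   # fraction of features per split
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X_train_fs, y_train_bal)
print(search.best_params_, f"best CV accuracy: {search.best_score_:.3f}")
```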
Fig. 7B shows the classification accuracy of SVM models across different kernels and regularization parameters (C). The linear kernel consistently achieves the highest accuracy, around 0.96, across all tested C values, indicating that the LIBS spectral data are largely linearly separable in the original feature space. This is expected, since LIBS spectra often contain prominent, distinctive peaks corresponding to elemental signatures, which can be effectively separated by linear decision boundaries. The RBF kernel yields moderately high accuracy (up to 0.91) but exhibits more variability depending on parameter settings, suggesting a limited non-linear structure. The sigmoid kernel exhibits intermediate performance, while the polynomial kernel performs the worst, especially at lower C values, likely due to overfitting or a mismatch between model complexity and the data characteristics. Additional hyperparameter tuning results across different kernels and parameter grids are available in the SI.
Fig. 7C highlights the dependence of KNN classification accuracy on the number of neighbors (k) and the number of principal components retained after PCA. The best accuracy (≈0.94) is achieved with three neighbors and 100 principal components, suggesting that an optimal balance between dimensionality reduction and neighborhood size improves performance. The KNN classifier predicts the class of a sample based on majority voting among its k nearest neighbors in the PCA-transformed feature space:
| ŷ = argmaxc∈C Σi∈Nk(x) 1(yi = c) | (21) |
where C is the set of all possible classes and Nk(x) is the set of indices of the k nearest neighbors of x in the PCA-transformed space. The indicator function 1(yi = c) equals 1 if the i-th neighbor's class label yi matches class c, and 0 otherwise.
Finally, Fig. 7D reports LR accuracy as a function of penalty type (L1 or L2) and regularization strength (C). Both penalty types achieve high accuracies, exceeding 0.96 for moderate values of C (0.1 to 1.0), demonstrating the effectiveness of these regularization strategies in mitigating overfitting in high-dimensional spectral data. Following hyperparameter tuning, the models were evaluated using various feature selection methods to assess their classification performance systematically.
Table 2 summarizes classification performance metrics for all four models and three feature selection methods, totaling twelve combinations. For each, results are reported separately for the test and training sets, along with the mean and standard deviation across the cross-validation folds. Among all methods, RF and LR combined with RF-based feature selection yielded the highest test accuracy, precision, recall, and F1 scores, all of which approached or exceeded 0.88. These results indicate that these models effectively classify mineral spectra, balancing false positives and false negatives well, and demonstrate robust generalization, as seen in the perfect training accuracy (reflecting model capacity) and slightly lower but reliable test performance. This pattern of near-perfect training accuracy coupled with slightly lower test accuracy, which reflects strong model capacity and reliable generalization, has also been observed in a spectroscopic classification study.49
| Model | Feature selection method | Parameter | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| a Note: PCA: Principal Component Analysis; RF-FS: RF Feature Selection; VT: Variance Threshold. | ||||||
| RF | PCA | Testing | 0.842 ± 0.041 | 0.891 ± 0.039 | 0.831 ± 0.051 | 0.834 ± 0.044 |
| | | Training | 1.00 | 1.00 | 1.00 | 1.00 |
| | RF-FS | Testing | 0.886 ± 0.048 | 0.912 ± 0.039 | 0.867 ± 0.059 | 0.879 ± 0.050 |
| | | Training | 1.00 | 1.00 | 1.00 | 1.00 |
| | VT | Testing | 0.886 ± 0.031 | 0.890 ± 0.031 | 0.860 ± 0.041 | 0.878 ± 0.032 |
| | | Training | 1.00 | 1.00 | 1.00 | 1.00 |
| SVM | PCA | Testing | 0.854 ± 0.032 | 0.872 ± 0.025 | 0.837 ± 0.057 | 0.849 ± 0.036 |
| | | Training | 1.00 | 1.00 | 1.00 | 1.00 |
| | RF-FS | Testing | 0.893 ± 0.039 | 0.871 ± 0.023 | 0.861 ± 0.048 | 0.883 ± 0.046 |
| | | Training | 1.00 | 1.00 | 1.00 | 1.00 |
| | VT | Testing | 0.833 ± 0.019 | 0.814 ± 0.032 | 0.804 ± 0.041 | 0.819 ± 0.022 |
| | | Training | 1.00 | 1.00 | 1.00 | 1.00 |
| KNN | PCA | Testing | 0.724 ± 0.053 | 0.762 ± 0.044 | 0.731 ± 0.040 | 0.707 ± 0.059 |
| | | Training | 1.00 | 1.00 | 1.00 | 1.00 |
| | RF-FS | Testing | 0.862 ± 0.033 | 0.854 ± 0.042 | 0.832 ± 0.048 | 0.851 ± 0.035 |
| | | Training | 1.00 | 1.00 | 1.00 | 1.00 |
| | VT | Testing | 0.614 ± 0.060 | 0.720 ± 0.034 | 0.670 ± 0.052 | 0.607 ± 0.061 |
| | | Training | 1.00 | 1.00 | 1.00 | 1.00 |
| LR | PCA | Testing | 0.850 ± 0.030 | 0.878 ± 0.016 | 0.846 ± 0.048 | 0.841 ± 0.032 |
| | | Training | 1.00 | 1.00 | 1.00 | 1.00 |
| | RF-FS | Testing | 0.891 ± 0.046 | 0.910 ± 0.033 | 0.874 ± 0.060 | 0.884 ± 0.051 |
| | | Training | 1.00 | 1.00 | 1.00 | 1.00 |
| | VT | Testing | 0.857 ± 0.025 | 0.841 ± 0.015 | 0.830 ± 0.036 | 0.847 ± 0.028 |
| | | Training | 1.00 | 1.00 | 1.00 | 1.00 |
SVM also performed well, especially with RF-FS (test accuracy 0.893 ± 0.039), although slightly below RF. SVM with PCA also maintained strong, balanced metrics, demonstrating the utility of PCA for dimensionality reduction. KNN showed lower test accuracy and F1 scores when using PCA or VT (0.724 ± 0.053 and 0.614 ± 0.060, respectively). Still, performance improved substantially with RF-FS (accuracy 0.862 ± 0.033), suggesting that KNN benefits significantly from supervised feature selection in this context. VT as a feature selector generally gave lower scores across all models compared with PCA or RF-FS. This illustrates the importance of methods that either leverage label information, such as RF-FS, or preserve the overall variance structure, like PCA, for this type of data. Precision, recall, and F1 generally follow the same pattern as accuracy: a model with higher accuracy is usually also better at minimizing both false positives and false negatives, which yields high F1 scores and shows that the models are not favoring any one class over the others.
The results in Table 2 are supported by Fig. 8, which compares balanced accuracy across models and feature selection methods. Balanced accuracy is especially informative for this multi-class, imbalanced LIBS dataset, as it reflects the average recall across all mineral classes and prevents performance from being dominated by the largest class. As shown in Fig. 8, RF and LR combined with RF-based feature selection consistently achieve the highest and most stable balanced accuracy, with median values at or above 0.90 and very low variability, aligning with their strong performance in accuracy, precision, recall, and F1 scores reported in Table 2. SVM, especially when combined with RF-FS or PCA, also performs well, although it typically yields results below those of RF and LR. By contrast, KNN shows lower and more variable balanced accuracy with unsupervised selection (VT), but improves substantially when paired with RF-FS, highlighting the importance of label-informed feature selection. Models using VT alone tend to underperform, indicating that relying solely on global variance is insufficient for identifying informative spectral features. These findings underscore the value of supervised feature selection, especially RF-FS, for maximizing the generalizability and robustness of classification models in challenging, imbalanced mineral datasets.

Fig. 9 displays the normalized confusion matrices for each classifier using RF-based feature selection, providing a granular view of class-level prediction performance. In these matrices, each row corresponds to the true mineral class and each column to the predicted class. The values along the main diagonal (from top left to bottom right) represent the proportion of samples correctly classified for each mineral class; higher diagonal values indicate stronger model performance for those classes. Off-diagonal values, by contrast, indicate misclassifications, where samples are incorrectly assigned to another class. A perfect classifier would produce a matrix with all values on the diagonal and zeros elsewhere, so off-diagonal entries reveal which minerals are most frequently confused. Fig. 9A confirms that RF achieves the most consistent and accurate mineral identification, with nearly all samples assigned to their correct class, reflected in minimal off-diagonal errors. Fig. 9B shows SVM, which retains a dominant diagonal but exhibits more class confusion than RF, especially for spectrally similar minerals, revealing specific class pairs that remain challenging to separate. Fig. 9C, for KNN, highlights more frequent misclassifications, particularly among classes with overlapping features, illustrating KNN's sensitivity to local variations and explaining its comparatively lower balanced accuracy. Fig. 9D, for LR, demonstrates performance close to that of RF, with most predictions along the diagonal and only occasional confusion between certain classes. Together, these confusion matrices not only validate the high overall accuracy of RF and LR with supervised feature selection but also pinpoint the specific mineral classes where misclassification persists, providing actionable insight for refining future models and experimental design.
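Row-normalized confusion matrices such as those in Fig. 9 can be generated as follows, continuing the earlier sketches:

```python
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# normalize="true" divides each row by the number of true samples in that class,
# so the diagonal entries are the per-class recall values shown in Fig. 9.
cm = confusion_matrix(y_test, y_pred, normalize="true")
ConfusionMatrixDisplay(cm, display_labels=clf.classes_).plot(cmap="Blues",
                                                             include_values=False)
```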
Fig. 10 shows the average Receiver Operating Characteristic (ROC) curves for each classifier and feature selection method, computed by averaging results over 10 random stratified splits. In these plots, the true positive rate (sensitivity) is plotted against the false positive rate (1 − specificity) for varying classification thresholds. The closer the curve lies to the upper-left corner, the stronger the overall performance.
Each row of panels corresponds to a different classifier: Fig. 10A presents RF, Fig. 10B SVM, Fig. 10C KNN, and Fig. 10D LR. For RF, the ROC curves are consistently closest to the ideal point, with high area under the curve (AUC) values across all feature selection methods, reaffirming its robust discrimination ability observed in previous metrics and confusion matrices. SVM shows strong performance, particularly with RF-based feature selection, though with slightly more variability than that of RF. KNN exhibits noticeably flatter ROC curves, indicating weaker class separation and lower overall sensitivity, which aligns with its lower test accuracy and increased off-diagonal confusion. LR performs comparably to RF, especially when paired with supervised feature selection, demonstrating a high AUC and reliable classification boundaries.
These results provide consistent evidence of classifier performance across multiple evaluation criteria. Accuracy and F1 score, previously defined, measure overall correctness and balance between precision and recall, respectively.
The confusion matrix visually complements these metrics, with a strong diagonal indicating high true positive rates and minimal misclassifications. To further generalize classifier evaluation, ROC curves plot the true positive rate (TPR) against the false positive rate (FPR) across thresholds, where:
| TPR = TP/(TP + FN),  FPR = FP/(FP + TN) | (22) |
The area under the ROC curve (AUC) summarizes performance independent of threshold choice. As seen in Table 2, RF and LR with RF-based feature selection achieve the highest test accuracy, F1 score, and recall, as reflected in the confusion matrices, where nearly all predictions are on the diagonal, and the ROC curves approach an AUC of 1.0, indicating excellent discrimination. In contrast, models with lower accuracy and F1 scores, such as KNN with unsupervised feature selection, exhibit increased confusion and flatter ROC curves, indicating higher misclassification rates.
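For the multi-class setting, per-class ROC curves and a macro-averaged AUC can be obtained as sketched below; the one-vs-rest decomposition is the standard approach assumed here, continuing the earlier examples:

```python
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.preprocessing import label_binarize

proba = clf.predict_proba(X_test_fs)
# Macro-averaged one-vs-rest AUC over all mineral classes.
auc_macro = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")
# One per-class curve (first class vs. rest) from the rates defined in eqn (22).
y_bin = label_binarize(y_test, classes=clf.classes_)
fpr, tpr, _ = roc_curve(y_bin[:, 0], proba[:, 0])
print(f"macro OvR AUC = {auc_macro:.3f}")
```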
Overall, the agreement across accuracy, F1, confusion matrices, and ROC/AUC, grounded in their mathematical definitions, robustly validates the superior performance of RF and LR for multiclass LIBS mineral classification.

The goal of supervised pattern recognition is to use samples with known classes as a training set to build a model that can accurately predict the class of unknown samples. To achieve this, we first trained and validated our classifiers to achieve high performance on known data. To further evaluate their robustness and generalization, we then tested the best-performing models on 12 completely unseen spectra, randomly selected using a stratified sampling approach from the entire set of measured spectra. These new samples were processed with the same preprocessing steps as the training and testing data to maintain consistency. The models correctly classified 10 to 11 of the 12 samples, an accuracy of approximately 83% to 92%, demonstrating strong predictive capability beyond the original dataset. Fig. 11 shows a comparison between the spectrum of an unseen bismuth sample and the mean spectrum of the predicted bismuth family, as classified by the LR model with RF-based feature selection. The close alignment of key spectral peaks between the individual sample and the family mean spectrum highlights the model's ability to generalize and classify new, unseen spectra accurately. This visual confirmation supports the quantitative classification results, demonstrating the robustness of this approach for real-world mineral identification.

This strong performance on previously unseen pure mineral samples demonstrates the practical potential of our approach for autonomous mineral identification. However, planetary materials and terrestrial soils are rarely pure phases; they are often complex mixtures of several minerals. Therefore, to further test the robustness and interpretability of our models under realistic conditions, we systematically evaluated classifier performance on synthetic binary mixtures, as described below.
Fig. 11 Comparison of an unseen bismuth sample spectrum (green) with the mean spectrum of the predicted bismuth family (grey), classified by LR + RF-FS, shown here as an example of model performance on new data.
| Smix(λ) = w1S1(λ) + w2S2(λ) | (23) |
where S1(λ) and S2(λ) are the pure end-member spectra (here hematite and gypsum) and w1, w2 are the mixing weights with w1 + w2 = 1.
Synthetic mixture spectra, together with pure hematite and gypsum spectra measured at 10 mbar, are shown in Fig. 12. Progressive changes in spectral features with varying composition indicate compositional sensitivity and experimental relevance of the mixtures. For the mixture classification analysis, we report results using the LR-FS model, identified as one of the best-performing models in this study (see Table 2). Each synthetic mixture was classified with this model, and the probability assigned to hematite, P(Hematite), was recorded. To quantitatively analyze the classifier's response to these mixtures, the following metrics were evaluated.
First, the relationship between the predicted probability PHematite and its fraction in the mixture fHematite was described using a sigmoid function:
| P(Hematite) = 1/(1 + exp(−k(fHematite − f0))) | (24) |
where f0 is the switch point of the curve and k controls its steepness.
Then, to evaluate prediction confidence, the entropy H of the predicted probability distribution across all mineral classes was computed for each mixture:
| H = −Σk pk ln pk | (25) |
where pk is the predicted probability for class k; low entropy corresponds to a confident prediction.
Finally, the ability to identify the dominant mineral in each mixture was assessed by constructing an ROC curve, using the probability assigned to hematite (PHematite) as the score and the dominant mineral as ground truth. The AUC provides a summary measure of discriminative performance for mixed samples.
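The sigmoid fit of eqn (24) and the entropy of eqn (25) can be computed with SciPy as sketched below; the mixture fractions and probabilities used here are synthetic stand-ins for the measured values:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import entropy

def sigmoid(f, k, f0):
    """Eqn (24): P(hematite) vs. hematite fraction f; f0 is the switch point."""
    return 1.0 / (1.0 + np.exp(-k * (f - f0)))

# Synthetic stand-ins: mixing fractions and the classifier's hematite probabilities.
fractions = np.linspace(0.0, 1.0, 21)
p_hematite = sigmoid(fractions, 12.0, 0.75) + np.random.default_rng(0).normal(0, 0.02, 21)

(k_fit, f0_fit), _ = curve_fit(sigmoid, fractions, p_hematite, p0=[10.0, 0.5])
print(f"fitted switch point f0 = {f0_fit:.2f}")        # ~0.75, as in Fig. 13A

# Eqn (25): Shannon entropy of one predicted class distribution (low = confident).
probs = np.array([0.70, 0.25, 0.05])
print(f"entropy H = {entropy(probs):.3f}")
```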
Fig. 13 shows how our best classifier (LR + RF-FS) performs on synthetic mixtures of hematite and gypsum, with clear evidence of both accuracy and reliability in mixed-mineral samples. Panel A shows the relationship between the predicted probability for hematite and its actual fraction in the mixture. Each dot is a single synthetic mixture (created by mixing hematite and gypsum spectra in different amounts), and the red dashed line is a sigmoid curve fitted to the data. The “switch point” of the curve is at about 0.75, meaning that the model begins to call the mixture “hematite” when hematite is roughly 75% of the sample. This curve confirms that the model responds in a logical and gradual way to changing mineral composition, rather than making random or abrupt jumps. Panel B shows the same probability values, but now the color of each point represents the prediction entropy, which measures the classifier's confidence. When the mixture is nearly all hematite or all gypsum (far left or right on the x-axis), the classifier is very confident (entropy is low, brown/yellow points). The highest uncertainty (blue points) is found near the middle, where hematite and gypsum are in similar amounts, making the classification harder. Panel C displays the ROC curve, which evaluates how well the classifier can identify the dominant mineral in each mixture using the predicted probability for hematite. The curve is very close to the top left corner, and the AUC is 0.98, indicating overall high accuracy, particularly when one mineral dominates. However, as seen in Fig. 13A, the classifier's predictions are less reliable for intermediate mixtures where the proportions of hematite and gypsum are similar. The model also provides interpretable confidence estimates. This capability is especially important for real-world applications in soils, rocks, and planetary materials, where mixtures are common and confident decisions are required.
However, true deployment in planetary missions brings further challenges, such as the need to recognize minerals not present in the training library and to support autonomous, onboard decision-making. These aspects are addressed in the following section.
Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d5ja00377f.