Open Access Article
Katharina Kronenberg *a, Hennes Rave b, Nassim Ghaffari-Tabrizi-Wizsy c, Danae Nyckees d, Matthias Elinkmann a, Dalial Freitak d, Lars Linsen b, Raquel Gonzalez de Vega e and David Clases a
aNanoMicroLab, Institute of Chemistry, University of Graz, Universitätsplatz 1, 8010 Graz, Austria. E-mail: katharina.kronenberg@uni-graz.at
bInstitute of Informatics, University of Münster, Einsteinstraße 62, 48149 Münster, Germany
cOtto Loewi Research Center, Division of Immunology, Medical University of Graz, Neue Stiftingtalstraße 6, Graz, 8010, Austria
dInstitute for Biology, University of Graz, Universitätsplatz 2, Graz, 8010, Austria
eTESLA-Analytical Chemistry, Institute of Chemistry, University of Graz, Universitätsplatz 1, 8010 Graz, Austria
First published on 19th September 2025
Spectral imaging generates information-rich datasets comprising a large map of pixels that each contain a comprehensive spectrum. A specific form of mass spectral imaging is laser ablation-inductively coupled plasma-time-of-flight mass spectrometry (LA-ICP-TOFMS). This technique enables elemental imaging of almost the entire periodic table. The large number of isotopes per pixel leads to high-dimensional data posing major challenges for visualisation, pattern recognition and interpretation. To decrease this complexity, dimensionality reduction techniques, such as uniform manifold approximation and projection (UMAP), provide powerful tools to transform high-dimensional datasets into low-dimensional representations aiming to preserve data point relationships and visualise spectral similarities. This study provides a detailed introduction to UMAP for analysing LA-ICP-TOFMS data. By transforming high-dimensional MS imaging data into two-dimensional spaces, UMAP facilitates automated visualisation to identify spectral clusters. UMAP's utility to reveal spectrally distinct regions and tissue heterogeneity is demonstrated for a chicken embryo and a honeybee specimen. For detailed cluster analysis, a hierarchical strategy is introduced involving iterative UMAP applications, first to the global dataset, and then to resulting clusters. This approach helps uncover subtle chemical patterns hidden in the initial global UMAP application. Furthermore, the influence of the most relevant UMAP hyper-parameters is discussed, providing guidance for selecting critical parameters for further datasets. Overall, this study introduces UMAP as an exploratory and versatile tool for targeted and non-targeted analysis of complex LA-ICP-TOFMS data. Its integration into imaging workflows supports spectral clustering, image segmentation, hypothesis generation, and rapid analysis of large and high-dimensional spectral data from biological and environmental specimens.
Traditionally, LA-ICP-MS data has mostly been examined manually. This involves a systematic but manual navigation through images of individual m/z to identify elemental or spatial patterns.9 However, with the advancement of ICP-TOFMS, this approach becomes increasingly impractical and inefficient as dataset complexity and information density grow. When employing the full potential of a TOF-based instrument, the available mass range easily exceeds 200 m/z and an exhaustive manual analysis is no longer feasible in practice. This is especially the case in non-targeted analyses, where no a priori knowledge or expectation exists and increasing dataset complexity greatly amplifies the risk of overlooking relevant spatial and spectral features. In particular, it is challenging to segment spatial regions with distinct multi-elemental signatures or to visualise and understand overarching patterns in the data. A more effective strategy to identify spatial and elemental patterns should consider several m/z jointly.
To solve this challenge, the structure of the data needs to be considered to derive better strategies to handle and visualise complex data. In LA-ICP-TOFMS, the intensity of each recorded m/z represents a distinct dimension. In an exemplary case of 200 recorded m/z, each pixel contains an intensity value for all 200 m/z and is positioned accordingly in a 200-dimensional scatterplot. This represents the elemental image in a high-dimensional space. Recently, powerful dimensionality reduction algorithms have been introduced that are ideally suited for such datasets. A relevant selection is presented in the following. A glossary, containing definitions of key terms used in dimensionality reduction, can be found at the end of this article.
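The data structure described above can be made concrete with a minimal Python sketch. It flattens a small image cube into a pixel-by-m/z matrix, in which each row is one pixel and each column is one dimension, while keeping the pixel coordinates for mapping results back onto the image. The function name `flatten_cube`, the cube layout and the toy counts are assumptions chosen for illustration and do not reflect the software used in this study.

```python
from typing import List, Tuple

Cube = List[List[List[float]]]  # cube[row][col][channel], one channel per m/z


def flatten_cube(cube: Cube) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Turn an image cube into a (pixels x m/z) matrix plus pixel coordinates."""
    spectra: List[List[float]] = []
    coords: List[Tuple[int, int]] = []
    for r, row in enumerate(cube):
        for c, spectrum in enumerate(row):
            spectra.append(list(spectrum))  # one point in n_channels-dimensional space
            coords.append((r, c))           # remember where the pixel came from
    return spectra, coords


# Toy image: 2 rows x 3 columns, 4 m/z channels per pixel (made-up counts).
toy_cube: Cube = [
    [[1.0, 0.0, 5.0, 2.0], [2.0, 1.0, 4.0, 2.0], [0.0, 0.0, 9.0, 1.0]],
    [[3.0, 2.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.0, 4.0, 0.0, 6.0]],
]

spectra, coords = flatten_cube(toy_cube)
n_pixels, n_channels = len(spectra), len(spectra[0])
print(f"{n_pixels} pixels, each a point in {n_channels}-dimensional space")
```

With 200 recorded m/z, the same layout would simply have 200 columns instead of four.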
PCA transforms a high-dimensional dataset with potentially correlated variables into a set of new uncorrelated variables, known as principal components. Principal components are linear combinations of the initial variables.17 By identifying the directions (principal axes) along which the variance of the data is maximised, PCA projects the original data onto a lower-dimensional space with new orthogonal axes while preserving as much of the original variability as possible. As such, PCA is a linear and deterministic dimensionality reduction technique.18
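As a hedged illustration of the principle, the sketch below finds the first principal axis of a toy dataset by centring the data, building the covariance matrix and applying power iteration. Power iteration is used here only to keep the example dependency-free; it is a swapped-in technique and not necessarily how PCA libraries compute components. The function name and the toy values are assumptions.

```python
import math
from typing import List


def first_principal_axis(data: List[List[float]], n_iter: int = 200) -> List[float]:
    """Return the unit vector of maximum variance, found by power iteration."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(x[i] * x[j] for x in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Start vector; assumed not to be orthogonal to the leading axis.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Two perfectly correlated toy channels: the second is always twice the first.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
axis = first_principal_axis(toy)
print([round(x, 4) for x in axis])
```

The resulting axis is a linear combination of the two original channels, and repeated runs give the same result, which reflects the deterministic nature of PCA.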
t-SNE, on the other hand, reduces dimensionality in a non-linear way. The main objective of t-SNE is to preserve the local structure of high-dimensional data during the transformation into a low-dimensional space. In this context, ‘preserving local structure’ means that t-SNE aims to ensure that data points close to each other in the original high-dimensional space remain close in the low-dimensional embedding, thereby maintaining immediate neighbourhood relationships. To achieve this, t-SNE first calculates how similar each data point is to every other point in the high-dimensional space, expressing these distances between data points as probabilities that reflect similarities.19 It then attempts to create a similar probability-based representation in the low-dimensional space, where the arrangement of points reflects the same local relationships. The algorithm minimises the difference (Kullback–Leibler divergence) between these two sets of probabilities using an optimisation method known as gradient descent. This process helps t-SNE to maintain the relative distances between nearby points while embedding the data into a low-dimensional space that is easier to interpret visually. It is important to note that the axes generated in the low-dimensional space are not interpretable. They represent abstract dimensions that have been optimised for visualisation, rather than the original data features. Additionally, it is also important to acknowledge that a stochastic algorithm is employed. Consequently, multiple executions may yield a different low-dimensional embedding, unless the random seed is fixed within the algorithm. t-SNE is particularly effective for uncovering patterns in large and complex datasets, allowing it to capture non-linear structures and clusters that linear techniques like PCA cannot. 
By focusing on preserving the immediate neighbourhood of each point, rather than the global arrangement of all points, t-SNE provides an effective way to explore the underlying structure of high-dimensional data. The underlying mathematical details are discussed by van der Maaten and Hinton.19
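The objective described above can be sketched in a few lines of Python. The example is deliberately simplified: it uses a Gaussian kernel with one fixed bandwidth in the high-dimensional space, whereas t-SNE calibrates a bandwidth per point from a user-defined perplexity, and it only evaluates the Kullback–Leibler divergence for two given layouts rather than optimising a layout by gradient descent. All names and toy coordinates are assumptions for illustration.

```python
import math
from typing import Callable, List

Point = List[float]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian(d2: float) -> float:
    return math.exp(-d2 / 2.0)  # fixed sigma = 1 (simplification)


def student_t(d2: float) -> float:
    return 1.0 / (1.0 + d2)  # heavy-tailed kernel used in the low-dimensional space


def pair_probs(points: List[Point], kernel: Callable[[float], float]) -> List[List[float]]:
    """Pairwise similarities, normalised so that all off-diagonal entries sum to 1."""
    n = len(points)
    w = [
        [0.0 if i == j else kernel(sq_dist(points[i], points[j])) for j in range(n)]
        for i in range(n)
    ]
    total = sum(sum(row) for row in w)
    return [[x / total for x in row] for row in w]


def kl_divergence(p: List[List[float]], q: List[List[float]]) -> float:
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0.0
    )


# Four toy points forming two tight pairs in three dimensions.
high = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [5.0, 5.0, 5.0], [5.1, 5.0, 4.9]]
good = [[0.0, 0.0], [0.2, 0.0], [8.0, 0.0], [8.2, 0.0]]  # neighbours kept together
bad = [[0.0, 0.0], [8.0, 0.0], [0.2, 0.0], [8.2, 0.0]]   # neighbours torn apart

p = pair_probs(high, gaussian)
kl_good = kl_divergence(p, pair_probs(good, student_t))
kl_bad = kl_divergence(p, pair_probs(bad, student_t))
print(f"KL (neighbours preserved) = {kl_good:.3f}, KL (neighbours separated) = {kl_bad:.3f}")
```

The layout that keeps neighbouring points together yields the lower divergence, which is the quantity the optimisation drives down.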
In 2018, UMAP was introduced as a new non-linear dimensionality reduction algorithm by McInnes et al. and has been gaining increasing attention over the past years.20 UMAP and t-SNE are very similar in their functionality. Both methods are stochastic and reduce high-dimensional data by embedding it into a low-dimensional, visually interpretable space, while aiming to preserve the structure of the original data. However, UMAP is frequently regarded as a more effective method due to several advantages.21–23 Comparisons of UMAP and t-SNE can be found elsewhere.20,21,24 The UMAP algorithm is based on a distinct theoretical foundation with several improvements over, e.g., t-SNE. First, UMAP provides better scalability and computational speed, especially for datasets with a large number of data points (i.e., the number of pixels in MSI) and dimensions (i.e., the number of m/z in MSI). Second, UMAP offers improved preservation of the global structure of the original high-dimensional data while also maintaining the local structure. Thus, the algorithm performs better in balancing between local and global structure.24 As a consequence, the low-dimensional embedding generally visualises a large amount of meaningful information hidden in the original data. However, it is worth emphasising that, like t-SNE, UMAP inherently distorts the high-dimensional structure of the original data during the transformation into the low-dimensional space. Consequently, the axes or distances between clusters in lower dimensions are not quantitatively interpretable, particularly in comparison to linear techniques such as PCA. 
An in-depth introduction to the UMAP algorithm, explaining its operational mechanisms, application methodologies, and interpretation, is provided in the recent publication by Healy and McInnes, recommended for those starting to use UMAP.25 Those seeking to develop an intuitive understanding of UMAP are also directed to the UMAP reference documentation,26 Understanding UMAP by Google,24 and the instructional videos “UMAP explained”27 as a short introduction, or “UMAP: Main Ideas”28 and “UMAP: Mathematical Details”29 by StatQuest. Readers interested in the mathematical foundation of UMAP are referred to the original article by McInnes et al.20 The following sections will only address the most essential aspects of UMAP necessary for applying it to LA-ICP-MS data. With regard to terminology, note that the structural shape of high-dimensional data is also referred to as topology.
The parameter min_dist influences the second algorithm stage of optimising the low-dimensional embedding and describes the minimum distance between embedded points. It controls how densely points within clusters are arranged in the low-dimensional embedding, thereby shaping the density and visual appearance of the final output embedding. The value for min_dist ranges from 0 to 1. Low values lead to tightly packed clusters, which probably represent the topological structure more accurately. Large values spread points further apart.25
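To give an intuition for how min_dist shapes the embedding, the sketch below evaluates a piecewise target curve for the desired similarity of two embedded points as a function of their distance. To our understanding, the UMAP reference implementation fits a smooth curve to a target of this shape, but this reading, the function name `target_membership` and the `spread` value of 1.0 should be treated as assumptions rather than a description of the exact implementation.

```python
import math


def target_membership(distance: float, min_dist: float, spread: float = 1.0) -> float:
    """Assumed target curve: full similarity up to min_dist, exponential decay beyond."""
    if distance <= min_dist:
        return 1.0
    return math.exp(-(distance - min_dist) / spread)


# Compare how quickly similarity falls off for several min_dist settings.
for md in (0.0, 0.1, 0.5, 1.0):
    values = [round(target_membership(d, md), 3) for d in (0.05, 0.5, 1.0, 2.0)]
    print(f"min_dist={md}: {values}")
```

With min_dist = 0, similarity starts to decay immediately, so points are rewarded for sitting very close together. Larger values leave a plateau in which points may sit apart without penalty, which corresponds to the more dispersed appearance described above.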
To date, UMAP has been extensively used across numerous scientific domains, including single cell biology,21 transcriptomics30 and population genetics.31 In the context of elemental mass spectrometry, the use of UMAP was described for analysing single cells labelled with metal-conjugated antibodies,32,33 as well as for molecular MSI.16,34 This study, however, demonstrates the application of UMAP for dimensionality reduction and subsequent image segmentation of data obtained by label-free elemental MSI. To illustrate the utility of UMAP, thin sections from two exemplary biological specimens, a chicken embryo and a honeybee, containing various and complex biological structures, were analysed using LA-ICP-TOFMS.
A group of 30 worker honeybees (Apis mellifera carnica) were fed ad libitum with 50% sugar solution and were kept in an incubator at 34 °C and 70% humidity. After six days, the bees were frozen at −20 °C until cryosectioning.
The honeybee was cryosectioned in a sagittal plane with a thickness of 50 μm and employing a chamber temperature and object temperature of −20 °C and −17 °C, respectively. The bee was cut in half and then the inner cavities of the bee were filled with CryoGlue (SLEE medical GmbH, Nieder-Olm, Germany). This procedure was implemented to ensure an improvement in the integrity and quality of the sections. Otherwise, the sections exhibited a high degree of brittleness and structural distortion. The resulting sections were mounted on glass slides, thoroughly dried, and stored frozen under a N2 atmosphere until LA-ICP-TOFMS analysis was conducted.
000 pixels, n_neighbours = 5, and the landmark-based approach with a subsampling of pixels set to 0.1, a UMAP embedding was calculated within 3 min and 30 s. Data points in UMAP embeddings are displayed with 20% opacity to visualise density. Mass spectra can be interactively visualised within the Multiscale Image Analysis Software. In this study, all displayed mass spectra were averaged over the corresponding image segment or cluster and exported as CSV file. For figure preparation, the exported spectra were subsequently imported into OriginPro 2024 (OriginLab Corporation, Northampton, MA, USA).
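The averaging and export step can be sketched as follows. This is a minimal Python illustration and not the routine of the Multiscale Image Analysis Software: the function names, the cluster labels assigned to the toy pixels, the intensities and the CSV layout (one row per m/z, one column per cluster) are all assumptions.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def mean_spectra(spectra: List[List[float]], labels: List[str]) -> Dict[str, List[float]]:
    """Average the spectra of all pixels that share a cluster label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for spectrum, label in zip(spectra, labels):
        if label not in sums:
            sums[label] = [0.0] * len(spectrum)
        sums[label] = [s + x for s, x in zip(sums[label], spectrum)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def to_csv(means: Dict[str, List[float]], mz_labels: List[str]) -> str:
    """Write one row per m/z channel and one column per cluster."""
    clusters = sorted(means)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["m/z"] + clusters)
    for i, mz in enumerate(mz_labels):
        writer.writerow([mz] + [means[c][i] for c in clusters])
    return buffer.getvalue()


# Toy data: 4 pixels, 2 channels, hypothetical cluster labels and intensities.
toy_spectra = [[10.0, 1.0], [12.0, 3.0], [2.0, 8.0], [4.0, 6.0]]
toy_labels = ["C2", "C2", "C3", "C3"]
means = mean_spectra(toy_spectra, toy_labels)
csv_text = to_csv(means, ["44Ca", "54Fe"])
print(csv_text)
```

The resulting text can be written to a file and imported into any plotting software.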
To obtain a more detailed understanding of the UMAP results, a closer investigation was conducted on the underlying mean mass spectra of three clusters labelled C1 (light blue, tissue), C2 (dark green, bone), and C3 (light red, heart ventricle) in Fig. 1(B1). A figure showing the mean mass spectra of all clusters is presented in the SI, Fig. S1. Examining mass spectra provided insight into interesting trends or differences regarding the elemental composition underlying each cluster. The mean mass spectra of clusters C1–C3 are displayed in Fig. 1(C), with each cluster exhibiting a unique elemental profile. Note that two blanker regions (m/z 25.0–41.5 and 55.2–64.6) were included in the recorded mass range in the ICP-TOFMS method. Ions with m/z within these ranges were scattered to prevent high ion currents from reaching the detector. This is reflected in low signal intensities for all m/z in blanker regions. To understand the spectral differences between clusters C1, C2 and C3, the dominant m/z values of each cluster were selected and displayed as images in Fig. 1(D1–D3). This reveals, for example, that in the dark green cluster C2, corresponding to the bone, the 44Ca signal is more intense compared to the heart ventricle or the remaining tissue. This is confirmed when selecting 44Ca for image display (Fig. 1(D2)). In contrast, the heart ventricle shows the highest 54Fe signal compared to clusters C1 and C2 in the mass spectra. 54Fe was monitored instead of the main isotope 56Fe, as the latter was intentionally blanked because its signal exceeded the acceptable intensity levels of the detector. When selecting 54Fe for image display (Fig. 1(D3)), the spatial distribution demonstrates high Fe accumulation in the heart ventricle relative to the bone and the remaining tissue regions. In addition, the liver also exhibits elevated Fe levels. 
However, in the UMAP embedding, the liver is separated from the heart ventricle based on differences in multiple other elemental signals. This highlights the advantage of UMAP's ability to incorporate the full spectral information from each pixel, rather than just interpreting single isotopic images. A further drawback of inspecting single isotopic images by eye is that the recognition of clusters strongly depends on the chosen colour scale and its contrast. For instance, the 44Ca image is dominated by the intense signal of the bone, and imaging software usually adjusts its colour scale to that region by default. However, lowering the maximum value of the scale reveals that Ca is distributed across the entire image (Fig. S2 in the SI). UMAP, in contrast, considers all intensity values simultaneously and therefore avoids overlooking such distributions that may remain hidden to the human eye due to colour scale settings. A further advantage of UMAP is that a preselection or inspection of the dataset is not required. Instead, all m/z values can be included directly, and the results can be evaluated afterwards. As such, UMAP is a valuable tool for exploratory analysis, enabling rapid assessment of spectral similarities and differences within the dataset. It is particularly useful for non-targeted investigations in which tissue sections are analysed without a priori knowledge or expectation. However, it can also support hypothesis-driven research by facilitating comparisons between known spatial structures and data-driven clustering results as demonstrated in Fig. 2. Here, regions of interest (ROIs) in the image were defined based on known biological structures within the tissue (Fig. 2(A1 and A2)). These ROI segments were then transferred onto the UMAP embedding with corresponding colours (Fig. 2(A3)), enabling interactive exploration of whether anatomically defined regions were also clustered based on their spectral profiles. 
Conversely, segmentation can also be performed directly on the UMAP embedding, with the resulting clusters mapped back onto the spatial image to assess their anatomical relevance (Fig. 2(B1–B3)). This bidirectional approach facilitates the validation of expected spatial patterns and supports the identification of unexpected spectral heterogeneity. In short, UMAP provides a straightforward overview to visualise spatial structures based on their unique elemental composition obtained by LA-ICP-TOFMS.
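The mapping from a selection in the embedding back to the image can be illustrated with a short Python sketch. It assumes that the selection is given as a polygon in embedding coordinates and that each embedded point carries the row and column of its pixel. The point-in-polygon test uses standard ray casting. The function names and toy coordinates are hypothetical, and this is not the implementation of the software used in this study.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def inside_polygon(point: Point2D, polygon: Sequence[Point2D]) -> bool:
    """Ray casting: count polygon edges crossed by a horizontal ray towards +x."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the height of the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def selection_to_mask(
    embedding: Sequence[Point2D],
    coords: Sequence[Tuple[int, int]],
    polygon: Sequence[Point2D],
    shape: Tuple[int, int],
) -> List[List[bool]]:
    """Mark every image pixel whose embedded point lies inside the selection."""
    rows, cols = shape
    mask = [[False] * cols for _ in range(rows)]
    for point, (r, c) in zip(embedding, coords):
        if inside_polygon(point, polygon):
            mask[r][c] = True
    return mask


# Toy 2 x 2 image: the bottom row forms its own cluster in the embedding.
toy_embedding = [(0.1, 0.2), (0.3, 0.1), (5.0, 5.2), (5.1, 4.9)]
toy_coords = [(0, 0), (0, 1), (1, 0), (1, 1)]
lasso = [(4.0, 4.0), (6.0, 4.0), (6.0, 6.0), (4.0, 6.0)]
mask = selection_to_mask(toy_embedding, toy_coords, lasso, (2, 2))
print(mask)
```

The opposite direction, transferring an image ROI onto the embedding, only requires looking up the embedded points of the pixels inside the ROI.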
While this example demonstrates the utility of UMAP in uncovering meaningful spectral patterns and their spatial organisation, the quality and interpretability of the resulting UMAP embeddings are highly dependent on the choice of UMAP parameters. Therefore, consideration must be given to selecting appropriate values, particularly for key parameters such as n_neighbours and min_dist, to ensure that both local and global data structures are adequately preserved. Here, a notable benefit of the landmark-based UMAP approach is its high computational speed, which facilitated rapid testing of various hyper-parameter settings to empirically identify a UMAP embedding that aligns with the desired visualisation of the data.
Fig. 3 illustrates the impact of the UMAP hyper-parameter selection on the resulting embeddings of the chicken embryo dataset. The parameter defining the number of nearest neighbours (n_neighbours) was systematically varied from 5 to 100, while min_dist was held constant at 0.1 (Fig. 3(A1–A6)). As outlined in the introduction, lower n_neighbours values emphasise the preservation of local data structure, which may lead to an overrepresentation of minor variations. In this dataset, increasing n_neighbours from 5 to 20 improved the definition and separation of clusters (Fig. 3(A1–A4)). However, further increases to 50 and 100 (Fig. 3(A5 and A6)) resulted in slightly reduced cluster separation, likely due to the connection of too many neighbouring data points. Since well-separated clusters are essential for the intended visualisation, n_neighbours = 20 was selected as the optimal setting for this dataset.
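The effect of connecting too many neighbours can be illustrated with a simplified Python sketch. It builds a plain, unweighted k-nearest-neighbour graph under cosine distance and counts its connected components for different k. UMAP itself constructs a weighted (fuzzy) neighbourhood graph, so this is only an analogy under stated assumptions. The function names and the toy spectra are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn_components(points: List[List[float]], k: int) -> int:
    """Number of connected components in the undirected k-nearest-neighbour graph."""
    n = len(points)
    parent = list(range(n))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine_distance(points[i], points[j]),
        )
        for j in others[:k]:
            parent[find(i)] = find(j)  # link i with each of its k neighbours
    return len({find(i) for i in range(n)})


# Three toy groups of three spectra each, with clearly different profiles.
toy = [
    [10.0, 1.0, 1.0], [9.0, 1.0, 2.0], [10.0, 2.0, 1.0],
    [1.0, 10.0, 1.0], [2.0, 9.0, 1.0], [1.0, 10.0, 2.0],
    [1.0, 1.0, 10.0], [1.0, 2.0, 9.0], [2.0, 1.0, 10.0],
]
for k in (2, 3):
    print(f"k={k}: {knn_components(toy, k)} connected component(s)")
```

In this toy case, k = 2 keeps the three groups separate, whereas k = 3 forces every spectrum to connect to a spectrum of another group and merges everything into one component.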
Fig. 3 Effect of varying UMAP hyper-parameters n_neighbours and min_dist on the embedding (related to Fig. 1). n_neighbours is varied from 5 to 100, with min_dist fixed at 0.1 (A1–A6). min_dist is varied from 0 to 1, with n_neighbours fixed at 20 (B1–B6). Additional input parameters included all recorded m/z and all pixels of the ablated area shown in Fig. 1(A2).
In addition to n_neighbours, the parameter that defines the minimum distance between points in the low-dimensional embedding (min_dist) was varied between 0 and 1 to assess its influence on the UMAP embedding (Fig. 3(B1–B6)). Lower min_dist values result in more tightly packed points, enhancing the visual separation of clusters. Conversely, higher values produce more dispersed embeddings, which can improve global structure representation at the expense of local detail. While compact grouping can aid in cluster identification, excessively small min_dist values may lead to overplotting, obscuring point density and reducing the interpretability of dense regions. Based on these observations, the final parameters for the chicken embryo dataset were selected to be n_neighbours = 20 and min_dist = 0.1 (Fig. 3(A4)). To illustrate that linear dimensionality reduction techniques such as PCA fail to represent the complexity of the data structure, a comparison of UMAP with optimised hyper-parameters and PCA is presented in Fig. S3 in the SI.
The data analysis starts with an initial UMAP embedding (UMAP1) of the full dataset, including all pixels from the ablated area and the complete set of recorded m/z values (Fig. 4(A1)). Following the previously outlined exploration strategy, the lasso selection tool was used to identify major clusters in the UMAP1 embedding (Fig. 4(A2)), which correspond to distinct anatomical structures in the honeybee, as visualised in the clustered LA-ICP-TOFMS image (Fig. 4(A3)). These include the flight muscle (green), the crop, also known as honey stomach (blue), the midgut (red), and the remaining tissue (orange). Additionally, the surrounding cryo-embedding medium (grey clusters), which covered the specimen and filled vacant spaces within, was successfully separated from biological tissue. Vertical stripe patterns, highlighted in black, are also visible in the clustered image and grouped together in the UMAP embedding. These features represent auto blank events which were triggered during data acquisition when signal intensities exceeded a certain threshold. During these events, ions in small mass ranges around the high intensity m/z were automatically scattered by the Bradbury–Nielsen gate in the ICP-TOFMS instrument. After a brief interval, the blanking was deactivated, and the full mass range was once again acquired. Consequently, m/z images within the affected mass region display rows of missing data along the ablation direction. As UMAP is sensitive to variations in signal intensity across all selected m/z values, it effectively identifies and clusters these artefactual features, thereby facilitating their recognition and potential removal during data analysis. This data-dependent recognition of artefacts, such as auto blank events, provides a useful tool to mask compromised pixels for downstream analyses or quantification.
In a second step, the orange cluster from the UMAP1 embedding was isolated and processed once again with UMAP resulting in a new UMAP embedding (UMAP2, Fig. 4(B1)), revealing finer substructures within this region. Distinct subclusters emerge, including a yellow cluster that corresponds to the bee's brain and a brown cluster likely associated with glandular tissue (Fig. 4(B2 and B4)). These subclusters can be retrospectively visualised in the context of the initial embedding by overlaying their segmentation on the UMAP1 embedding (Fig. 4(B3)), illustrating how these finer structures were previously unresolved.
A similar hierarchical analysis was applied to the crop region (blue cluster in UMAP1). In this case, only the 14 most intense and artefact-free m/z signals from the crop cluster were selected as input for UMAP2. This improved the separation of internal substructures compared to using the full spectral range (Fig. 4(C1)). The resulting subclusters reveal the crop wall and further blanking events (Fig. 4(C2 and C4)), exhibiting a strong correlation with spatial patterns observed in multiple single-element ion images (see Fig. S4 in the SI). This procedure represents a targeted application of UMAP where input parameter settings are refined until a desired spatial structure forms a cluster in the UMAP embedding. A targeted UMAP analysis is particularly useful in cases where substructures are visible in individual m/z images but difficult to segment by manually drawing ROIs. By restricting the input to relevant m/z values, the UMAP embedding can be refined to enhance the grouping of pixels of these specific regions. This facilitates the detection of subtle substructures compared to embeddings generated using the full m/z range. Details on the targeted subclustering of the crop and the selected m/z are provided in the SI (Fig. S4). Once more, these subclusters may be retrospectively visualised in the context of the initial embedding by superimposing their segmentation on the UMAP1 embedding (see Fig. 4(C3)). The mean mass spectra of the final image segments are presented in Fig. S5 in the SI.
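The input preparation for such a targeted second UMAP run can be sketched as follows. The example isolates the pixels of one cluster, ranks the m/z channels by their mean intensity within that cluster, skips channels flagged as artefact-affected and keeps the k most intense ones. How the 14 m/z were actually chosen is described in the SI. The ranking criterion, the function names and the toy values below are assumptions for illustration only.

```python
from typing import Iterable, List, Sequence


def cluster_subset(spectra: Sequence[List[float]], labels: Sequence[str], target: str) -> List[List[float]]:
    """Keep only the spectra of pixels that belong to the target cluster."""
    return [s for s, lab in zip(spectra, labels) if lab == target]


def top_channels(subset: Sequence[List[float]], k: int, excluded: Iterable[int] = ()) -> List[int]:
    """Indices of the k channels with the highest mean intensity, skipping excluded ones."""
    skip = set(excluded)
    n_channels = len(subset[0])
    means = [sum(row[j] for row in subset) / len(subset) for j in range(n_channels)]
    candidates = [j for j in range(n_channels) if j not in skip]
    ranked = sorted(candidates, key=lambda j: means[j], reverse=True)
    return sorted(ranked[:k])


def restrict(subset: Sequence[List[float]], channels: Sequence[int]) -> List[List[float]]:
    """Reduce every spectrum to the selected channels."""
    return [[row[j] for j in channels] for row in subset]


# Toy data: 3 pixels, 5 channels; channel 4 is treated as artefact-affected (hypothetical).
toy_spectra = [
    [5.0, 100.0, 1.0, 50.0, 80.0],
    [7.0, 120.0, 0.0, 40.0, 90.0],
    [9.0, 9.0, 9.0, 9.0, 9.0],
]
toy_labels = ["crop", "crop", "other"]

crop = cluster_subset(toy_spectra, toy_labels, "crop")
selected = top_channels(crop, k=2, excluded=[4])
umap2_input = restrict(crop, selected)
print(selected, umap2_input)
```

The reduced matrix would then serve as input for the second embedding, while the pixel coordinates of the subset allow the subclusters to be mapped back onto the image.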
The use of a targeted UMAP analysis workflow can be particularly advantageous in scenarios where the image segmentation of known spatial ROIs is desired but manual segmentation is impractical and tedious. The fine structure in the crop of the honeybee demonstrates a visual example of this scenario. Other exemplary cases would be the segmentation of individual nanoparticles that are dispersed throughout the entire ablated area, inclusions in geological samples or biological tissues with a complex spatial structure, for example lung or cancer tissue. In such instances, the parameters as well as the spatial and spectral input for UMAP can be adjusted until desired ROIs can be readily segmented via the UMAP embedding rather than via the image directly.
These exemplary applications of hierarchical UMAP demonstrate its potential to uncover biologically meaningful substructures within spatially complex and high-dimensional imaging datasets. By combining non-targeted and targeted UMAP analyses, both broad tissue architecture and subtle, compositionally distinct regions can be resolved with minimal manual image segmentation. UMAP is frequently used in single cell biology, for example for the analysis of high-dimensional data arising from antibody staining techniques such as flow cytometry or imaging mass cytometry.32,33 These applications showcase the ability of UMAP to uncover cellular heterogeneity based on antibody staining. However, as demonstrated in this study, the potential of UMAP extends well beyond these established use cases. Its application to label-free LA-ICP-TOFMS imaging data rather than data obtained from metal-conjugated antibody staining demonstrates that UMAP can also effectively distinguish more subtle elemental profiles within tissues and discern larger biological morphologies.
Moreover, it is important to consider the impact of elements with multiple isotopes in UMAP analysis. A potential concern is that the inclusion of all isotopes of a multi-isotopic element could result in an overweighting of this element in the UMAP embedding relative to mono-isotopic elements. In this study, a global normalisation was applied across all m/z, thereby preserving the relative signal intensities between isotopes. An overemphasising of multi-isotopic elements would only arise if normalisation were performed individually for each m/z (or isotope), but not under global normalisation. Compared to mono-isotopic elements, multi-isotopic elements are expected to receive a lower weighting in the UMAP embedding as their abundance and overall intensity are distributed across several isotopes. Selecting only a single isotope from a multi-isotopic element further amplifies this underweighting. The degree of underweighting also depends on the chosen distance metric (e.g., Euclidean distance or cosine similarity), which is a user-defined UMAP parameter. In this study, cosine similarity was selected as it results in less underemphasising of multi-isotopic elements compared to Euclidean distance. Summing the signal intensities from all isotopes of an element may appear to be a solution, but this approach introduces drawbacks in non-targeted data analysis due to isobaric and polyatomic interferences, as well as interferences from doubly charged ions. Therefore, in this study, all individual isotopes of an element were included for UMAP analysis. This strategy minimised underweighting while avoiding the need for prior knowledge of possible interferences. Furthermore, the inclusion of m/z values dominated by noise, which do not represent any relevant spatial distribution in the image, can be discussed. As these m/z values are usually low in signal intensity, they have only a minimal impact on the UMAP embedding. 
For the non-targeted UMAP approaches used in this study, all recorded m/z values were included, as this requires minimal prior knowledge of the dataset. However, as UMAP is based on user-defined inputs, future users may choose to include one or all of an element's recorded isotopes, to sum up multiple isotopes of elements while considering potential interferences or to filter noisy m/z before applying UMAP.
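The difference between global and per-channel normalisation, and between the two distance metrics, can be made tangible with a small Python sketch. Global normalisation is implemented here as division by the single highest intensity in the dataset, which is one plausible reading and an assumption, as are the function names and the toy intensities (two isotopes of a hypothetical element A followed by a mono-isotopic element B).

```python
import math
from typing import List, Sequence


def global_normalise(spectra: List[List[float]]) -> List[List[float]]:
    """Divide every value by one global maximum: isotope ratios are preserved."""
    peak = max(max(row) for row in spectra)
    return [[x / peak for x in row] for row in spectra]


def per_channel_normalise(spectra: List[List[float]]) -> List[List[float]]:
    """Divide each channel by its own maximum: every channel spans the same range."""
    peaks = [max(row[j] for row in spectra) for j in range(len(spectra[0]))]
    return [[x / p for x, p in zip(row, peaks)] for row in spectra]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Columns: abundant isotope of A, rare isotope of A, mono-isotopic B (made-up counts).
toy = [[970.0, 20.0, 100.0], [485.0, 10.0, 300.0]]

g = global_normalise(toy)
c = per_channel_normalise(toy)
print("global:      A1/A2 ratio =", round(g[0][0] / g[0][1], 1))
print("per channel: A1/A2 ratio =", round(c[0][0] / c[0][1], 1))

# Two spectra with identical profile but different overall intensity.
print("cosine   :", round(cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 6))
print("euclidean:", round(euclidean_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 3))
```

Under per-channel normalisation both isotopes of element A reach the same scale, so the element contributes twice at full weight. Under global normalisation the rare isotope stays small. The last two lines show that cosine distance depends on the spectral profile rather than the overall intensity, whereas Euclidean distance also reflects the intensity difference.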
While UMAP offers clear advantages for the analysis of spectral imaging data, it is equally important to recognise the potential for misinterpretation of its results. As previously noted, the method is sensitive to hyper-parameter choices. Moreover, the axes of UMAP embeddings lack intrinsic meaning, and the distances between points do not correspond linearly to those in the original high-dimensional space due to inherent distortions introduced during dimensionality reduction. As a result, neither the relative sizes of clusters nor the distances between them should be interpreted as quantitatively meaningful. In this light, UMAP should be understood primarily as an exploratory tool, effective for uncovering patterns and generating hypotheses, rather than as a definitive means of classification.
Nonetheless, UMAP remains a highly promising approach for advancing high-dimensional image analysis workflows across many diverse applications. Finally, it is worth emphasising that the workflows for applying UMAP demonstrated herein are not limited to LA-ICP-TOFMS data of biological specimens. Instead, they can be seamlessly employed in analogous ways for geological or other materials analysed with any type of high-dimensional imaging technique. Further studies could also focus on co-registering images from multimodal imaging approaches and using them together as input for the UMAP algorithm for improved image segmentation.
UMAP offers a versatile framework for both exploratory and hypothesis-driven analysis of high-dimensional LA-ICP-TOFMS imaging data. It supports the identification of spatial regions based on their distinct chemical composition, the discovery of subtle spectral variations, and the integration of complex multi-isotope information into a coherent analytical workflow.
| This journal is © The Royal Society of Chemistry 2025 |