Raven L. Buckman Johnson, Vy T. Tat and Young Jin Lee*
Department of Chemistry, Iowa State University, Ames, IA, USA. E-mail: yjlee@iastate.edu
First published on 29th August 2025
Mass spectrometry imaging (MSI) has emerged as a powerful tool for spatial metabolomics, but untargeted data analysis has proven to be challenging. When combined with in vivo isotope labeling (MSIi), MSI provides insights into metabolic dynamics with high spatial resolution; however, the data analysis becomes even more complex. Although various tools exist for advanced MSI analyses, machine learning (ML) applications to MSIi have not been explored. In this study, we leverage Cardinal to process MSIi datasets of duckweeds labeled with either 13CO2 or D2O. We apply spatial shrunken centroid (SSC) segmentation, an unsupervised ML algorithm, to differentiate metabolite localizations and investigate isotope labeling of untargeted metabolites. In the SSC segmentation of the three-day 13C-labeled duckweed dataset, five spatial segments were identified based on distinct lipid isotopologue distributions, in contrast to the classification of only three tissue regions in the previous manual analysis based on galactolipid isotopologues. Similarly, SSC segmentation of the five-day D-labeled dataset revealed five spatial segments based on distinct metabolite and isotopologue profiles. Further, this untargeted segmentation analysis of the MSIi datasets provided insights into the tissue-specific relative flux of each metabolite by calculating the fraction of de novo biosynthesis in each segment. Overall, the application of unsupervised machine learning to MSIi datasets has proven to significantly reduce analysis time, increase throughput, and improve the clarity of spatial isotopologue distributions.
Recently, MSIi was adopted to reveal spatiotemporal dynamics in lipid biosynthesis during plant development.14–16 Among those, MSIi of Lemna minor (duckweed) with deuterium (D)- and 13-carbon (13C)-labeling showed three distinct binomial isotopologue distributions corresponding to the partial labeling of each galactolipid building block.15 Interestingly, each isotopologue group demonstrated distinct localizations to the parent, intermediate, or daughter frond tissues, thus, revealing a spatial dependence in galactolipid biosynthesis.15 While this study successfully demonstrated MSIi, data analysis was a major bottleneck. Tedious data processing to manually identify isotopologues and subsequently investigate analyte spatial distributions ultimately limited throughput to targeted metabolites.15,17,20 MSI is considered high-dimensional data due to its large number of mass spectral and spatial features,21–23 thus, it becomes more cumbersome when isotope labeling is added. While several software tools exist for MSI data analysis with a wide range of capabilities, each provides benefits and drawbacks that make MSIi analysis challenging. For example, MSiReader24 incorporates percent isotope enrichment (PIE) that requires a manual region-of-interest selection to analyze targeted analytes.25–27 METASPACE,28 a widely used MSI platform, is not capable of analyzing isotopologues. Meanwhile, IsoScope,17 designed to work with MSIi data, is limited to targeted compounds. Therefore, additional tools are required to analyze isotope patterns imposed by in vivo isotope labeling in MSIi data and maintain high throughput.
Machine learning (ML) is a well-established and growing field of research that employs computer systems to adapt statistical models or algorithms without following explicit, user-defined instructions.21,29 Several recent review articles highlighted numerous examples of ML algorithms and platforms for streamlined MSI data processing.21,23,30,31 Additionally, a wide variety of ML applications and packages have been developed that include supervised,32,33 unsupervised,34,35 semi-supervised,36 or deep learning algorithms.37 Unsupervised learning techniques create models from uncategorized data and are allowed to explore patterns, associations, or structures in datasets without user intervention.21,29,31,38 These algorithms are particularly useful for large, complicated datasets that contain patterns which the user may, or may not, be aware of. In contrast, supervised and semi-supervised ML algorithms rely on a set of data, categorized by the user, to generate models, and are used to predict the outcome of another dataset.39,40 MSIi analysis could benefit from unsupervised machine learning to explore unexpected isotope labeling patterns in a rapid fashion; however, there have been no such applications so far.
Cardinal, an R-based statistical package developed to process and analyze MSI data, was first introduced by K. Bemis, et al. in 2015 and has quickly evolved into a robust platform for MSI analysis.41–43 The built-in spatial shrunken centroid (SSC) algorithm in Cardinal can be used for supervised classification or unsupervised segmentation.41–43 The unsupervised segmentation creates a subset of spectral features, uses probabilistic modeling and data reduction methods to compare subsets, then produces spatially segmented images.42,43 SSC segmentation provides a unique opportunity to enhance detection of spatially distributed isotopologue groups in MSIi datasets without the requirement of a priori identification of metabolites.42,43 In our previous study on MSIi of L. minor, we focused only on galactolipid biosynthesis and the corresponding spatiotemporal changes to isotopologue distributions due to the time-consuming manual analysis.15 By developing an analysis workflow to apply SSC segmentation to 13C- and D-labeled duckweed data in the current study, we improve throughput and reliability for a broader range of analytes in MSIi analysis.
Analysis of the unlabeled and labeled duckweed data was performed in R (ver. 4.4.1) with Cardinal (ver. 3.8.3).43 With Cardinal, .imzML files were read into memory and subsequently sum normalized (referred to as total ion count, TIC) and peak picked with a relative tolerance of 2 or 3 mDa. A detailed explanation of the fundamentals behind SSC segmentation was provided in the original paper by K. Bemis et al.,42 and is briefly summarized in SI 1. For all datasets, SSC segmentation was employed in a two-stage manner. In the first stage, images were partitioned into two segments (denoted as “tape” or “tissue”). In the second stage, unsupervised segmentation was performed on the tissue-only pixels and the spatial distribution of isotopologues was evaluated. An example of the workflow followed in this study is available on GitHub (https://github.com/buckm065/IsotopeLabelingCardinalAnalysis). Computations were performed using 128 GB of RAM on Iowa State University's Nova cluster. Labeled MSI data from previously published work and used in this study are available through Figshare (https://doi.org/10.25380/iastate.28540523.v1), and unlabeled MSI data used in this study are available through METASPACE (https://metaspace2020.org/project/duckweed-machinelearning).
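The two preprocessing ideas described above, TIC normalization and restriction to tissue-only pixels before the second segmentation stage, can be illustrated with a minimal sketch. This is plain Python rather than the Cardinal workflow used in the study; the function name `tic_normalize`, the pixel intensities, and the "tape"/"tissue" labels assigned below are hypothetical values chosen only for illustration.

```python
def tic_normalize(spectrum, scale=1.0):
    """Divide each intensity by the spectrum's total ion count (TIC)."""
    tic = sum(spectrum)
    if tic == 0:
        return [0.0 for _ in spectrum]  # empty pixel: keep all zeros
    return [scale * x / tic for x in spectrum]

# Hypothetical pixels: the first two share a relative pattern but differ
# tenfold in overall signal; the third stands in for an off-tissue pixel.
pixels = [
    [10.0, 30.0, 60.0],
    [1.0, 3.0, 6.0],
    [50.0, 1.0, 1.0],
]
normalized = [tic_normalize(p) for p in pixels]

# Stage 1 would assign each pixel to "tape" or "tissue"; these labels are
# assumed here rather than computed.
stage1_labels = ["tissue", "tissue", "tape"]

# Stage 2 operates only on the tissue pixels.
tissue_pixels = [p for p, lab in zip(normalized, stage1_labels) if lab == "tissue"]
```

After normalization the first two pixels become nearly identical, which is the point of TIC scaling: pixel-to-pixel differences in total signal no longer dominate the comparison of spectral patterns.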
A subset of pixels was created from the “tissue” pixels observed in Fig. 1B, then SSC segmentation was performed a second time. Five models were generated with varying sparsity shrinkage parameters (s = 2, 4, 8, 16, and 32; see Fig. S1 and Table S1); in all, four segments were observed. It was determined that s = 8 provided reasonable performance metrics (see SI 2) and maintained enough features for later comparison to isotope-labeled metabolites. The four segments (Fig. 2A) were categorized based on differences in t-statistics and m/z co-localizations. Positive t-statistics indicated features with higher abundance than the global mean spectrum (Fig. 2B). In contrast, negative t-statistics indicated that feature intensities were lower, or possibly absent, compared to the global mean. For example, sucrose (m/z 381.080, [C12H22O11 + K]+; labeled with an * in Fig. 2B) showed higher abundance in the parent and intermediate/daughter segments (t = 14 and 12, respectively), and lower abundance in the tissue edge and budding pouch segments (t = −14 and −33, respectively); the MS image of sucrose (Fig. S2) supports this observed trend. Fig. S3 shows additional examples of selected m/z values from the top five most positive t-statistics of each segment; these demonstrate the most extreme differences in localization and serve as indicators of the driving forces behind the segmentation results.
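To make the role of the shrinkage parameter s more concrete, the following sketch assumes soft-thresholding of t-statistics, as in the nearest shrunken centroid approach on which SSC is based; Cardinal's actual computation is described in ref. 42 and SI 1 and differs in detail. The helper names and all t-statistics other than the sucrose value (t = 14) are hypothetical.

```python
def soft_threshold(t, s):
    """Shrink a t-statistic toward zero by s; values within [-s, s] become 0."""
    magnitude = max(abs(t) - s, 0.0)
    return magnitude if t > 0 else -magnitude

def top_positive(tstats, n=5):
    """Return the n (m/z, t) pairs with the largest positive statistics."""
    positive = [(mz, t) for mz, t in tstats.items() if t > 0]
    return sorted(positive, key=lambda item: item[1], reverse=True)[:n]

# Hypothetical t-statistics for one segment, keyed by m/z.
tstats = {381.080: 14.0, 212.843: 6.5, 104.108: -3.0, 813.492: 1.2, 975.550: -20.0}

# Larger s removes more features (shrunken statistic of exactly zero).
for s in (2, 4, 8, 16, 32):
    shrunken = {mz: soft_threshold(t, s) for mz, t in tstats.items()}
    retained = sum(1 for t in shrunken.values() if t != 0)
    print(f"s={s}: {retained} of {len(tstats)} features retained")

print(top_positive(tstats, n=2))
```

Under this assumption, a small s keeps weakly discriminating features while a large s keeps only the strongest, which is the trade-off behind choosing an intermediate value that still retains enough features for comparison.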
In the outer portion of the parent frond, sucrose (m/z 381.080, Fig. S2) and potassium sulfate (m/z 212.843, Fig. S3) exhibited more positive t-statistics than in the daughter/intermediate segment (Fig. 2). High abundance of sucrose in the parent frond was likely due to the short-term storage of excess glucose.45 Potassium sulfate originated from plant media, although it is not clear why it was more abundant in the outer parent region. In the budding pouch region of Fig. 2, the pocket from which new fronds develop, characteristic ions at m/z 483.069, 513.081, 603.112, and 633.123 were consistent with flavonoids, or other closely-related compounds, according to annotations provided by METASPACE. Galactolipids, major chloroplast lipids present in all photosynthetic cells of plants, such as MGDG 36:6 (m/z 813.492) and DGDG 36:6 (m/z 975.550), were homogeneously distributed throughout the duckweed tissue. The edge of the tissue was attributed to a sample preparation artifact, a ring of residue that remained after vacuum drying; an example microscope image of this residue is shown in Fig. S4. Therefore, SSC segmentation could effectively differentiate one or two pixel layers along the tissue border, which might not have been possible with manual ROI selection. A few characteristic ions in the tissue edge included choline ([C5H14NO]+ at m/z 104.108) and matrix ions (i.e. [2DHB–H + Na]+ and [3DHB–2H2O + K]+ at m/z 329.006 and 465.023, respectively). Two high abundance ions at m/z 553.370 and 569.434 were present in all parts of the tissue with minor intensity differences between segments; their chemical formulae were tentatively identified as [C30H58O4S + K]+ and [C35H62O3 + K]+, matching well-known plastic additives.46,47 C30H58O4S and C35H62O3 likely originated from the tape, but preferentially formed potassium adducts in the tissue region while being detected as sodium adducts in the off-tissue region at m/z 537.396 and 553.460 (Fig. 1C).
SSC segmentation was also performed without background removal to compare with the two-step segmentation results. To achieve similar segmentation results to that shown in Fig. 2A, k = 7 was required (Fig. S5). Four segments in the tissue region were similar to Fig. 2A; however, additional segments were observed in the off-tissue region. Due to SSC being sensitive to high abundance m/z features, such as matrix peaks or chemical background, the additional segments revealed were due to fluctuations in matrix intensity or matrix cluster peaks that were not relevant to the analysis of duckweed metabolites. Therefore, removal of the off-tissue regions could improve computational speed without impacting the overall segmentation results.
SSC segmentation was applied to MSIi data of 13C-labeled duckweed to test its effectiveness for the detection of various galactolipid isotopologue groups. First, the tape background (Fig. S6) was removed, then six SSC segmentation models (Table S2) were generated for the tissue-only region. Fig. 3 shows the segmentation of three-day 13C-labeled duckweed with r = 2, k = 5, and s = 8. Five distinct segments were observed: inner and outer parent frond, inner and outer daughter fronds, and budding pouch (Fig. 3A); no tissue edge was observed in this sample. This is in contrast to only three regions classified via manual analysis: parent, daughter, and intermediate.15 In Fig. 3B, the shrunken centroids of the MGDG 36:6 isotopologue distribution are shown as an example. As described in our previous work, the isotopologue distribution of MGDG 36:6 can be broken into four groups: unlabeled, and Groups 1, 2, and 3. The parent frond was transferred to the 13CO2-chamber at the beginning of the experiment; therefore, it was composed of unlabeled ‘old’ lipids. During the three-day incubation period, newly photosynthesized 13C6-galactose was transferred to unlabeled diacylglycerol (DAG) from UDP-13C6-galactose, producing Group 1 MGDG 36:6 in both the outer and inner parent segments (Fig. 3B). The inner parent and budding pouch exhibited greater abundance of Group 2 isotopologues; this region was previously referred to as ‘intermediate tissue’.15 Group 2 labeling was attributed to unlabeled lysophosphatidic acid (LPA), acylated by newly synthesized 13C-labeled fatty acid, then galactosylated by 13C6-galactose. It was hypothesized that, at the time of transfer to the 13CO2-chamber, the inner parent and budding pouch regions were still in the developing stage with highly abundant unlabeled LPA.15 Interestingly, there was a subtle difference in the Group 2 MGDG 36:6 isotopologue distributions between the inner parent and budding pouch. 
The Group 2 labeling in the inner parent and budding pouch segments peaked at 13C22 and 13C24, respectively (Fig. 3B). Unusually wide Group 2 distribution was previously noted compared to the expected binomial distribution fitting and ascribed to additional glycerol backbone labeling in Group 2.15 Here, SSC segmentation could reveal the minute differences in the spatial distributions of these intermediately labeled lipids.
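The expected binomial distribution mentioned above can be sketched as follows. This is an illustrative calculation, not the fitting procedure used in the study: the enrichment value of 0.95 is a hypothetical assumption, and the example considers only the six carbons of the galactose head group (the Group 1 case, in which the rest of the lipid is unlabeled) while ignoring natural-abundance 13C.

```python
from math import comb

def binomial_isotopologues(n_carbons, enrichment):
    """Probability of k labeled carbons (k = 0..n) among n labelable carbons."""
    return [
        comb(n_carbons, k) * enrichment**k * (1 - enrichment) ** (n_carbons - k)
        for k in range(n_carbons + 1)
    ]

def most_abundant(distribution):
    """Index (number of 13C atoms) of the most abundant isotopologue."""
    return max(range(len(distribution)), key=lambda k: distribution[k])

# Group 1 sketch: only the six galactose carbons are newly synthesized.
galactose = binomial_isotopologues(n_carbons=6, enrichment=0.95)  # enrichment assumed
for k, p in enumerate(galactose):
    print(f"13C{k}: {p:.4f}")
print("most abundant:", most_abundant(galactose))
```

A measured distribution that is noticeably wider than such a curve, as noted for Group 2, suggests that more than one pool of labelable carbons contributes to the observed isotopologues.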
Fig. 3 (A) Spatial segmentation of three-day 13C-labeled duckweed sample. (B) Isolated isotopologue distribution of MGDG 36:6. (C) The t-statistics of the three-day labeled sample.
In the daughter fronds, which were mainly composed of newly synthesized cells, Group 3 MGDG 36:6 isotopologues were most abundant and a notable lack of unlabeled or Group 1 isotopologues was observed. While the difference between the outer and inner daughter segments was not distinguishable based on the isotopologues of MGDG 36:6, it was clearer in other lipids (Fig. 3C). For example, a lower abundance of Group 3 DGDG 36:6 was noted in the outer daughter compared to the inner daughter. Likewise, higher abundances of Group 3 pheophytin a and an unidentified lipid were observed in the outer daughter frond. These differences were also evident in the shrunken centroids, shown in Fig. S7. SSC segmentation of two-day 13C-labeled duckweed displayed similar results to those observed in the three-day sample. Details of the SSC segmentation model parameters are provided in Table S3. As shown in Fig. S8A, only four distinct segments were noted for the two-day labeled sample, corresponding to the outer parent, inner parent, daughter frond, and tissue edge. A simpler segmentation was expected because the daughter fronds were at a younger developmental stage than those in the three-day sample. Still, the four isotopologue groups of MGDG 36:6 (Fig. S8B) and DGDG 36:6 (Fig. S8C) were the main driving forces of the segmentation.
When interpreting segmented mass spectra (or shrunken centroids), one should carefully consider possible matrix effects, i.e., changes in ionization efficiency in different parts of the tissue due to differences in the major species present. In this particular application, matrix effects were not considered to be a major hindrance, considering that all spatial segments are components of a leaf prepared by fracturing. These segments were all dominated by mesophyll cell membrane lipids, such as MGDGs, DGDGs, and chlorophylls. One may consider that the budding pouch might have a slight difference in matrix effect compared to parent or daughter fronds. However, the summed signal of the isotopologues of MGDG 36:6 in the budding pouch was not markedly different from other segments (Fig. 3B), suggesting there were minimal matrix effects, if any.
While the lipid mass range segmentation yielded few meaningful conclusions, the entire mass range (m/z 100–1200) provided more promising avenues to explore. The results of six SSC segmentation models are summarized in Table S4. Five segments were observed and annotated as the outer and inner parent, budding pouch, daughter fronds, and tissue edge (Fig. 4); the shrunken centroids and t-statistics for this model are provided in Fig. S10. The top five t-statistics from each segment (Table S5) were mostly from non-biological origins such as inorganic salts, matrix clusters, plastic additives, and polymer contaminations. Sucrose (m/z 381.080) in the outer parent and asparagine (m/z 208.973) in the daughter fronds are two distinct metabolites with multiple isotopologues. Furthermore, after comparing the t-statistics of the labeled sample in Fig. S10B to those of the unlabeled sample in Fig. 2B, another metabolite was noted with high abundance isotopologues and distinct localization: choline (m/z 104.108) localized to the budding pouch.
Fig. 5 shows the MS images of choline, asparagine, and sucrose and their respective isotopologue distributions. Choline and its isotopologues were most abundant in the budding pouch segment; unsurprisingly, the localization was similar in the unlabeled duckweed (not shown). Since the budding pouch is the pocket from which new fronds will emerge, higher abundance of choline could be due to the synthesis of new membrane lipids (such as phosphatidylcholine) in this region to facilitate daughter frond maturation. Interestingly, asparagine demonstrated an unusually high abundance in deuterated tissues compared to unlabeled tissues (Fig. 5A vs. Fig. S11). When the total asparagine signal, including the monoisotope and isotopologues, was normalized to the signal at m/z 212.843, potassium sulfate from the media, it was only ∼1.9% in the unlabeled duckweed, whereas in the D-labeled tissue it was ∼194.1%. This higher abundance could indicate D2O-induced stress, which can inhibit the production of proteins and lead to a buildup of amino acids.50 Sucrose was completely D-labeled and unlabeled sucrose was not observed after five days in 50% D2O (Fig. 5B), suggesting a high turnover rate. Its localization to the parent frond was attributed to the conversion of excess glucose to sucrose in matured cells. Several D-labeled metabolites, such as flavonoids, were not clearly visualized due to low abundances; however, their isotopologues were higher in abundance in the tissue edge (Fig. S12). A previous report showed that flavonoids were more abundant on the top exterior surfaces of duckweed than the interior middle layer.51 Hence, flavonoids were detected in higher abundance in the tissue edge segment where the residue from the top side of the duckweed remained on the tape after vacuum drying. 
MGDG 36:6 showed similar localization as previously discussed in 13C-labeling or D-labeling but Group 2 labeling was noted in very low abundance with no apparent localization in either the daughter or parent frond. MALDI-MS/MS was performed to confirm the proposed chemical formulae for select ions as shown in Fig. S13.
Segment | MGDG 36:6 | Pheophytin a | DGDG 34:3 | DGDG 36:6 |
---|---|---|---|---|
Three-day 13C | | | | |
Outer parent | 7.5 | 1.3 | ND | 17.9 |
Inner parent | 70.2 | 58.1 | 13.1 | 70.7 |
Budding pouch | 73.9 | 71.1 | 6.0 | 67.5 |
Outer daughter | 99.0 | 99.9 | 100.0 | 99.0 |
Inner daughter | 99.3 | 99.9 | 100.0 | 99.3 |
Two-day 13C | | | | |
Outer parent | 6.0 | <1 | ND | 15.1 |
Inner parent | 36.8 | 11.3 | ND | 28.2 |
Daughter | 96.3 | 91.8 | 17.7 | 93.9 |
Tissue edge | 72.1 | 29.3 | ND | 56.3 |
Five-day D, m/z 800–1200 | | | | |
Parent | 2.0 | 7.4 | ND | 9.9 |
Daughter | 81.1 | 85.2 | ND | 27.7 |

Calculated from SSC segmentation results shown in Fig. 3 and Fig. S8, S9, respectively. ND: not detected.

Segment | Choline | Asparagine | Sucrose | C22H20O10 | C23H22O11 | C26H28O14 | C27H30O15 | MGDG 36:6 | Pheophytin a |
---|---|---|---|---|---|---|---|---|---|
Outer parent | ND | 88.6 | 100.0 | ND | 17.4 | 4.2 | 12.7 | 2.1 | 1.0 |
Inner parent | 14.8 | 84.8 | 100.0 | ND | 17.9 | 7.3 | 33.8 | 2.7 | 17.6 |
Budding pouch | 59.3 | 89.9 | 100.0 | 100 | 100.0 | 92.9 | 95.3 | 88.8 | 93.9 |
Daughter | 50.4 | 84.9 | 100.0 | ND | 100.0 | 100.0 | 100.0 | 97.4 | 98.2 |
Tissue edge | 48.4 | 86.9 | 100.0 | 48.5 | 58.1 | 33.5 | 48.4 | 29.3 | 26.1 |

Calculated from SSC segmentation result shown in Fig. 4. ND: not detected.

While lipids exhibited a gradual increase of f values with tissue development, small metabolites showed a wider range of f values (Table 2). Flavonoids generally followed a similar trend to the lipids, gradually increasing toward the younger parts of the tissue. Asparagine and sucrose had consistently high f values in all segments, leading to the conclusion that these metabolites are rapidly consumed and produced in all developmental stages. Unlike asparagine and sucrose, choline had low f values, below 60% in every segment and especially low in the parent fronds. The lower f values for choline, even in younger tissues, compared to asparagine or sucrose were likely due to its complex synthetic pathways, whereas sucrose and asparagine are rapidly synthesized from glucose and its energy cycle in every cell type.
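As a rough illustration of how a per-segment f value could be derived from segmented isotopologue intensities, the sketch below assumes that f is the percentage of a compound's total isotopologue signal carried by labeled isotopologues. That definition, the function name `fraction_de_novo`, and all intensities are assumptions made for this example, and natural-abundance isotopes are ignored; the exact calculation used for the tables may differ.

```python
def fraction_de_novo(unlabeled, labeled):
    """Percent of total isotopologue signal carried by labeled isotopologues.

    Returns None when the compound has no signal in the segment (i.e. ND).
    """
    total = unlabeled + sum(labeled)
    if total <= 0:
        return None
    return 100.0 * sum(labeled) / total

# Hypothetical summed intensities per segment:
# (unlabeled signal, [labeled group signals]).
segments = {
    "older tissue": (90.0, [8.0, 1.0, 1.0]),
    "younger tissue": (1.0, [2.0, 7.0, 90.0]),
    "no signal": (0.0, [0.0, 0.0, 0.0]),
}
f_values = {name: fraction_de_novo(u, lab) for name, (u, lab) in segments.items()}

for name, f in f_values.items():
    print(name, "ND" if f is None else f"{f:.1f}%")
```

Because f is a ratio within one compound and one segment, multiplying all of that compound's intensities by a common factor (for example, a segment-specific ionization efficiency) leaves the value unchanged.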
One caveat in this analysis is that the segments might be subject to different matrix effects, making quantitative interpretation unreliable. This is not expected to be a major limitation in this particular application to duckweed, as the fractured leaves are dominated by mesophyll cell membranes. Additionally, signal normalization was performed to account for pixel-to-pixel signal variations. However, other tissue samples with quite distinct chemical compositions might show significant matrix effects among spatial segments. Even in such cases, relative isotopologue abundances of the same compound can be compared because the isotopologues would have identical ionization efficiencies in the same segment. It should be noted that, as with all machine learning applications, the investigator should be critical of the results, as not all reported t-statistics are analytically relevant or biologically pertinent. Improvements to this workflow could be made with improved signal intensities of isotopologues and reduction of chemical interferences, such as tape and matrix background contributions. Future work should focus on applying SSC segmentation to a variety of in vivo isotope labeling systems and improving the current method to automatically extract isotopologue distributions of each metabolite, cross-reference to existing metabolite databases, and generate a summary of results.
Code availability: An example of the code used in this analysis is publicly available on GitHub (https://github.com/buckm065/IsotopeLabelingCardinalAnalysis).
Supplementary information is available. See DOI: https://doi.org/10.1039/d5an00649j.
This journal is © The Royal Society of Chemistry 2025