Open Access Article
Lubomír Prokeš *a and Lukáš Pečinka bc
a Department of Physics, Chemistry and Vocational Education, Faculty of Education, Masaryk University, Brno, Czech Republic
b Research Centre for Applied Molecular Oncology (RECAMO), Masaryk Memorial Cancer Institute, Brno, Czech Republic. E-mail: lukas.pecinka@mou.cz
c International Clinical Research Center, St Anne's University Hospital Brno, Czech Republic
First published on 15th October 2025
Mass spectrometric analysis of inorganic materials is widely used. However, no major advances have been made in this area compared to the significant progress in the analysis of biological materials. This work introduces a novel open-source R workflow that efficiently processes and models isotopic distributions in laser desorption ionisation time-of-flight mass spectrometry (LDI-TOF MS) analysis. Furthermore, it facilitates the comparison of modeled isotopic envelopes with experimental data and the selection of appropriate models that reveal the composition of complex experimental mass spectra. It overcomes the limitations of commercial software and opens new possibilities for the analysis of novel industrial materials.
This limits the application of routines, which drives efforts to develop universal tools for cross-platform mass spectra analysis. MALDI mass spectrometers are produced by various companies, with Bruker as the market leader. Each company provides its own evaluation software, which is often incompatible with data from other instruments. Researchers often turn to the R programming language (https://cran.r-project.org/) to overcome these compatibility challenges and enable flexible data analysis regardless of the hardware used. Although R may offer lower computational efficiency than other programming languages, it is widely favoured for its simplicity and the wide array of available libraries for bioinformatics and MS data analysis.19,20 These include libraries for MS data, e.g. MALDIquant, MALDIquantForeign, MSnbase, CHNOSZ, enviPat, InterpretMSSpectrum, and MassSpecWavelet, and for biostatistics, e.g. caret, mixOmics, and FactoMineR.21–30 The large number of libraries enables researchers to concentrate on the scripts themselves, rather than on the construction of libraries. The objective of this technical note is to develop a comprehensive workflow to facilitate the identification of signals in mass spectra, focussing on clusters with complex isotopic distributions and overlapping patterns. This systematic workflow for LDI-TOF MS data analysis is implemented in the open-source R programming language, using libraries from CRAN and Bioconductor (https://www.bioconductor.org/), along with custom functions. Key functions essential for understanding the workflow are explained in this article, with a detailed description of the libraries and functions available in the GitHub repository.
The first part focusses on a comprehensive analysis of the whole mass spectrum, including preliminary processing steps such as transformations, baseline correction, and normalisation. To facilitate this, the mass spectrum is segmented using automatic clustering based on the signal-to-noise ratio (S/N) or predefined thresholds. Furthermore, the script integrates companion functions to detect isobaric contamination in mass spectra, a common issue in ICP-MS.32,33
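The S/N-based segmentation can be illustrated with a minimal sketch. It is written in plain Python for illustration and is not the authors' R code. The baseline and noise estimators (median and scaled median absolute deviation), the function names, and the `snr` and `max_gap` parameters are all assumptions rather than part of the published workflow.

```python
from statistics import median

def noise_level(intensity):
    """Return (baseline, noise) as the median and the scaled median absolute deviation."""
    base = median(intensity)
    noise = 1.4826 * median([abs(y - base) for y in intensity])
    return base, noise

def segment_spectrum(mz, intensity, snr=3.0, max_gap=5.0):
    """Group points above baseline + snr * noise into regions.

    Two consecutive above-threshold points belong to the same region
    when they are at most max_gap apart on the m/z axis (assumed rule).
    """
    base, noise = noise_level(intensity)
    threshold = base + snr * noise
    regions = []
    for m, y in zip(mz, intensity):
        if y <= threshold:
            continue
        if regions and m - regions[-1][1] <= max_gap:
            regions[-1][1] = m          # extend the current region
        else:
            regions.append([m, m])      # start a new region
    return [tuple(r) for r in regions]

# Synthetic spectrum: flat baseline with small ripple and two signal groups.
mz = [400.0 + i for i in range(121)]
intensity = [1.0 + 0.1 * ((i % 3) - 1) for i in range(121)]
for i in range(30, 37):
    intensity[i] = 50.0
for i in range(70, 77):
    intensity[i] = 30.0

regions = segment_spectrum(mz, intensity)
print(regions)
```

With real data, the gap width and the S/N value would need tuning to the instrument resolution and noise characteristics.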
In the second part, a detailed isotopic distribution analysis of a selected region of the mass spectrum is performed. This includes calculating the theoretical stoichiometry of potential clusters, determining their relative abundances, and assessing the fit based on the similarity of the overall isotopic distribution. Furthermore, a comparison with the theoretical models is carried out by assessing the deviations between the m/z positions of the experimental and theoretical data. Additionally, the code includes a function to calculate the monoisotopic mass, which enhances the accuracy of isotopic characterisation.
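The following sketch shows how a monoisotopic m/z and its deviation from a measured peak could be computed. It is plain Python for illustration, not the authors' R functions. It assumes that "monoisotopic" refers to the most abundant isotope of each element, uses approximate isotope masses for Ga and Se only, and the helper names and the measured value are hypothetical.

```python
import re

# Approximate masses (u) of the most abundant isotopes, 69Ga and 80Se (assumed values).
MOST_ABUNDANT_ISOTOPE_MASS = {"Ga": 68.925574, "Se": 79.916522}
ELECTRON_MASS = 0.000548580

def parse_formula(formula):
    """Parse a simple formula such as 'Ga2Se4' into {'Ga': 2, 'Se': 4}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts

def monoisotopic_mz(formula, charge=1):
    """m/z of the ion built from the most abundant isotope of each element."""
    mass = sum(MOST_ABUNDANT_ISOTOPE_MASS[el] * n
               for el, n in parse_formula(formula).items())
    # Remove electrons for cations, add them for anions.
    return (mass - charge * ELECTRON_MASS) / abs(charge)

def ppm_error(experimental_mz, theoretical_mz):
    """Relative deviation of a measured m/z from the model, in ppm."""
    return (experimental_mz - theoretical_mz) / theoretical_mz * 1e6

mz_se6 = monoisotopic_mz("Se6")
measured = 479.5034  # hypothetical measured value, for illustration only
print(round(mz_se6, 4), round(ppm_error(measured, mz_se6), 1))
```

A tolerance on the ppm deviation can then be used to accept or reject a candidate stoichiometry.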
The third part focusses on the analysis of spectra containing isotopically low abundance elements and monoisotopic elements, in which regularly repeating series are observed, characterised by a gradual increase in the number of atoms of a specific element.
Both the source code and a tutorial workflow are available on GitHub: https://github.com/luboprok/LDI-TOF-MS.
In addition to these basic functions, there are several other functions that we developed:
(1) Mass spectra resampling: the reduction of data size and compatibility of mass spectra recorded with different numbers of points in a given range can be achieved through mass spectra resampling. This results in a reduction in computational memory requirements throughout the entire downstream process, thereby accelerating the overall process.
(2) Cluster stoichiometry: to identify signals in the mass spectrum, a function from the InterpretMSSpectrum library is utilised. Preliminary identification of molecular formulae associated with peak clusters is performed using the rcdk library.34 Additionally, the monoisotopic peak of the given signal is estimated along with the corresponding number of atoms for each element and the charge of the resulting ion. The tolerance range is also determined, which defines how much the theoretical monoisotopic signal may deviate from the set value. The mass spectra were automatically divided into distinct regions as shown in Fig. 1A. The mass spectrum within the selected region of interest 430–500 Da (Fig. 1B) was selected as example data, revealing a complex isotopic distribution. Different clusters identified in Fig. 1A can be easily analysed.
(3) Modeling of the mass spectrum: the workflow integrates the enviPat library for the selection and consideration of specific clusters. A theoretical model was constructed within a defined domain using the selected resolution. The relative contributions of the individual signals were estimated from the theoretical isotopic envelopes corresponding to each ion. The calculated clusters were sorted according to the m/z difference and used for isotopic modeling. The isotopic distribution of the four most probable clusters is shown in Fig. 1C.
(4) Pattern superposition: individual isotopic distributions of theoretical signals are fitted to the measured data by superposition, and the percentage contributions estimated accordingly. The difference between theoretical and measured data was calculated and presented as a ‘mirror’ plot, as shown in Fig. 2A.
(5) Cumulative theoretical pattern: the cumulative theoretical distribution, which incorporates the percentage contributions of each signal, is computed and normalised to 100% based on the most dominant signal. Only negligible differences were observed in the contribution of individual isotopes, as shown in Fig. 2B and C (cumulative signals).
(6) Mass spectra alignment: the experimental and theoretical data are converted to mass spectra. The experimental data are recalibrated on the m/z scale using the warping function in the MALDIquant library.
(7) Fit isotopic pattern: a linear combination of the individual theoretical cluster spectra is fitted to the experimental spectrum using nonlinear regression with the Port algorithm (function nls in the base stats library), followed by the calculation of the mixed isotopic pattern.32,35,36 The final fit (Fig. 2D) showed a strong match between the experimental data and the proposed cluster combination, with respective percentage contributions of Se6+ (100%), GaSe5+ (100%), Ga2Se4+ (40%), and Ga3Se3+ (20%).
(8) Peak identification and selection: peak detection is performed using the signal-to-noise ratio (SNR) with the MALDIquant library or a threshold intensity value with the MALDIrppa library. A local maximum must be higher than the threshold to be recognised as a peak. The threshold can be set directly or calculated from the SNR, where the threshold is equal to the SNR × estimated noise.34 Examples of detected peaks in the region of 430–500 Da, according to the selected SNR parameter, are presented in Fig. 3A and, with the m/z values, in Fig. 3B.31,34
(9) The Kendrick mass (KM) and the Kendrick mass defect (KMD): the remainders of the KM and the KM for fractional base units are calculated using exact mass calculation with the InterpretMSSpectrum library. Visualisation of the KM and the KMD is a concept used in high-resolution MS to simplify the analysis of complex mixtures.37–39 For the calculation of the KM, an element or molecular fragment (base unit) is set to an integer value (nominal mass) instead of its IUPAC mass. This adjustment creates a new mass scale. KMD is the difference between the nominal KM (an integer) and the exact KM. Compounds with the same KMD often belong to the same homologous series: clusters differing only in the number of base units have the same KMD but different nominal KM, and are positioned along a horizontal line on the plot. Horizontal lines of different KMD correspond to clusters of different compositions. In our workflow, the exact mass of a particular element can be chosen as the base unit (e.g. elements contained in clusters or oxygen if oxidation processes are studied). An example of the Kendrick plot (Fig. 3C) is shown for the selenium atom. The colours correspond to the regions of the mass spectra identified in Fig. 1A. The plot of the remaining KM on the y-axis of the Kendrick plot produces neat point alignments for all the series along the mass range.38 Two-dimensional mapping according to Artemenko et al., using the plot of normalised isotopic shift (NIS) vs. normalised monoisotopic mass defect (NMD), may also be applied for visualisation of compositional differences between clusters, if the elemental composition of clusters is known.40
(10) Mass calculation and 2D mapping: the average mass (library CHNOSZ), exact mass (library InterpretMSSpectrum), and monoisotopic mass (function utilising a list of isotopes from the enviPat library) are calculated, and 2D mapping of clusters is performed according to Artemenko et al.40 Modeled mass spectra of selected chemical formulae at given resolutions also served as a tool to detect isobaric overlaps. For example, the superposition of the Pt+ and HfO+ model mass spectra was crucial to study the spectral interferences in ICP-MS (Fig. 3D).
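Steps (3), (4), and (7) can be illustrated with a simplified sketch in plain Python. It is not the authors' R code and does not use enviPat or nls. It works on nominal masses only (no mass defects, no peak shapes or resolution), uses approximate natural abundances for Ga and Se, and replaces the Port algorithm with a coordinate-descent non-negative least-squares fit. All function names are hypothetical.

```python
# Natural isotopic abundances by nominal mass (approximate, assumed values).
ISOTOPES = {
    "Ga": {69: 0.60108, 71: 0.39892},
    "Se": {74: 0.0089, 76: 0.0937, 77: 0.0763, 78: 0.2377, 80: 0.4961, 82: 0.0873},
}

def isotope_pattern(composition):
    """Convolve elemental isotope distributions; returns {nominal mass: probability}."""
    pattern = {0: 1.0}
    for element, count in composition.items():
        for _ in range(count):
            updated = {}
            for mass, prob in pattern.items():
                for iso_mass, abundance in ISOTOPES[element].items():
                    key = mass + iso_mass
                    updated[key] = updated.get(key, 0.0) + prob * abundance
            pattern = updated
    return pattern

def scaled(pattern):
    """Scale a pattern so that its most intense peak equals 1."""
    top = max(pattern.values())
    return {mass: prob / top for mass, prob in pattern.items()}

def superpose(patterns, weights):
    """Weighted sum of base-peak-scaled patterns on a shared nominal mass axis."""
    total = {}
    for pattern, weight in zip(patterns, weights):
        for mass, value in scaled(pattern).items():
            total[mass] = total.get(mass, 0.0) + weight * value
    return total

def fit_weights(patterns, observed, sweeps=500):
    """Non-negative least squares by coordinate descent (not the Port algorithm)."""
    masses = sorted(set(observed).union(*patterns))
    scaled_patterns = [scaled(p) for p in patterns]
    cols = [[sp.get(m, 0.0) for m in masses] for sp in scaled_patterns]
    residual = [observed.get(m, 0.0) for m in masses]
    weights = [0.0] * len(cols)
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            norm = sum(c * c for c in col)
            step = sum(c * r for c, r in zip(col, residual)) / norm
            new = max(0.0, weights[j] + step)   # keep contributions non-negative
            delta = new - weights[j]
            residual = [r - delta * c for r, c in zip(residual, col)]
            weights[j] = new
    return weights

# Clusters and relative contributions taken from the fit reported in step (7).
clusters = [{"Se": 6}, {"Ga": 1, "Se": 5}, {"Ga": 2, "Se": 4}, {"Ga": 3, "Se": 3}]
patterns = [isotope_pattern(c) for c in clusters]
mixture = superpose(patterns, [1.0, 1.0, 0.4, 0.2])
recovered = fit_weights(patterns, mixture)
print([round(w, 3) for w in recovered])
```

Here the synthetic mixture is noise-free, so the fit simply recovers the weights used to build it. With experimental data, the spectrum would first have to be aligned and binned onto the same mass axis as the models.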
Outputs not directly presented in the article are available on GitHub.
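The Kendrick mass calculation described in step (9) can be sketched as follows, in plain Python for illustration rather than the authors' R code. The base unit is assumed to be 80Se with an approximate exact mass, KMD follows the sign convention given in the text (nominal KM minus exact KM), and the function names are hypothetical.

```python
SE80_EXACT = 79.916522   # approximate mass of 80Se (u), used as the base unit (assumed)
GA69_EXACT = 68.925574   # approximate mass of 69Ga (u) (assumed)

def kendrick_mass(mass, base_exact, base_nominal=None):
    """Rescale a mass so that the base unit takes its integer (nominal) mass."""
    if base_nominal is None:
        base_nominal = round(base_exact)
    return mass * base_nominal / base_exact

def kendrick_mass_defect(mass, base_exact, base_nominal=None):
    """KMD = nominal KM (nearest integer) minus exact KM."""
    km = kendrick_mass(mass, base_exact, base_nominal)
    return round(km) - km

# Two series differing only in the number of Se base units.
se_series = [n * SE80_EXACT for n in range(1, 7)]                # Se_n
gase_series = [GA69_EXACT + n * SE80_EXACT for n in range(1, 6)]  # GaSe_n

se_kmd = [kendrick_mass_defect(m, SE80_EXACT) for m in se_series]
gase_kmd = [kendrick_mass_defect(m, SE80_EXACT) for m in gase_series]

for mass, kmd in zip(gase_series, gase_kmd):
    print(round(kendrick_mass(mass, SE80_EXACT), 4), round(kmd, 5))
```

Within each series the KMD is constant, so the members fall on one horizontal line in a KMD versus nominal KM plot, while the two series are separated vertically.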
In contrast to biomolecules, there is a scarcity of freely available software for inorganic substances that would allow us to process and interpret the acquired data together with the identification of the substances in the spectra. As far as we know, there are only isotope distribution calculators, e.g. enviPat (https://www.envipat.eawag.ch/), Isotope Distribution Calculator and Mass Spec Plotter (https://www.sisweb.com/mstools/isotope.htm), and ChemCalc (https://www.chemcalc.org/). However, these calculators are limited to the computation of a single isotopic distribution and do not support the simulation of overlapping distributions within a mass spectrum. This limitation can be addressed using the commercially available Launchpad software from Kratos Analytical Ltd, which allows for the calculation of up to five isotopic distributions and their superposition by summing the individual intensity profiles. However, this software, along with web-based calculators, lacks the advanced capabilities required for tasks such as spectral alignment, superposition of multiple isotopic distributions, or integration into automated workflows and shows insufficient compatibility with modern operating systems.
However, inorganic substance analysis is of great importance for mineralogy, materials science, and solid speciation analysis, as well as in the fields of forensic science, laser-generated clusters, science of arts, and mass spectrometry education.1–16 LDI-TOF MS was applied in compositional and structural characterization of powders and thin films of various chalcogenide glasses, useful optic materials applied in infrared spectroscopy, sensors and thermal imaging.1,5,7
The technique has also been applied for the identification and characterisation of As- and Cu-sulfide minerals, as well as inorganic components in ancient and modern cosmetics, including PbCO3, BiOCl, ferrocyanide, and bentonite.11,14,16,46 Furthermore, LDI-TOF MS has been successfully used to identify inorganic pigments such as gold (Au), vermilion (HgS), orpiment (As2S3), copper-based pigments including verdigris (Cu(CH3COO)2·[Cu(OH)2]3·2H2O), malachite (CuCO3·Cu(OH)2), and emerald green (3Cu(AsO2)2·Cu(CH3COO)2), as well as Prussian blue (Fe4[Fe(CN)6]3) and lead chromate (PbCrO4). These pigments were identified in historical manuscripts, paintings, coatings, and paints.12,13,15 The classical application of LDI-TOF MS is the characterisation of oxidation and corrosion products of Cu–Ag–Zn alloys, as well as other materials, including thin films of black phosphorus, arsenic, and titanium carbide.5,6,47–50 Additionally, the technique allows the generation of clusters via laser ablation synthesis directly within the mass spectrometer, making it highly valuable for studying various elements, compounds, and nanomaterials for scientific and educational applications.2–4,9,11,51
In addition, the user-modifiable workflow, accompanied by detailed descriptions, allows for easy parameter adjustment and the integration of new applications by the user. This would be difficult or impractical to achieve through a user interface in Shiny (R) or a Python graphical interface. Also, creating a new library would limit the ability to modify functions and set their parameters for beginner users. For these reasons, the development of an R-based workflow with comprehensive documentation appears to be a viable solution for all types of users.
This journal is © The Royal Society of Chemistry 2025