Johannes
Gorges
and
Stefan
Grimme
*
Mulliken Center for Theoretical Chemistry, Clausius-Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany. E-mail: grimme@thch.uni-bonn.de
First published on 3rd March 2025
We present a new fully-automated computational workflow for the calculation of electron ionization mass spectra by automated reaction network discovery, transition state theory and Monte-Carlo simulations. Compared to its predecessor QCxMS [S. Grimme, Angew. Chem., Int. Ed., 52, 6306–6312] based on extensive molecular dynamics (MD) simulations, QCxMS2's more efficient approach of using stationary points on the potential energy surface (PES) enables the usage of accurate quantum chemical methods. Fragment geometries and reaction paths are optimized with fast semi-empirical quantum mechanical (SQM) methods and reaction barriers are refined at the density functional theory (DFT) level. This composite approach using GFN2-xTB geometries in combination with energies at the ωB97X-3c level proved to be an efficient combination. On a small but diverse test set of 16 organic and inorganic molecules, QCxMS2 spectra are more accurate than ones from QCxMS yielding on average a higher mass spectral matching of 0.700 compared to QCxMS with 0.622, and is more robust with a minimal matching of 0.498 versus 0.100. Further improvements were observed when both geometries and energies were computed at the ωB97X-3c level, yielding an average matching score of 0.730 and a minimal score of 0.527. Due to its higher accuracy and robustness while maintaining computational efficiency, we propose QCxMS2 as a complementary, more reliable and systematically improvable successor to QCxMS for elucidating fragmentation pathways and predicting electron ionization mass spectra of unknown chemical substances, e.g., in analytical chemistry applications. If coupled to currently developed improved SQM methods, QCxMS2 opens an efficient route to accurate, and routine mass spectra predictions. The QCxMS2 program suite is freely available on GitHub.
To this end, our group has developed some years ago the QCEIMS program for the automatic calculation of standard 70 eV electron ionization mass spectra. It is based on Born–Oppenheimer molecular dynamics (BO-MD) using efficient quantum mechanical (QM) methods to simulate the fragmentation processes of molecules.12 Due to the computational costs, the BO-MD simulations are mostly restricted to semi-empirical quantum mechanical (SQM) methods. The method was later extended to enable the simulation of electrospray ionization/collision-induced dissociation mass spectrometry (ESI/CID MS) and its name was changed to QCxMS (x = CID, EI) to account for the new functionality of the program.13 For the calculation of CID spectra, other quantum chemistry (QC)-based methods that use MD simulations, such as CIDMD14 and the VENUS program package,15,16 are also available. However, their accuracy has not yet been tested on a broad range of compounds, nor have they been applied to EI-MS.
QCxMS' good accuracy for a large variety of molecules was proven in several studies by our group17,18 and others.19,20 In several applications, it showed great success in elucidating unknown fragmentation pathways, e.g., for environmental pollutants21,22 or chemical warfare agents.23 However, for challenging molecules or if complicated fragmentation pathways are involved, in some cases, significant deviations from the experimentally measured spectra are observed with QCxMS. In a recent study on a large number of diverse organic environmentally relevant molecules, QCxMS spectra at the GFN1-xTB24 level were found to be too inaccurate for the application in spectral matching workflows. In particular, flexible molecules and molecules containing heteroatoms other than H, C, N, and O were found to be difficult for QCxMS.25 Additionally, a separate study found that the spectra of organic oxygen compounds are often inaccurate.20
We concluded that many failures can be attributed to two fundamental limitations of the approach of simulating the fragmentation by MD simulations:
1. To keep computationally feasible, the time scale of the computations (by default 5 ps for a single reaction trajectory) is orders of magnitude shorter than the real time scale of slower fragmentations, which may occur on the ns up to the μs timescale. Consequently, the corresponding peaks can be completely missing in the computed spectra.
2. Already for medium-sized molecules (30–50 atoms), the level of theory for the underlying potential energy surface (PES) in the MD simulations is limited to rather approximate SQM methods. The corresponding errors for reaction energies and barrier heights directly (and in exponentially weighted form) influence the computed reaction probabilities (spectral intensities). Reducing the errors due to the SQM methods by performing the MD simulations at a higher density functional theory (DFT) level is impossible with typical computational resources.
An alternative, completely different route to the BO-MD-based approach is Rice–Ramsperger–Kassel–Marcus26–28 quasi-equilibrium theory29 (RRKM/QET). In this approach, relative intensities are calculated from reaction rates derived from barrier heights in the reaction network and the resulting “master equations”. Drahos and Vkey expanded this theory to non-equilibrium situations and implemented it in the program “Mass Kinetics”.30 RRKM/QET was applied in several studies concerning EI or CID mass spectrometry.31–35 For more examples, we refer to ref. 36, where an overview of some important applications is given. However, these examples concern only small molecules, where a manual setup of all relevant reaction pathways is feasible. None of these approaches has been used routinely in a black-box type procedure for automated spectra prediction.
Here, we introduce a new program, QCxMS2, which enables the fully automated computation of mass spectra based on automated reaction network discovery. Herein, a forward open-end exploration approach37 is followed, which focuses exclusively on unimolecular reactions happening in MS experiments, in contrast to more general exploration software, such as Chemoton,38 Nanoreactor39 or AutoMekn2021.40 In QCxMS2, the well-established RRKM/QET approach is integrated with automated fragment/product generation and an efficient workflow utilizing QM methods to calculate reaction barriers. Previously well working parts in QCxMS like the assignment and treatment of fractional charge, the cascading reaction concept or the internal energy distribution model are kept.
This initial work focuses on the calculation of electron-ionization mass spectra (EI-MS) but the approach can be easily extended to CID. We begin by providing a brief overview of the theoretical background of the new approach. Next, we describe the implementation of the workflow and the computational details of the software. To assess the accuracy of the new QCxMS2 method, we apply it to a benchmark set of 16 organic and inorganic main-group molecules with diverse structural motifs and typical fragmentation patterns. We compare the resulting spectra to those computed by QCxMS, which, to the best of our knowledge, is the only comparable first-principles method for the QM-based calculation of EI-MS. After discussing computational timings, we present general conclusions on the accuracy and limitations of QCxMS2 and recommend potential use cases.
![]() | (1) |
In preliminary studies, we found the advanced treatments mentioned above are impractical to use in an automated workflow as the uncertainty for the depending variables caused by errors of the employed underlying QM method or the overall workflow led to too large errors. Therefore, we decided to employ the Eyring equation from conventional transition state theory50 as a more robust but less exact theoretical description of the reaction rates within the QCxMS2 workflow to avoid inconsistent treatment of the differently occurring reaction types in the generated reaction network. It reads
![]() | (2) |
![]() | (3) |
![]() | (4) |
For subsequent fragmentations, the internal energy is corrected by the energy loss of a fragment upon dissociation
E(fragment) = (E0 − KER − ΔH) × nat(frag)/nat(prec), | (5) |
Ion-tracking is conducted as in QCxMS.12 Molecular charges are distributed according to the ionization potential (IP) of the formed fragments, which are determined by self-consistent field (ΔSCF) calculations at a QM level (usually DFT). The statistical weight of each product is then given by
![]() | (6) |
![]() | (7) |
Some fundamental limitations of the QCxMS2 approach remain. Direct bond cleavage, also called non-statistical or nonergodic processes, i.e., reactions occurring at a rate faster than the intramolecular vibrational energy redistribution (IVR) cannot be accounted for. Although these are known to happen in a mass spectrometer56 they are assumed here to be less important for the computation of a (for typical applications) sufficiently accurate spectrum, and the assumptions of QET hold for most reactions occurring in a mass spectrometer.29 These reactions can be modeled through dynamical (MD)-based approaches, such as QCxMS, where atomic velocities are scaled non-uniformly to account for the period before the energy is fully equilibrated across the molecule.57 Quantum tunneling through reaction barriers58 may also occur but are also assumed to be less relevant, as they mostly happen for subsequent fragmentation on the ns to μs timescale.56 These effects are expected to cause the largest increase in rate constants for hydrogen dissociations. However, as discussed in Section 2, we tend to overestimate their rates. Therefore, theoretical models to describe this effect, such as those described in ref. 59, are not considered in QCxMS2, but can in principle be applied for critical cases in the future.
Electronically excited states may also affect the reaction barriers. A study using QCxMS reported improved spectra through the application of excited-state dynamics.60 We investigated this for the static approach of QCxMS2 by applying time-dependent DFT for the calculation of the reaction barriers in excited states but no improvement for the spectra was observed, as most excited states were found to be hardly populated at the assumed temperatures (see ESI,† S8 for details).
For a more thorough discussion of the mentioned and other less important physical effects, we refer to the excellent review of Dantus56 of the time-scales of different events in a mass spectrometer observed by time-resolved spectroscopy and Drahos' and Vékey's theoretical work on “Mass Kinetics”.30
QCxMS2 is an advanced script that integrates several external QM codes to fully automate the calculation of an electron ionization mass spectrum. The procedure follows a workflow consisting of seven main steps as shown in Fig. 1, which are detailed in the following sections. Additional technical details can be found in the open-source software code.
![]() | ||
Fig. 1 Schematic representation of the workflow of QCxMS2 for the computation of EI-MS. For details on the computational protocol see Section 3. |
In the fragment (product) generation step with CREST, harmonic repulsive potentials are applied for each atom pair separated by up to three covalent bonds, leading in geometry optimization with GFN2-xTB to typical fragmentation products.63,64 Additionally, further optimizations are conducted with attractive potentials between hydrogen atoms and potential protonation sites within a default cutoff distance of 4 Å to obtain often observed products due to hydrogen rearrangements. Note that these bias potentials are exclusively employed in the generation step and are not utilized in the subsequent energy and barrier calculations. Next, each obtained product is subsequently optimized in a maximum of 15 cycles without constraints to generate reasonable fragments on the GFN2-xTB PES while avoiding the recombination of the dissociation products. Both optimization steps are conducted at a high finite electronic temperature of 5000 K to favor the generation of open-shell (poly)radicals typically occurring in (EI-)MS. Duplicated structures produced are identified with MolBar65 and removed to avoid redundant calculations. Additional (random) shifting of atom positions can be employed to generate a greater number of potential products, however, this option is not activated in the default settings.
For more details on this structure generator, we refer to the original publication in ref. 62.
The fragment intensities are multiplied by their respective statistical charge computed earlier. Normalization of all computed fragment intensities to the intensity of the largest signal as usual results in the final spectrum.
Note that steps 5–7 are negligible in terms of computational costs compared to steps 1–4, which require QM calculations. This has the advantage that the normally unknown energy distribution can be adapted to the experiment and only any new reaction paths that may arise at higher energies need to be calculated. This is a further advantage over the use of MD trajectories, which have to be completely recalculated for different internal energies.
The natural isotope ratios are introduced in a post-simulation treatment as in QCxMS.12 Steps 1–7 are performed iteratively for each newly formed fragment with a relative intensity above a certain threshold, which is by default 1% of the initial intensity. Thus, subsequent fragmentations via cascade reactions are captured. For a more thorough discussion of the intensity threshold and the reproducibility of the workflow, see ESI,† Section S6.
Compound | GFN2-xTB | Composite | ωB97X-3c | QCxMS |
---|---|---|---|---|
n-Octane | 0.686 | 0.703 | 0.841 | 0.840 |
4-Methyl-1-pentene | 0.758 | 0.714 | 0.835 | 0.782 |
Ethyl propyl ether | 0.762 | 0.869 | 0.813 | 0.697 |
1-Butanol | 0.753 | 0.750 | 0.724 | 0.603 |
Butanal | 0.852 | 0.807 | 0.807 | 0.803 |
2-Pentanone | 0.781 | 0.718 | 0.818 | 0.743 |
Butanoic acid | 0.683 | 0.751 | 0.761 | 0.558 |
Methyl butyrate | 0.635 | 0.736 | 0.742 | 0.655 |
Butanamide | 0.494 | 0.673 | 0.674 | 0.620 |
Uracil | 0.644 | 0.498 | 0.659 | 0.769 |
Adenine | 0.748 | 0.790 | 0.712 | 0.794 |
Caffeine | 0.456 | 0.626 | 0.644 | 0.626 |
Tabun | 0.637 | 0.508 | 0.655 | 0.649 |
Tetramethylbi-phosphine disulfide | 0.691 | 0.796 | 0.782 | 0.269 |
Acibenzolar-S-methyl | 0.389 | 0.599 | 0.527 | 0.100 |
Dichloroethyl-aluminium | 0.752 | 0.667 | 0.679 | 0.438 |
Average | 0.670 | 0.700 | 0.730 | 0.622 |
Minimum | 0.389 | 0.599 | 0.527 | 0.100 |
To ease the assessment of the quality spectra, the spectral entropy matching score is used. It captures the presence of relevant peaks, as well as their relevant intensities compared to the experiment, and ranges from 0 (no agreement at all) to 1 (perfect agreement).85 It was recently shown85 that this score is more reliable than the commonly used cosine similarity score82,83 and it is in our opinion a good metric to evaluate the accuracy of the spectra in this work. For comparison, the average values of the cosine score are also given in the discussion below. Herein, a score of at least 0.75 between experimental spectra was found to be a meaningful threshold for reliable structure identification85 and should be aimed for with any theoretical procedure considering the uncertainty of the experiment. However, interpretation of this score is system-specific, e.g., the most important peaks for substance identification may be present despite a comparatively low score.
Spectra were computed with three different combinations of QM methods, given in the short notation “method used for reaction energies and barriers”//“method used for geometry optimizations and reaction path searches”, namely, GFN2-xTB//GFN2-xTB (“GFN2-xTB”), ωB97X-3c//GFN2-xTB (“composite”), and ωB97X-3c//ωB97X-3c (“ωB97X-3c”). IPs were calculated at the GFN2-xTB level and refined at the ωB97X-3c level as described above and for the DFT spectra only with ωB97X-3c calculated throughout. Harmonic vibrational frequencies were always computed with GFN2-xTB. The RSH ωB97X-3c was employed because it yields excellent barriers at low computational costs89 and is considered by us as one of the best yet still affordable methods in our context. For comparison, we computed spectra with QCxMS at the GFN2-xTB level, as the refinement of energies and performing MDs at the ωB97X-3c level is computationally not feasible (see Section 4.3).
On average, the highest level of theory employed, i.e., ωB97X-3c for geometries and energies, achieves a very good score of 0.73. Seven out of 16 compounds achieve the target accuracy of at least 0.75, while only four compounds, namely butanamide, uracil, tabun, and acibenzolar-S-methyl, exhibit a mediocre score below 0.7. As expected, using GFN2-xTB geometries instead of DFT geometries results in a slight decrease in accuracy with a still good average score of 0.7. When GFN2-xTB reaction barriers are used instead of ωB97X-3c, the accuracy drops to 0.67. The still good accuracy of GFN2-xTB is somewhat unexpected, considering its known limitations in accurately modeling radical cations and reaction barriers.63 Despite some outliers, for which GFN2-xTB or the composite level yields better results than ωB97X-3c presumably due to favorable error compensation, the trend is on average that a more accurate description of the PES leads to better spectra. This is an important observation and supports the underlying theoretical assumptions of the QCxMS2 approach.
For comparison, the commonly used cosine similarity score (see ESI,† S12 for scores for each compound), shows an even more pronounced trend with scores of 0.573 (GFN2-xTB//GFN2-xTB), 0.636 (ωB97X-3c//GFN2-xTB), and 0.711 (ωB97X-3c//ωB97X-3c).
The effect on the spectrum by refining the barriers at the ωB97X-3c level is exemplary shown in two examples. Fig. 2 depicts the spectra computed with QCxMS2 at the GFN2-xTB//GFN2-xTB and ωB97X-3c//GFN2-xTB levels for 2-pentanone and caffeine. For 2-pentanone, only very small differences between the spectra are visible and both show a good agreement with the experiment with matching scores of 0.781 and 0.718, respectively. Here, GFN2-xTB gives already a good description of the PES, and no refinement of the barriers is needed. The ωB97X-3c spectrum, shown in the ESI,† in Section S16, looks slightly better, as the peaks at m/z 57 and 29 are computed much smaller yielding excellent good matching score of 0.818. The spectrum of caffeine computed with GFN2-xTB agrees poorly with experiment. Although many relevant peaks are present, they have incorrect relative intensities which leads to a low matching score of only 0.456. By using ωB97X-3c reaction barriers, a substantial improvement to a score of 0.626 is obtained. However, the peak at m/z 109 has too low relative intensity, while the peaks at m/z 110 and m/z 111 are obtained with too high intensities. Additionally, the peak at m/z 55 is missing. Computing the spectrum using ωB97X-3c also for geometries further improves the agreement with experiment and yields a score of 0.644 (spectrum shown in the ESI,† in Section S16).
Next, we examine the effect on the spectra using ωB97X-3c geometries instead of GFN2-xTB geometries, as depicted in Fig. 3, with the examples of 4-methyl-1-pentene and uracil. The spectrum of 4-methyl-1-pentene computed with GFN2-xTB geometries generally shows good agreement with the experiment yielding a reasonable score of 0.714. However, several signal intensities are inaccurate, particularly the peaks at m/z 68, 57, and 53. When using ωB97X-3c geometries instead, the spectrum shows almost perfect agreement with the experiment with a matching score of 0.835, and the base peak is also correctly predicted to be at m/z 43. This suggests that the respective transition state geometries optimized at the GFN2-xTB level are insufficient for refinement at the ωB97X-3c level, and accurate relative barrier heights are only achieved when ωB97X-3c is also used for the geometry optimization.
An even more pronounced example for this observation is uracil. Here, the agreement with the experiment with GFN2-xTB optimized geometries is rather bad, as the peak m/z 84 is falsely predicted leading to a match score of only 0.498. This is due to a too “flat” PES of GFN2-xTB for the initial dissociation of CO leading to a wrong transition state structure too late at the reaction path and thus to an underestimated barrier for the peak at m/z 84. The apparently correct peak at m/z 41 stems from further dissociation of this fragment and is therefore predicted here only by chance but via a wrong pathway. As a result, the other correctly predicted peaks are consequently too low in intensity demonstrating the sensitivity of the approach, as one inaccurate barrier can potentially distort the whole spectrum. In contrast, the m/z 84 peak is virtually absent when using ωB97X-3c optimized reaction paths which gives the correct description for the CO dissociation leading to an overall much better agreement with the experimental spectrum (score of 0.659 versus 0.470). Overall, QCxMS2 demonstrates good accuracy, however, certain spectra exhibit low matching scores even at the ωB97X-3c level.
For example, in the ωB97X-3c spectrum of uracil, the signals at m/z 40, and 42 are missing and the peak at m/z 41 has a too low intensity as the competing fragmentation pathway to the peak at m/z 28 has a lower barrier. For m/z 42, we rationalized that the peak most likely stems from a hydrogen rearrangement of the fragment at m/z 69 to a ketene and subsequent loss of HCN. However, this fragmentation reaction is not predicted by the MSREACT fragment generator.
By modeling the reaction path of the H-rearrangement and the HCN loss manually we found a sufficiently low barrier for the fragment to be formed along the fragment at m/z 28.
The fragment with m/z 40 is also not formed by MSREACT and could not be identified manually. It may be not generated as it is very high in energy on the GFN2-xTB PES. The fact that the peak at m/z 40 is completely missing in the GFN2-xTB spectrum of QCxMS (shown in the ESI,† in Section S15) indicates that the fragment is not easily accessible on the GFN2-xTB PES.
To assess if MSREACT has in general problems in predicting the fragments of rigid ring systems, we computed additional spectra with ωB97X-3c//GFN2-xTB for the simplest representatives of this class, namely benzene and naphthalene and found also poor agreement with scores of 0.699 and 0.698, respectively. Notably, many peaks are missing in the computed spectra.
Using ωB97X-3c for geometry optimizations in QCxMS2 does not lead to better results for this class of compounds.
In addition, we computed the spectra at the ωB97X-3c//GFN2-xTB level with additional geometry optimizations after randomly shifting the atomic positions in the fragment generator. By applying these settings, more peaks observed in the experiment are correctly predicted but the overall accuracy of the spectrum does not increase as also more wrong intensities are predicted.
We conclude that the above described problems with conjugated π-ring systens is mainly due to the insufficient description of the PES of the formed fragments by GFN2-xTB. During the constrained optimizations in MSREACT, (unintended) atom rearrangements frequently occur, leading to the generation of numerous artificial structures. Refinement at the DFT level cannot resolve this issue, as the correct fragments are not generated at the GFN2-xTB stage. Employing a higher-level theory, such as ωB97X-3c, in MSREACT is computationally infeasible, as outlined in Section 3.1.
In contrast, QCxMS at the GFN2-xTB level produces reasonably accurate spectra, with scores of 0.895 and 0.826 for benzene and naphthalene. Here, the limited accuracy of GFN2-xTB does not appear to be as critical, as the presumed artifacts coincidentally align with the correct experimental masses. However, substituted benzenes and phenols also prove to be challenging for QCxMS.19,90
Another issue observed in the QCxMS2 spectra is that, for larger or more complex molecules, errors of ±1, m/z may occasionally occur. This indicates that a hydrogen atom is incorrectly assigned to the other fragment of the dissociation products in the respective fragmentation reaction compared to the experimental results. Such cases are observed, for instance, around the peak at m/z 109 for caffeine, as described above, or the missing peak at m/z 181 for acibenzolar-S-methyl (see below). Hydrogen dissociations are generally difficult to describe, as indicated by the scaling factor applied to the internal energy distribution for these reactions.
Another source of error are the ro-vibrational thermal contributions, which are computed only at the GFN2-xTB level, also due to computational costs. We investigated their effect for the spectrum of methyl-butyrate, for which we could achieve an improvement of the spectrum by using ωB97X-3c frequencies instead of GFN2-xTB frequencies (see ESI,† S10 for details).
Despite these problems mainly due to the limited accuracy of the (currently) feasible level of electronic structure theory, QCxMS2 shows overall good robustness, which is also reflected in the minimum score of 0.527 at the ωB97X-3c level for the test set. Taking into account that the set also contains complicated molecules with unusual fragmentation pathways, this is a good result.
Fig. 4 shows the computed spectra for ethyl propyl ether, butanoic acid, tetramethylbiphosphine disulfide, and acibenzolar-S-methyl using both QCxMS2 and QCxMS. For comparison, we take the best but still affordable level for QCxMS2, i.e., ωB97X-3c. Since computing spectra at this level is unfeasible in the QCxMS (see Section 4.3), we take here the spectra computed with the GFN2-xTB level of theory in the comparison. For ethyl propyl ether, the base peak at m/z 31 is significantly computed only by QCxMS2 and is virtually absent in the QCxMS spectrum, which fails to predict this rearrangement reaction from the fragment with m/z 59. This is a typical reaction occurring for ethers and an important finding that this signal is obtained with QCxMS2. Consequently, the matching score with QCxMS2 is much better (0.813 versus 0.697). A similar cases is butanoic acid, for which the Mclafferty type rearrangement resulting in the peak at m/z 60, which is also the main peak in the experimental spectrum, occurs with a too low probability with QCxMS. Also here, QCxMS2 computed this fragment in good agreement with the intensity from the experiment, yielding also a much better score of 0.761 versus 0.558. Even when using GFN2-xTB with QCxMS2, improved scores compared to QCxMS are achieved for ethyl propyl ether (0.762) and butanoic acid (0.683). These two examples demonstrate, that fragments stemming from rearrangements are underestimated in QCxMS, probably due to the limited MD simulation time.
Examples (c) and (d) in Fig. 4 contain the inorganic main group elements P and S, which were also found to be problematic for QCxMS in a recent study.25 The QCxMS computed spectrum for tetramethylbiphosphine disulfide shows poor agreement (score of 0.269), failing to predict the methyl dissociation to the peak at m/z 171 and the P–P bond breakage leading to the main peak at m/z 93 in the experimental spectrum. Instead, QCxMS predicts an additional H-shift associated with the P–P bond breakage, resulting in an experimentally unobserved peak at m/z 92. Thus, the issue of missing peaks by ±1 m/z unit, as described above for QCxMS2, also occurs with QCxMS. In contrast, QCxMS2 correctly predicts here the bond breakage and achieves a much higher score of 0.782. As the m/z 92 peak is also observed in the GFN2-xTB spectrum of QCxMS2 with a score of 0.691 (spectrum shown in the ESI,† in Section S16), the falsely predicted hydrogen shift in QCxMS is probably due to the MD approach and not due to the inaccuracy of GFN2-xTB.
For acibenzolar-S-methyl, the QCxMS spectrum shows almost no agreement to experiment, with a score of only 0.100. The peak at m/z 182, resulting from the loss of N2, is not found, and instead, a false peak at m/z 162 is computed as the main peak. This peak arises from an α-cleavage, involving an H-shift to the sulfur atom and dissociation of methanethiol at the carbonyl C-atom. Interestingly, this peak is observed in the GFN2-xTB spectrum of QCxMS2 (shown in the ESI,† in Section S16) (however, without hydrogen shift, i.e., resulting in a peak at m/z 163), indicating that the GFN2-xTB PES overestimates the stability of the thiadiazole ring.
Using QCxMS2 in conjunction with ωB97X-3c, the loss of N2 to the peak at m/z 182 is correctly computed. However, the main peak at m/z 181 is also missing here. Due to its high intensity in the experimental spectrum, this stems most probably not from a hydrogen dissociation from the fragment of m/z 182 and has to occur via a different mechanism, as the computed barrier of the hydrogen dissociation is much too high (even without scaling of the IEE applied) compared to the methyl loss to the fragment of m/z 167. The fragment generator does not produce the correct fragment here, probably due to the insufficient accuracy of GFN2-xTB as discussed in Section 4.1. However, apart from the main experimental peak, the relevant signals are obtained and the score of 0.527 is still reasonable compared to QCxMS. Overall, the results for the test set demonstrate that QCxMS2 exhibits improved accuracy and robustness in comparison to QCxMS.
Fig. 5 shows the computational timings scaled to 16 Intel Xeon “Sapphire Rapids” v4 @ 2.10 GHz CPU cores for the spectra calculation with QCxMS2 with the three different theory levels employed here and timings with QCxMS with GFN2-xTB and ωB97X-3c in comparison.
QCxMS can in principle be perfectly parallelized as every (cascading) trajectory is obtained separately, whereas with QCxMS2 the parallelization efficiency depends on the number of fragments in a fragmentation step and, how long particular calculations, e.g., a specific transition state search takes since some calculations have to be performed in a subsequent manner. The QCxMS2 calculations were performed with 16 CPU cores, with the exception of the expensive ωB97X-3c spectra calculations, which were performed on 96 cores for 2-pentanone and on an 128 AMD EPYC 7763 CPU for caffeine. The respective computational timings are scaled to 16 CPU cores. For the QCxMS calculations, the same number of cores was used as the number of trajectories. However, for a meaningful comparison in terms of the practical use of the program, the timings are scaled to the typical computational resources of 16 CPU cores. A QCxMS2 calculation for 2-pentanone at the GFN2-xTB level takes about an hour. Refining the barriers at the ωB97X-3c level takes only one hour more computation time. Computing the geometries and the reaction path search also at the ωB97X-3c level is very expensive and increases the computational costs massively to 502 hours. For caffeine, the computation time is as expected significantly larger as also more fragments have to be computed. Whereas in the calculation for 2-pentanone 79 isomers and 121 fragment pairs and hence 200 transition state searches and barrier calculations have to be performed, for caffeine 462 isomers and 292 fragment pairs were found, i.e., 754 reaction barriers have to be computed. However, the calculation is still feasible, requiring 3.7 hours for the GFN2-xTB calculation and 15.7 hours if the barriers are refined at the ωB97X-3c level. Computing the geometries at the ωB97X-3c level for caffeine, however, becomes impractically expensive, requiring about 4050 hours.
In comparison, the QCxMS calculation for 2-pentanone at the GFN2-xTB level takes 30 minutes using the default number of 400 trajectories for this molecular size. For caffeine, 600 trajectories have to be computed leading to an overall wall time of about one hour. However, refinement of the geometries at a higher level of theory as in QCxMS2 is not possible in this approach and to reach more accuracy all calculations need to be performed at the higher level of theory too. This is computationally very expensive, as demonstrated by the 2-pentanone calculation, which takes 7664.5 hours at the ωB97X-3c level, which is too expensive to be of practical use. Similarly, the corresponding calculation for caffeine is not feasible with our available resources and was therefore not performed. While QCxMS is computationally cheaper at the GFN2-xTB level, achieving better accuracy using a higher level of theory quickly becomes unfeasible. In contrast, refining barriers via DFT single-point calculations is possible with QCxMS2 and improves the accuracy (see Table 1). Even when using ωB97X-3c also for geometry optimizations, QCxMS2 remains computationally more efficient than QCxMS.
We attribute the remaining deviations from the experimental data primarily to errors in the methods used to calculate the electronic barriers, the vibrational contributions, and the possible structures appearing in the reaction networks. Due to the large size of these networks, we are limited to using efficient DFT methods for the energy calculations and SQM methods for the frequency calculations.
The CREST MSREACT fragment generator found in most cases all relevant peaks, and only in a few instances, particularly involving complex unsaturated ring rearrangements, missing fragments are suspected as a source of error. This issue is likely due to the limited accuracy of GFN2-xTB used in this step, and we anticipate that employing improved SQM methods will significantly reduce this error. We plan to investigate the issue of missing peaks in more detail in future studies.
For flexible structures, particularly those containing heavy main-group elements, QCxMS2 demonstrated excellent accuracy on average, significantly outperforming QCxMS. Additionally, typical rearrangement reactions of common organic functional groups are better captured by QCxMS2 than with QCxMS.
The QCxMS2 program is open-source and freely available.61 Note that all of the employed programs in the QCxMS2 workflow are open-source or at least free for academic use (ORCA) making QCxMS2 free to use for academia. Furthermore, QCxMS2 is systematically improvable and a more “controlled” approach than the MD-based QCxMS. We expect that QCxMS2 will benefit especially from newly developed QM methods for the computation of reaction pathways and barriers. Currently, efficient tight-binding methods are being developed in our lab and have already shown promising results close to the accuracy of DFT spectra at significantly reduced computation time. Initial tests with the unpublished g-xTB method currently developed in our lab employed for energies and geometries gave on average an excellent matching score of 0.736 close to the ωB97X-3c values at about the same computation time needed for the GFN2-xTB spectra.
Furthermore, an extension of QCxMS2 to describe negatively or multiply charged species, as well as to calculate the experimentally also very relevant ESI/CID-MS, is planned. Since the QCxMS2 approach can be systematically improved with more advanced methods, we view it as a promising pathway toward highly accurate and reliable mass spectrum predictions.
Footnote |
† Electronic supplementary information (ESI) available: [Geometries in xyz format for all structures, as well as the computed spectra for the test set, can be found here: https://github.com/grimme-lab/QCxMS2-data/]. Additional details on the implementation, tests of different technical parameters, and computed spectra, which are not in the manuscript are provided in the file SI.pdf. See DOI: https://doi.org/10.1039/d5cp00316d |
This journal is © the Owner Societies 2025 |