Quantum chemical calculation of electron ionization mass spectra for general organic and inorganic molecules

The implementation of a novel tight-binding Hamiltonian within the QCEIMS program allows the first-principles based computation of EI mass spectra within a few hours for systems containing elements up to Z = 86.

1 Exemplary reaction coordinates for decomposition pathways For the prediction of EI-MS by QCEIMS, the quality of the resulting spectra is reflected in the accuracy of the QC method used to compute the atomic forces. In other words, the PES of the QC method has to closely parallel the 'true' PES. Therefore, we compare potential energy curves obtained with GFN-xTB to its level of reference, hybrid DFT. Three simple exemplary reaction pathways involving single bond ruptures are examined: the loss of an ethyl residue from the hexane cation (see Fig. 1), AsCl + 2 from the lewisite cation (see Fig. 2) and iodine from iodobenzene cation (see Fig. 3). Snapshots along the reaction pathway have been superimposed on the figures. The three chosen pathways correspond to the formation of ions which were observed to have relatively intense peak-signals. Therefore, we consider the potential energy curves to be representative of the MD trajectories.
To compute the reaction pathways, we use a simple and intuitive approach, referred to here as a relaxed potential energy surface scan. Given an optimized reactant configuration (the equilibrium ion structure) and products (a neutral and a charged fragment). We perform a linear interpolation of the system with 30 system images placed between the reactant and product states. In the interpolation, only one degree of freedom is varied in an equidistant stepwise fashion, which corresponds to the dissociation process. Each image is then optimized with the dissociating bond distance constrained and all other degrees of freedom are allowed to relax. The optimization is performed using PBE0-D3(bj)/def2-TZVP with an electronic temperature of 10000 K. The energy of each optimized image (including the reactant and product configurations) is calculated with GFN-xTB (at 5000 K) and refined by PBE0-D3(bj)/def2-QZVP (at 10000 K). This methodology for computation of reaction pathways is known to fail for more complex reactions (i.e. reaction coordinates) than the ones presented here. This can be seen by an introduction of discontinuity in the potential energy curve, where relaxation of the remaining degrees of freedom pulls the system away from the minimum energy path. The blue points were calculated using GFN-xTB (5000 K) and the red points by PBE0-D3(bj)/def2-QZVP (10000 K). For clarity, three snapshot along the reaction coordinate have been superimposed on the figure.
Figure 2: Potential energy curve for the loss of AsCl + 2 from the lewissite cation. The blue points were calculated using GFN-xTB (5000 K) and the red points by PBE0-D3(bj)/def2-QZVP (10000 K). For clarity, three snapshot along the reaction coordinate have been superimposed on the figure.
Figure 3: Potential energy curve for the loss of an iodine from the iodobenzen cation. The blue points were calculated using GFN-xTB (5000 K) and the red points by PBE0-D3(bj)/def2-QZVP (10000 K). For clarity, three snapshot along the reaction coordinate have been superimposed on the figure.
The agreement between the shape of the potential energy curves calculated with PBE0-D3(bj)/def2-QZVP and GFN-xTB is excellent, for all three reactions. However, we find the GFN-xTB to predict too strong binding, where the difference can range from roughly 0.25 (lewissite) to 1 eV (hexane).
The three cases shown here are only an initial assessment of GFN-xTB. It is nowhere near complete and a more extensive study is needed e.g., by inclusion of a large number of representative 'real-world' systems and reactions with more complicated reaction coordinates and comparison to high-level ab-initio QC calculations and hybrid DFT.

Additional Spectra
All additional EI mass spectra are calculated with the same simulation parameters as in the main manuscript, except for the specific modifications, which are investigated in the first subsection.

Effect of Simulation Time and IEE Distribution
We have investigated the effect of two important simulation parameters: (i) the maximum simulation time parameter, and (ii) the IEE distribution. The former determines one stop criterion in the QCEIMS production runs. It has been set to 10 ps for the results reported in the main manuscript. The latter determines the amount of internal energy deposed in each production run. It is set by default to have its maximum at 0.6 eV per atom. For details, see Angew. Chem. Int. Ed., 2013, 52, 6306.
We have scanned these two simulation parameters in the following way : (i) the maximum simulation time is set to 5 ps, 10 ps, and 20 ps, respectively. (ii) The IEE distribution has been set to 0.6 eV per atom (the default value), and 0.3 eV per atom. The results are presented in Figures 4, and 5, respectively. This procedure was performed for the molecules 1-fluorohexane (2) and tetramethylsilane (13).
The results reveal that the simulation results are perhaps unexpectedly quite robust with respect to the choice of the two parameters. There are, of course, minor differences in the calculated EI-MS of the two compounds, but these are not visible in Figures 4 and 5, but are recorded in the respective output files. Since the purpose of QCEIMS is not to obtain a quantitatively accurate prediction of an EI-MS but rather to obtain a computed spectrum by which a compound may be identified and its unimolecular fragmentation pathways upon electron ionization explored, the finding that the variation of simulation parameters may not change the results significantly adds to our conclusion that QCEIMS is a stable and reliable program. The systematic exploration of much longer simulation times of 100 ps to a full nanosecond will be the subject of further research, which is beyond the scope of the present study.

Additional Calculated Spectra of Organic Molecules
We show additional calculated spectra of organic molecules below.

IP Evaluation
As seen in Figure 12, the calculated spectrum shows a lot of artifacts that are due to the erroneous evaluation of the fragment ionization potentials, for which no specialized IPEA-xTB parameters exist. In contrast to the spectrum shown in the paper, the naked Fe + is not predicted correctly. For this reason, we recommend that the computation of ionization potential remain at the ∆ SCF (PBE0/SV(P)) level of theory until the parametrization of IPEA-xTB will have been completed. This is also valid for the spectra presented in the next subsection.

Additional Calculated Spectra -Deficiencies of MS(GFN-xTB)
Here, we present a number of calculated EI-MS, which we do not consider of sufficient quality, and offer preliminary statements on how these failures may be explained. It should be said before all discussion below that GFN-xTB is a semiempirical, cost-efficient QC method, which cannot be expected to always yield a perfect description of the energetics of the unimolecular fragmentation reaction space.
The apparent failure of EI-MS prediction for several classes of biomolecules by MS(GFN-xTB) (the dipeptides dialanine and cystine, sucrose and tripalmitin, shown in Figure 14) could be considered a distressing finding. However, closer inspection of the simulation results reveals that some of the failures can be explained reasonably. The computed spectrum of dialanine, for instance, consists mostly of the base peak, which is an ion produced in a standard α cleavage channel. It is not unreasonable that the GFN-xTB PES should overrepresent this pathway by perhaps featuring a too low barrier for this reaction. Similar observations are made for sucrose and the triglyceride tripalmitin. For cystine, there are admittedly many artefacts, which, however, disappear when computing the IPs at the ∆ SCF (PBE0/SV(P)) level, see Figure 15. That spectrum has been calculated using 200 production runs. The final IP/EA xTB parametrization has not been performed for sulfur yet, and in sensitive cases, we recommend crosschecking the IP evaluation by switching on the ∆ SCF (PBE0/SV(P)) level for that part of the simulation. The computed EI-MS of saframycin A shown in Figure 16 contains a lot of artefacts.
Despite the ability of MS(GFN-xTB) to capture some of the main peaks, there are obviously some fragmentation pathways that are artificially overrepresented. Moreover, the internal energy distribution leads in this case to both heavy fragmentation in the production runs as well as survival of the molecular, which is not seen in the experiment. Therefore, the energy distribution may be unbalanced. It will be the topic of further research to investigate why our internal energy distribution model succeeds in many cases but fails in others. Similar observations are made for tecloftalam ( Figure 17).

Additional Calculated Spectra -Comparison of Semi-empirical PES
In this subsection, we show three illustrative examples of the effect of the semi-empirical quantum chemical PES, which we hold is the largest error source for the computed spectra.
The score that is given below is a modified dot-product score. It quantifies the overlap between the computed and experimental spectra. A score of 0 means no overlap, a score of 1,000 means identical spectra.
For the case of methyl sulfonamide, we compare between the DFTB3-D3 and GFN-xTB computed spectra, see Figure 18. For DFTB3-D3, no molecular ion survives the simulation, and the base peak is not identified correctly, indicating that the dissociation energies of the S-N and S-C bonds are not in the right order. The intensity of the peaks in the GFN-xTB computed spectrum is of much higher quality, even if the base peak is misassigned (m/z 15 is the methyl cation, possibly a problem of the IP calculations, as addressed for cystine above). The higher PES quality of GFN-xTB for methyl sulfonamide leads to a much higher score for the comparison between computation and experiment.  computed spectrum is only due to the isotope peak, which has been added post-simulation.
This is another case where the PES quality is the main source of discrepancies between the simulation and the experiment. The McLafferty rearrangement pathway is accessible on the GFN-xTB PES. It appears to be inaccessible on the DFTB3-D3 PES, at least using our standard simulation conditions. Moreover, the base peak, m/z 43, is correctly predicted at the GFN-xTB level of theory whereas the base peak in the DFTB3-D3 computed spectrum is the ion m/z 57. This indicates that the GFN-xTB PES is of a higher quality for 2-hexanone compared to DFTB3-D3, which is also reflected in the spectral matching score difference.  Lastly, we show a comparison between PM6-D2H and GFN-xTB calculated spectra of ferrocene in Figure 21. The discussion here focuses on the ion m/z 105, FeC 4 H + , which is found in traces in the experimental spectrum. As displayed in Figure 21, this ion has a chemically unreasonable structure, which is due to the short-range deficiencies of the PM6-D2H Hamiltonian (in essence, there is no Pauli repulsion), manifesting itself in the artificially short Fe-C bond length of 1.05Å. The FeC 4 H + ion also appears in the GFN-xTB calculated spectrum, although only as the results of one production run. Its structure is much more reasonable with a Fe-C distance of 1.97Å. We therefore argue that GFN-xTB may produce artifacts, but they are to the best of our knowledge 'reasonable' artifacts, e.g., due to a wrong ordering of reaction channels on the PES. We have not observed any completely unphysical structures of our simulated fragment ions.

Computational statistics
The average computational time required for a single point energy/gradient computation (QC call) and the number of unsuccessful production runs is depicted in Figure 3 (in the paper) for each molecule, with the exclusion of nickel(II)bis(diphenyl-acetylacetonate). This average computational time is calculated by the ratio of the total wall-time and the total number of QC calls, over all 1000 production runs. Moreover, for further transparency we report here (see Table 1) the maximum and average number of QC calls per production run as well as the average and maximum computational time. The data show the large spread of computation times in the production runs depending on the fragmentation events and the stop criteria. The maximum computational time for a production run is often reached when the molecular ion survives while the maximum of QC calls is often related to production runs with many cascading trajectories, which are not necessarily more expensive due to the neutral losses being discounted. Table 1 also reflects that the IP calculation by DFT (as was done for the organometallic systems 7-10 significantly increases the computational times. Table 1: Average number of energy/gradient computations (QC calls) in a production run along with the standard deviation and the maximum number of QC calls, for the given molecules 1-23. Furthermore, we show the average (along with standard deviation) and maximum computational time per production run, as well. To obtain an estimate of the total wall time, multiply the average t comp by 1,000 and divide by the number of available cores (which has been, in our case 1,000).