Comment on “Causation or only correlation? Application of causal inference graphs for evaluating causality in nano-QSAR models” by

In this comment we show that the accuracy of a recent nano-QSAR model for toxicity predictions of metal oxide nanoparticles towards the bacterium E. coli can be greatly improved. On the one hand, the experimental ionization energies of metal atoms can be substituted for the erroneous semi-empirically derived heat of formation values of metal ions as descriptors to construct a more reliable nano-QSAR model based on weighted linear least-squares fittings. On the other hand, if no experimental data are available, a model relying on ionization energy descriptors from quantum chemical calculations can also be used, producing exactly the same toxicity values as the experimental model.

In a recent paper, 1 Leszczynski et al. have studied causality relations among toxicity data of metal oxide nanoparticles (NPs) towards bacteria E. coli and several descriptors characterizing the whole chemical structures of metal oxide NPs and their atomic constituents (e.g., standard enthalpies of formation of gaseous metal ions, in the same oxidation states as in the oxides, charges, radii and polarization powers of the metal ions in question, etc.). For the computation, the PM6 semi-empirical quantum chemical method 2 was used. Of the twelve quantum chemical descriptors studied, however, the standard enthalpy of formation of gaseous metal ions (ΔH Me+ ) turned out to be the most reliable one. 3 Causality analysis revealed that there was a strong causality relation between these descriptors and the toxicity data of the metal oxide NPs. 1 Finally, they concluded that the enthalpy of formation of metal ions was the most relevant descriptor determining the toxicity of the metal oxide NPs.
Let us consider the ionization process of a gaseous metal atom:

$$\mathrm{Me(g)} \rightarrow \mathrm{Me}^{q+}\mathrm{(g)} + q\,\mathrm{e}^{-} \qquad (1)$$

According to this process, the standard enthalpy of formation of the metal ion is as follows:

$$\Delta H_{\mathrm{Me}^{+}} = \Delta H_{\mathrm{Me}} + \mathrm{IE} \qquad (2)$$

where ΔH Me and IE are the standard enthalpy of formation of the gaseous metal atom and the cumulative ionization energy needed to remove the electrons from the neutral atom, respectively. Extensive experimental ΔH Me and IE data for atoms and atomic ions are available in the literature. 4 Table 1 contains all the data used in this study. The pMIC 50 toxicity (the negative decimal logarithm of the minimum inhibitory concentration at which 50% of the isolates are inhibited) and the ΔH Me+ (PM6) data in column 6 of Table 1 were taken from the paper 1 of Leszczynski et al.
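The thermochemical relation described above can be illustrated with a short Python sketch. It assumes the relation is ΔH Me+ = ΔH Me + IE, as the definitions of the two terms suggest; the function name and the numerical inputs are hypothetical placeholders and are not taken from Table 1.

```python
from typing import Sequence


def ion_formation_enthalpy(dh_atom: float, stepwise_ie: Sequence[float]) -> float:
    """Return the enthalpy of formation of a gaseous ion, Me^q+.

    dh_atom     : enthalpy of formation of the gaseous atom (kcal/mol)
    stepwise_ie : first, second, ... q-th ionization energies (kcal/mol)

    The cumulative ionization energy IE is the sum of the stepwise values,
    and the ion enthalpy is assumed to be dh_atom + IE.
    """
    cumulative_ie = sum(stepwise_ie)
    return dh_atom + cumulative_ie


# Hypothetical placeholder values (kcal/mol), NOT data from Table 1.
dh_atom_example = 100.0
stepwise_ie_example = [150.0, 350.0]  # two ionization steps, i.e. q = 2

print(ion_formation_enthalpy(dh_atom_example, stepwise_ie_example))
```

Because ΔH Me is small compared with the cumulative IE of a multiply charged ion, an error in IE translates almost directly into an error in ΔH Me+.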
A careful analysis of their published data 1 reveals that 9 of the 17 PM6 standard enthalpies of formation are in error. In five cases, Co(II), Cr(III), Fe(III), Ni(II) and V(III), the spin multiplicities assigned to the ground states of the atomic ions differ from the real ones, and the standard enthalpies of formation obtained are too high. The largest deviation is more than 130 kcal mol −1 for Fe(III). For the ions with the highest positive charges, Si(IV), Sn(IV), Ti(IV) and Zr(IV), the computed standard enthalpies of formation are too low: the differences are more than 500 kcal mol −1 with respect to the experimental values. The deviation is especially large for Si(IV): ∼800 kcal mol −1 . These errors justify replacing the PM6 standard enthalpies of formation in question with experimental values or with values computed at a higher level of theory.
The quality of the experimental standard enthalpies of formation and ionization energies can be assessed with the help of our computed results. Table 1 also contains our standard enthalpies of formation for the ions and ionization energies for the atoms computed by relativistic quantum chemistry (RQC). The def2-TZVPP and def2-QZVPP basis sets of Ahlrichs and associates, 5,6 and for bismuth the segmented all-electron relativistically contracted (SARC) basis set of Neese and coworkers, 7 were utilized in the quantum chemical calculations. The Hartree-Fock energies were assumed to be converged with the def2-QZVPP basis set. The MP2, CCSD, CCSD(T), CCSDT, and CCSDT(Q) correlation energies, however, were extrapolated to the complete basis set limit using the inverse cubic formula of Helgaker and coworkers 8 along with the def2-TZVPP and def2-QZVPP basis sets.
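As an illustration of the extrapolation step, the following Python sketch implements the commonly used two-point inverse cubic form, E_X = E_CBS + A/X^3, with cardinal numbers X = 3 (triple-zeta) and X = 4 (quadruple-zeta). The function names and the energies in the example are hypothetical; the sketch is not claimed to reproduce the exact protocol used for Table 1.

```python
def cbs_inverse_cubic(e_small: float, e_large: float,
                      x_small: int = 3, x_large: int = 4) -> float:
    """Two-point inverse-cubic extrapolation of a correlation energy.

    Assumes E_X = E_CBS + A / X**3 and eliminates A using the energies
    obtained with two basis sets of cardinal numbers x_small < x_large.
    """
    xs3 = x_small ** 3
    xl3 = x_large ** 3
    return (xl3 * e_large - xs3 * e_small) / (xl3 - xs3)


def total_energy(e_hf_large: float, e_corr_small: float, e_corr_large: float) -> float:
    """Hartree-Fock energy from the larger basis plus extrapolated correlation."""
    return e_hf_large + cbs_inverse_cubic(e_corr_small, e_corr_large)


# Hypothetical energies in hartree, for illustration only.
e_hf_qz = -100.000
e_corr_tz = -0.300
e_corr_qz = -0.320
print(total_energy(e_hf_qz, e_corr_tz, e_corr_qz))
```

The extrapolated correlation energy always lies beyond the larger-basis value, in the direction of the change from the smaller to the larger basis.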
For third- and fourth-row elements scalar relativistic effects were described with the help of the second-order Douglas-Kroll-Hess (DKH) Hamiltonian. 9,10 In these calculations the so-called point charge nucleus model was utilized. For elements below the fourth row, scalar relativistic contributions were included by means of the Stuttgart-Dresden effective core potentials (ECP). 11,12 Nonrelativistic and ECP results were obtained with the MRCC suite of quantum chemical programs, 13 while DKH computations were performed using the ORCA package. 14 In the calculations all electrons were correlated, except, of course, the core electrons described by ECPs. We also repeated the PM6 calculations for all heat of formation values of metal ions with the proper settings. For these calculations the MOPAC2016 package 15 was used. Fig. 1 shows the correlation between the experimental ionization energies and our corrected PM6- and RQC-computed ones. RQC clearly outperforms PM6. (For RQC the regression coefficient is 0.995 ± 0.002; Pearson's correlation coefficient (r) is 0.9999; and the standard deviation (s r ) is 7.90 kcal mol −1 . For PM6 the regression coefficient is 0.804 ± 0.067; Pearson's correlation coefficient (r) is 0.9373; and the standard deviation (s r ) is 177.07 kcal mol −1 .) The same is true for the computed standard enthalpies of formation. It is worth noting that, due to fortuitous error cancellations, energy differences, e.g., ionization energies, can be obtained more precisely by quantum chemistry than individual energy values, e.g., standard enthalpies of formation. That is why ionization energies are preferable to enthalpies of formation as descriptors.
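The kind of statistics quoted for Fig. 1 (slope with its standard error, Pearson's r and residual standard deviation) can be obtained from a simple linear regression. The sketch below is a minimal, standard-library illustration; it assumes that the experimental values are placed on the x axis and the computed ones on the y axis, which the text does not state, and the numbers in the example are hypothetical rather than data from Table 1.

```python
import math
from typing import Dict, Sequence


def regression_stats(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Ordinary least-squares line y = slope*x + intercept with basic statistics."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))

    s_r = math.sqrt(ss_res / (n - 2))   # residual standard deviation
    slope_se = s_r / math.sqrt(sxx)     # standard error of the slope
    r = sxy / math.sqrt(sxx * syy)      # Pearson's correlation coefficient
    return {"slope": slope, "slope_se": slope_se,
            "intercept": intercept, "r": r, "s_r": s_r}


# Hypothetical ionization energies in kcal/mol, for illustration only.
ie_experimental = [400.0, 800.0, 1300.0, 1900.0, 2400.0]
ie_computed = [395.0, 810.0, 1290.0, 1905.0, 2390.0]
print(regression_stats(ie_experimental, ie_computed))
```

A slope close to 1 together with a small s r indicates that the computed values track the experimental ones without systematic scaling error.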
PRESS and R statistics 16 performed on the set of toxicity data in relation to the ionization energies revealed that two [...]

Table 1 Toxicity data for the metal oxide NPs and descriptor values for the constituent metal atoms and ions (heats of formation and ionization energies in kcal mol −1 ). Columns include: Metal oxide, Me q+ , pMIC 50 .

[...] 16 based on the ordinary linear least-squares method (OLLS), with model function pMIC 50 = a·IE + b, for the toxicity data using ionization energies (or standard enthalpies of formation) as descriptors. The first four lines of Table 2 show the statistics for these fittings. According to Table 2, all the models perform almost equally well. It is known that the application of the OLLS method requires error-free independent variables and a dependent variable with uniform error. We can also perform, however, weighted linear least-squares (WLLS) fittings 16 with the weights w i for the toxicity data given by eqn (3). It can be seen that 0 < w i ≤ 1, and higher toxicity involves higher precision. The last two rows of Table 2 have been obtained via WLLS fittings using eqn (3) as weights. The two models are equivalent in their performance.
Because the PM6 results carry considerable errors in several cases, neither the OLLS nor the WLLS method can be used for linear regression with them, since both methods require error-free independent variables.
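A weighted linear least-squares fit of the model pMIC 50 = a·IE + b, together with a leave-one-out PRESS statistic, can be sketched as follows. The weighting function used here (each toxicity value divided by the largest one) is only a hypothetical stand-in that satisfies 0 < w i ≤ 1 and gives higher weight to higher toxicity; it is not claimed to be eqn (3). The data are likewise hypothetical, and the PRESS sum in this sketch is taken over unweighted prediction residuals.

```python
from typing import List, Sequence, Tuple


def weighted_linear_fit(x: Sequence[float], y: Sequence[float],
                        w: Sequence[float]) -> Tuple[float, float]:
    """Minimise sum_i w_i * (y_i - a*x_i - b)**2 and return (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    a = sxy / sxx
    return a, my - a * mx


def press(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Leave-one-out predicted residual sum of squares for the weighted fit."""
    total = 0.0
    for i in range(len(x)):
        keep = [j for j in range(len(x)) if j != i]
        a, b = weighted_linear_fit([x[j] for j in keep],
                                   [y[j] for j in keep],
                                   [w[j] for j in keep])
        total += (y[i] - (a * x[i] + b)) ** 2
    return total


def toy_weights(y: Sequence[float]) -> List[float]:
    """HYPOTHETICAL weights in (0, 1] for positive y; not the paper's eqn (3)."""
    top = max(y)
    return [yi / top for yi in y]


# Hypothetical descriptor (kcal/mol) and toxicity values, for illustration only.
ie = [500.0, 900.0, 1300.0, 1700.0, 2100.0]
pmic50 = [3.4, 3.1, 2.9, 2.5, 2.3]
weights = toy_weights(pmic50)
print(weighted_linear_fit(ie, pmic50, weights), press(ie, pmic50, weights))
```

With all weights equal, the same routine reduces to the ordinary least-squares fit, which makes it easy to compare the OLLS and WLLS variants on identical data.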
Recently, Leszczynski et al. have also considered two metal oxide NPs which do not belong to the training set: 17 Mn 2 O 3 and WO 3 . Unfortunately, no experimental toxicity data are available for them. For the Mn atom, both experimental 4 (1308.50 kcal mol −1 ) and theoretical ionization energies (1302.76 kcal mol −1 , this study) are available. The reported predicted value for its toxicity is 2.84 based on the PM6 standard enthalpy of formation. 17 The models corresponding to the last two rows of Table 2 supply a slightly lower value: 2.58. As for WO 3 , no experimental ionization energy is available for W(VI). Our computed value is 4639.17 kcal mol −1 . Considering the range of the ionization energies in Table 1, this is outside the domain of the training set. Since every model is valid only within the IE range used to construct it (in other words, only interpolation is allowed), no reliable prediction for WO 3 can be made via the nano-QSAR models. The same holds for SiO 2 , where the IE of the Si atom is slightly out of range, considering that both the experimental and RQC-computed values are around 2380 kcal mol −1 . For V 2 O 3 , which was also an outlier, the experimental pMIC 50 is 3.14, while our computed value is 2.73 with both the RQC- and the experimental-IE-based models. Assuming that our nano-QSAR models yield accurate toxicity for every NP for which the IE of the constituent metal atom is in the range of the training set, the experimental pMIC 50 for V 2 O 3 is very likely to be in error, and repetition of this toxicity measurement is recommended.
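The interpolation-only rule discussed above can be expressed as a simple applicability-domain check performed before any prediction. In the sketch below the two query values are the ionization energies quoted in the text for Mn (1308.50 kcal mol −1 ) and W(VI) (4639.17 kcal mol −1 ), whereas the training-set descriptor values and the coefficients a and b are hypothetical placeholders, not the values behind Table 2.

```python
from typing import Optional, Sequence


def in_training_domain(ie: float, training_ie: Sequence[float]) -> bool:
    """True if ie lies within the descriptor range spanned by the training set."""
    return min(training_ie) <= ie <= max(training_ie)


def predict_pmic50(ie: float, a: float, b: float,
                   training_ie: Sequence[float]) -> Optional[float]:
    """Return a*ie + b, or None when the query would require extrapolation."""
    if not in_training_domain(ie, training_ie):
        return None
    return a * ie + b


# HYPOTHETICAL training descriptors (kcal/mol) and model coefficients.
training_ie = [400.0, 900.0, 1500.0, 2300.0]
a_hyp, b_hyp = -0.0005, 3.5

# Ionization energies quoted in the text (kcal/mol).
ie_mn = 1308.50
ie_w = 4639.17

print(predict_pmic50(ie_mn, a_hyp, b_hyp, training_ie))  # inside the range
print(predict_pmic50(ie_w, a_hyp, b_hyp, training_ie))   # outside the range
```

Returning no value for an out-of-range descriptor, instead of an extrapolated number, keeps unreliable predictions from being reported alongside valid ones.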
In summary, we have shown that the recently published PM6-based heat of formation values used as descriptors for nano-QSAR models are in error for many metal ions; therefore, toxicity predictions for metal oxide NPs based on them are not reliable. We have also shown that, by using the weighted linear least-squares method, it is possible to construct better nano-QSAR models based on either experimentally or quantum chemically derived atomic ionization energies instead of the ionic enthalpies of formation from semi-empirical calculations.

Conflicts of interest
There are no conflicts to declare.