Reenu and
Vikas*
Quantum Chemistry Group, Department of Chemistry & Centre of Advanced Studies in Chemistry, Panjab University, Chandigarh-160014, India. E-mail: qlabspu@pu.ac.in; qlabspu@yahoo.com; Tel: +91-172-2534408
First published on 16th March 2015
Quantum-mechanical exchange and correlation interactions between electrons are quite crucial in deciding molecular geometry and properties. Such electronic interactions can have a significant role in the reliability of a quantitative structure–activity relationship (QSAR) because the biological activities of the chemicals can be described as a function of the molecular structure through the QSARs which are routinely based on the quantum-mechanical molecular descriptors. In this work, we present a detailed analysis of the effect of the quantum-mechanical exchange and correlation on the internal stability and external predictivity of a QSAR model based on the quantum-mechanical molecular descriptors while modeling the mutagenic activity of a set of 51 nitrated-polycyclic aromatic hydrocarbons (PAHs). For this, various molecular descriptors are computed using electronic structure methods such as the Hartree–Fock (HF) method, and density functional theory (DFT) employing only the exchange functionals (HFX, B88), pure exchange and correlation functionals (HFX + LYP, BLYP), hybrid (B3LYP), meta (M06-L), and meta-hybrid (M06, M06-2X) exchange–correlation (XC) functionals. To further analyze the role of electron-correlation, QSAR models are also developed using the descriptors incorporating mainly the effect of electron-correlation. The external predictivity of the developed models is assessed through state-of-the-art external validation parameters employing an external prediction set of compounds. A comparison of the quality of the models developed with the descriptors computed using different electronic structure methods revealed that the exchange interactions are quite critical along with the electron-correlation in modeling the mutagenicity. Notably, for most of the models, electron-correlation based descriptors are found to be highly reliable when computed using the hybrid XC functionals, particularly B3LYP and M06-2X.
The HF method incorporates exchange interactions exactly, though neglecting a significant part of the dynamic electron correlations,7,8 while the DFT accounts for the exchange–correlation (XC) through an XC density functional, the exact form of which is yet unknown. The most widely used DFT XC functional, B3LYP, was introduced by Becke12 and Lee–Yang–Parr13 which is a hybrid of a local and non-local exchange and correlation. Enormous efforts have been made recently and in the past decade to find out proper XC density functionals which should yield accurate energies, geometries and thermo-chemical properties for not only the covalent systems but also for the non-covalently interacting systems.14–16 Literature analysis shows that the many widely applicable functionals, using the generalized-gradient approximation (GGA) based on the gradient of the electron density, are separately accurate for molecules,17,18 solids,19 interfaces,20 and even low-dimensional systems,21 but no GGA functional is simultaneously accurate for all of these systems.22 On the other hand, recently available meta-GGA functionals, which include the dependence on the spin kinetic energy density, overcome many of the limitations of GGA functionals with almost same computational cost.23,24 However, the reliability of hybrid GGA functionals in describing the energies can be sensitive to the amount of exact HF exchange embraced in the functional because the quantum-mechanical exchange interactions are necessary for an accurate description of an atomic or a molecular structure.25 The performance of various density functionals, with and without exchange incorporation, has previously been analyzed in detail by the different studies.25–27
The present work compares the performance of exchange and correlation contributions of a few relevant and widely used pure, hybrid, meta, and meta-hybrid GGA functionals as well as an exact HF exchange towards the real external predictivity of the QSARs based on quantum-mechanical molecular descriptors, namely, the total energy of a molecule (E), energy of the highest occupied and lowest unoccupied molecular orbital (EHOMO and ELUMO), absolute electronegativity (χ), chemical hardness (η), and electrophilicity index (ω).28–30 For the present study, QSAR models are developed and analyzed for a couple of biological activities, namely, the base-pair and frame-shift mutation activity of 51 nitrated-PAHs. The XC functionals of the DFT analyzed in the present study are widely used for the QSAR modeling, as discussed in the next section.
DCORR = DDFT − DHF/X-only | (1) |
DCORR(HFX+LYP) = DHFX+LYP − DHFX, | (2) |
DCORR(BLYP) = DBLYP − DB88. | (3) |
S. no. | Method | Exchange | Correlation | ||
---|---|---|---|---|---|
Electron-density gradients | LSDa/Slater | HF (%) | |||
a Local spin density.b Perdew–Burke–Ernzerhof.c Voorhis and Scuseria. | |||||
X-only methods | |||||
1 | HFX | — | — | 100% | No |
2 | B88 | B88 | Yes | 0% | No |
X + C methods | |||||
3 | HFX + LYP | — | — | 100% | From LYP functional (ref. 12 and 13) |
4 | BLYP | B88 | Yes | 0% | From LYP functional (ref. 12 and 13) |
5 | B3LYP | B88 | Yes | 20% | From B3LYP functional (ref. 12 and 13) |
6 | M06-L | PBEb and spin kinetic energy density | Yes | 0% | M05 correlation functional augmented by VSc terms, treating opposite- and parallel-spin correlations differently (ref. 34 and references therein) |
7 | M06 | As in M06-L | 27% | As in M06-L (ref. 35) | |
8 | M06-2X | As in M06-L | 54% | As in M06-L (ref. 35) | |
![]() |
|||||
CORR only methods | |||||
9 | CORR (HFX + LYP) | As compensated in HFX + LYP through eqn (2) | From LYP functional | ||
10 | CORR (BLYP) | As compensated in BLYP through eqn (3) | From LYP functional | ||
11 | CORR (B3LYP) | As compensated in B3LYP through eqn (1) | From B3LYP functional | ||
12 | CORR (M06) | As compensated in M06 through eqn (1) | From M06 functional | ||
13 | CORR (M06-L) | As compensated in M06-L through eqn (1) | From M06-L functional | ||
14 | CORR (M06-2X) | As compensated in M06-2X through eqn (1) | From M06-2X functional |
It should further be noted that the descriptors, DCORR, computed using different X-only and X + C methods differs not merely due to the exchange and/or correlation contribution but also due to the differing effects of the XC functionals which in fact result in different electron densities for the same molecular structure. The mathematical difference in the eqn (1)–(3) for DCORR, computed using an XC functional, for example BLYP, actually represents the effect of the descriptor arising from the difference between the electron-densities obtained using the BLYP X + C functional and B88 X-only functional.
Furthermore, the estimation of electron-correlation in the molecular descriptors, particularly for the biologically relevant compounds, at the advanced ab-initio theories such as configuration interaction method37 and coupled cluster theory7 demands huge computational resources and time. The aforementioned strategy employed in this study and in our previous works3–6,36 provides a computationally less expensive, though only an approximate, method to compare the role of quantum-mechanical exchange and correlation in the descriptors employed for developing the externally predictive quantitative models. Through these, the effect of variation of the exact HF exchange in the predictivity of QSARs can also be analyzed. For example, M06 and M06-2X incorporates 27% and 54% of HF exchange, respectively, whereas M06-L does not incorporate exact HF exchange as also illustrated in Table 1.
For the computation of various descriptors, the geometry of each of the 51 nitrated-PAHs, listed in the ESI Tables S1 and S2a–f,† is optimized at the HF and DFT level of the theory employing a 6-311G(d,p) Gaussian basis set, which was followed by the harmonic frequency analysis to ensure that the optimized geometry corresponds to a true global minimum. It should be noted that the quantitative models discussed in the present study are based on the molecular descriptors computed using the same basis set i.e., 6-311G(d,p) for various quantum-mechanical computations employing different electronic structure methods, though for a few models, we had also analyzed the role of polarization and diffuse function with the computations performed using 6-311G and 6-311++G(d,p) basis sets. It should further be noted that the exclusion of the polarization functions (d,p) not only leads to different numerical values of the molecular descriptors but can also significantly affect the statistical validation of the models, however, no such significant change is observed in the statistical parameters of the models when diffuse functions (indicated by ++) are also included which though are computationally more expensive. All the quantum-mechanical calculations are performed with Gaussian 0338 suite of quantum-chemistry software package, except for the computations using M06, M06-L and M06-2X functionals, for which a SCF-MO package, ORCA39 was employed.
It should further be noted that for a few of the compounds, for example for 6-nitrobenzo[a]pyrene, the value of chemical hardness, computed through the CORR (HFX + LYP) and CORR (BLYP) methods, is exactly zero as evident in the ESI Table S2e.† Since the chemical hardness (η) and electrophilicity index (ω) index are related with each other through the absolute electronegativity (χ) as,
![]() | (4) |
Further, the stability and reliability of the developed QSAR models is assessed through the statistical validation parameters listed in Table 2 for the internal and external validation. Parameters, namely, the coefficient of determination (R2), R2 using leave-one-out method (QLOO2), and cross-validated concordance correlation coefficient (CCCCV) are employed to determine the model's internal robustness. Besides these, more rigorous procedure such as cross-validated leave-many-out (QLMO2), Y-scrambling (QYscr2) and randomization (QYrand2) are employed for the internal validation, with 1000 iterations in each of the procedure, and leaving 30% of the chemicals from the training set at a time in the QLMO2 procedure. The robustness of a model can be guaranteed if the value of leave-many-out parameters like QLMO2 is similar to that of R2 and QLOO2, whereas the lower values of Y-scrambling parameters ensure that there is a minimum chance-correlation in the proposed models. Furthermore, a low difference between the R2 and QLOO2 indicates similar performance of the model in the fitting and internal predictivity.
S. no. | Parameter | Significance |
---|---|---|
1 | ![]() |
Provides the information regarding goodness-of-fit of the model (ref. 48) |
2 | ![]() |
Useful in determining whether the over-fitting occurs in a model (ref. 48) |
3 | ![]() |
Judges the model's performance for new chemicals, but the TSS is calculated using the mean of the training set (ȳTR) (ref. 44 and 45) |
4 | ![]() |
Judges the model's performance for new chemicals, but the TSS is calculated using mean of the external set (ȳEXT) (ref. 46) |
5 | ![]() |
Judges the model's performance for new chemicals, and it is independent of the size and distribution of the data set (ref. 47) |
6 | ![]() ![]() |
Determines the agreement between the experimental and the predicted activity from a model (ref. 48 and 49) |
7 | ![]() ![]() |
Provides the information regarding overestimation in the prediction due to wide response range (ref. 50) |
8 | ![]() |
Represents the root mean square error in the cross-validation (CV) by the leave-one-out (LOO) method and in the external validation (EXT) (ref. 52) |
9 | ![]() |
Represents the mean absolute error in the cross-validation (CV) and in the external validation (EXT) (ref. 52) |
On the other hand, to ensure the real external predictivity of the models, state-of-the-art external validation parameters listed in Table 2 are employed, which include the predictive squared correlation coefficient such as QF12,44,45 QF22,46 QF32,47 CCCEXT,48,49 and rm2 metrics50 based parameters such as average , and differential Δrm2 with the threshold values similar to those employed by Chirico and Gramatica51 and also in our previous studies.3–6 Among these external validation parameters, the CCCEXT is the most stringent parameter which in fact determines the degree of agreement between the observed and predicted activity.48,49 The statistical significance of the models is further analyzed through the mean absolute error (MAE),52 and root mean square error (RMSE)52 in the internal validation (in terms of MAECV, RMSECV) as well as in the external validation (in terms of MAEEXT, RMSEEXT) as depicted in Fig. 1 and ESI Tables S11–S30.† Further to check the descriptor collinearity, QUIK rule (Q Under Influence of K) with a threshold ΔK value of 0.5 is also employed.53 Finally, the degree of scattering between the experimentally (observed) and predicted mutagenicity from the developed models is analyzed through the scatter plots. The splitting of the data set, model development and validation were performed through the QSARINS54,55 software. All the parameters employed to check the internal and external validation of various models are collected in the ESI Tables S11–S30,† whereas only the key parameters, provided in the Tables 3–7, are taken for the discussions in the next section.
![]() | ||
Fig. 1 Comparison of the mean absolute error (MAE) and the root mean square error (RMSE) in the external (EXT) predictivity of various models, for the TA100 mutagenicity of nitrated-PAHs, based on the total energy (E), energy of the HOMO and the LUMO, absolute electronegativity (χ), chemical hardness (η) and electrophilicity index (ω) computed through exchange-only methods (HFX, B88), exchange–correlation (XC) methods (HFX + LYP, BLYP, B3LYP, M06, M06-L, M06-2X), and also based on the descriptors incorporating mainly the effect of electron correlation (CORR) from the respective XC methods (for the TA98 mutagenicity models, see ESI Fig. S1†). |
Model s. no. | Method | Descriptor employed | Splitting employed | R2 | QLOO2 | R2 − QLOO2 | QLMO2 | QF32 | CCCEXT |
---|---|---|---|---|---|---|---|---|---|
Exchange (X) only | |||||||||
1 | DFT/B88 | EB88, EHOMOB88 | 30% | 0.652 | 0.559 | 0.093 | 0.525 | 0.673 | 0.808 |
2 | DFT/HFX | EHFX, EHOMOHFX | 30% | 0.792 | 0.732 | 0.061 | 0.719 | 0.786 | 0.886 |
![]() |
|||||||||
Exchange + correlation (X + C) | |||||||||
3 | HFX + LYP | EHFX+LYP, EHOMOHFX+LYP | 30% | 0.765 | 0.718 | 0.047 | 0.698 | 0.864 | 0.920 |
4 | BLYP | EBLYP, EHOMOBLYP | 30% | 0.738 | 0.685 | 0.053 | 0.664 | 0.638 | 0.769 |
5 | B3LYP | EB3LYP, EHOMOB3LYP | 30% | 0.675 | 0.616 | 0.059 | 0.596 | 0.651 | 0.772 |
6 | M06 | EM06, EHOMOM06 | 30% | 0.660 | 0.588 | 0.072 | 0.557 | 0.697 | 0.790 |
7 | M06-L | EM06-L, EHOMOM06-L | 30% | 0.634 | 0.551 | 0.083 | 0.561 | 0.685 | 0.769 |
8 | M06-2X | EM06-2X, EHOMOM06-2X | 30% | 0.656 | 0.583 | 0.073 | 0.520 | 0.707 | 0.795 |
![]() |
|||||||||
Electron-correlation (CORR) only | |||||||||
9 | CORR (HFX + LYP) | ECORR(HFX+LYP), EHOMOCORR(HFX+LYP) | 30% | 0.730 | 0.675 | 0.055 | 0.659 | 0.834 | 0.843 |
10 | CORR (BLYP) | ECORR(BLYP), EHOMOCORR(BLYP) | 30% | 0.724 | 0.675 | 0.055 | 0.602 | 0.784 | 0.783 |
11 | CORR (B3LYP) | ECORR(B3LYP), EHOMOCORR(B3LYP) | 30% | 0.779 | 0.727 | 0.052 | 0.714 | 0.862 | 0.874 |
12 | CORR (M06) | ECORR(M06), EHOMOCORR(M06) | 30% | 0.778 | 0.723 | 0.054 | 0.695 | 0.806 | 0.889 |
13 | CORR (M06-L) | ECORR(M06-L), EHOMOCORR(M06-L) | 30% | 0.774 | 0.722 | 0.052 | 0.694 | 0.757 | 0.857 |
14 | CORR (M06-2X) | ECORR(M06-2X), EHOMOCORR(M06-2X) | 30% | 0.759 | 0.702 | 0.058 | 0.679 | 0.862 | 0.926 |
It should further be noted that the models developed with the X-only, X + C and CORR-only methods differs in the distribution of the compounds in the training and external prediction set. The splitting is performed separately while developing models based on the descriptors computed using different electronic structure methods such that the best possible models are obtained, however, the outliers (the excluded compounds) differ in the splitting used for the various methods as indicated in the ESI Tables S3 and S4.† Though, we had also employed the same splitting for all the electronic structure methods considered in this work, however, no significant variation in the statistical parameters is observed, in fact, the models were still observed to be as robust and predictive as those using different splitting.
Model s. no. | Method | Descriptor employed | Splitting employed | R2 | QLOO2 | R2 − QLOO2 | QLMO2 | QF32 | CCCEXT |
---|---|---|---|---|---|---|---|---|---|
Exchange (X) only | |||||||||
1 | DFT/B88 | EB88, ELUMOB88 | 30% | 0.728 | 0.640 | 0.085 | 0.614 | 0.722 | 0.849 |
2 | DFT/HFX | EHFX, ELUMOHFX | 30% | 0.627 | 0.545 | 0.082 | 0.508 | 0.646 | 0.789 |
![]() |
|||||||||
Exchange + correlation (X + C) | |||||||||
3 | HFX + LYP | EHFX+LYP, ELUMOHFX+LYP | 30% | 0.578 | 0.502 | 0.075 | 0.472 | 0.721 | 0.790 |
4 | BLYP | EBLYP, ELUMOBLYP | 30% | 0.668 | 0.606 | 0.062 | 0.576 | 0.817 | 0.880 |
5 | B3LYP | EB3LYP, ELUMOB3LYP | 30% | 0.647 | 0.584 | 0.064 | 0.573 | 0.828 | 0.886 |
6 | M06 | EM06, ELUMOM06 | 30% | 0.695 | 0.636 | 0.060 | 0.617 | 0.775 | 0.836 |
7 | M06-L | EM06-L, ELUMOM06-L | 30% | 0.705 | 0.646 | 0.059 | 0.632 | 0.797 | 0.854 |
8 | M06-2X | EM06-2X, ELUMOM06-2X | 30% | 0.683 | 0.620 | 0.063 | 0.588 | 0.744 | 0.809 |
![]() |
|||||||||
Electron-correlation (CORR) only | |||||||||
9 | CORR (HFX + LYP) | ECORR(HFX+LYP), ELUMOCORR(HFX+LYP) | 30% | 0.717 | 0.655 | 0.062 | 0.642 | 0.779 | 0.787 |
10 | CORR (BLYP) | ECORR(BLYP), ELUMOCORR(BLYP) | 30% | 0.710 | 0.647 | 0.063 | 0.637 | 0.727 | 0.757 |
11 | CORR (B3LYP) | ECORR(B3LYP), ELUMOCORR(B3LYP) | 30% | 0.697 | 0.603 | 0.094 | 0.607 | 0.683 | 0.696 |
12 | CORR (M06) | ECORR(M06), ELUMOCORR(M06) | 30% | 0.643 | 0.546 | 0.097 | 0.535 | 0.636 | 0.744 |
13 | CORR (M06-L) | ECORR(M06-L), ELUMOCORR(M06-L) | 30% | 0.656 | 0.562 | 0.094 | 0.529 | 0.616 | 0.725 |
14 | CORR (M06-2X) | ECORR(M06-2X), ELUMOCORR(M06-2X) | 30% | 0.659 | 0.576 | 0.083 | 0.549 | 0.712 | 0.810 |
Furthermore, among the models based on the descriptors computed using X-only methods, the DFT/HFX computed HOMO energy model (entry 2 in Table 3) based on EHFX and EHOMOHFX, outperforms rest of the models while modeling both the activities. Moreover, this model have a more generalized applicability domain and least scattering between the experimental and predicted activity as evident from the Williams plot and scatter plot represented in Fig. 2 for the TA100 mutagenicity. Apart from the HOMO energy based model, the model (entry 2 in Table 5 and ESI Table S7†) based on the DFT/HFX computed total energy and absolute electronegativity, is also found to be robust for both types of mutagenicity. Besides this, the electrophilicity index based model (entry 2 in Table 7) is also found to be reliable. The Williams and scatter plots of these best performing models developed with the X-only methods are provided in the ESI Fig. S2(A and B) and S3(A and B).† Further, for the TA100 activity, the model (entry 2 in Table 6) developed with the total energy and chemical hardness performs satisfactorily only when computed with the DFT/HFX method, whereas the external predictivity of the same model (entry 1 in Table 6) but developed with the descriptor computed using the DFT/B88 method is observed to be comparatively less reliable. However, among the TA98 models, both the X-only methods give robust parameters for the models based on the chemical hardness (see entries 1 and 2 in the ESI Table S8†), but it may be due to the small data-set used for the TA98 models as explained in the previous section.
![]() | ||
Fig. 2 (a) Williams plot: standardized residuals versus leverage (h), for the TA100 mutagenicity model based on the total energy and the energy of the HOMO computed using DFT/HFX method. The training and prediction set chemicals, represented with open (yellow) and filled (blue) circles, respectively, are obtained using 30% random splitting method. The encircled values represent the ID number of the compounds, provided in the ESI Table S1 (for other best models, see also ESI Fig. S2†). The vertical (solid) line indicates the warning leverage (h*). (b) Scatter plot: experimental versus predicted log![]() |
Model s. no. | Method | Descriptor employed | Splitting employed | R2 | QLOO2 | R2 − QLOO2 | QLMO2 | QF32 | CCCEXT |
---|---|---|---|---|---|---|---|---|---|
Exchange (X) only | |||||||||
1 | DFT/B88 | EB88, χB88 | 30% | 0.716 | 0.640 | 0.076 | 0.621 | 0.686 | 0.831 |
2 | DFT/HFX | EHFX, χHFX | 30% | 0.796 | 0.735 | 0.062 | 0.718 | 0.762 | 0.877 |
![]() |
|||||||||
Exchange + correlation (X + C) | |||||||||
3 | HFX + LYP | EHFX+LYP, χHFX+LYP | 30% | 0.734 | 0.681 | 0.052 | 0.662 | 0.845 | 0.907 |
4 | BLYP | EBLYP, χBLYP | 30% | 0.717 | 0.662 | 0.055 | 0.640 | 0.736 | 0.823 |
5 | B3LYP | EB3LYP, χB3LYP | 30% | 0.692 | 0.636 | 0.056 | 0.605 | 0.726 | 0.827 |
6 | M06 | EM06, χM06 | 30% | 0.704 | 0.645 | 0.059 | 0.624 | 0.756 | 0.836 |
7 | M06-L | EM06-L, χM06-L | 30% | 0.679 | 0.616 | 0.064 | 0.596 | 0.752 | 0.825 |
8 | M06-2X | EM06-2x, χM06-2X | 30% | 0.698 | 0.639 | 0.059 | 0.608 | 0.768 | 0.842 |
![]() |
|||||||||
Electron-correlation (CORR) only | |||||||||
9 | CORR (HFX + LYP) | ECORR(HFX+LYP), χCORR(HFX+LYP) | 30% | 0.723 | 0.660 | 0.063 | 0.661 | 0.800 | 0.818 |
10 | CORR (BLYP) | ECORR(BLYP), χCORR(BLYP) | 30% | 0.726 | 0.676 | 0.051 | 0.654 | 0.787 | 0.789 |
11 | CORR (B3LYP) | ECORR(B3LYP), χCORR(B3LYP) | 30% | 0.739 | 0.689 | 0.050 | 0.671 | 0.850 | 0.862 |
12 | CORR (M06) | ECORR(M06), χCORR(M06) | 30% | 0.724 | 0.662 | 0.062 | 0.622 | 0.787 | 0.883 |
13 | CORR (M06-L) | ECORR(M06-L), χCORR(M06-L) | 30% | 0.735 | 0.675 | 0.060 | 0.642 | 0.784 | 0.880 |
14 | CORR (M06-2X) | ECORR(M06-2X), χCORR(M06-2X) | 30% | 0.723 | 0.656 | 0.067 | 0.622 | 0.827 | 0.905 |
Model s. no. | Method | Descriptor employed | Splitting employed | R2 | QLOO2 | R2 − QLOO2 | QLMO2 | QF32 | CCCEXT |
---|---|---|---|---|---|---|---|---|---|
Exchange (X) only | |||||||||
1 | DFT/B88 | EB88, ηB88 | 30% | 0.538 | 0.391 | 0.148 | 0.377 | 0.630 | 0.746 |
2 | DFT/HFX | EHFX, ηHFX | 30% | 0.745 | 0.667 | 0.078 | 0.639 | 0.767 | 0.870 |
![]() |
|||||||||
Exchange + correlation (X + C) | |||||||||
3 | HFX + LYP | EHFX+LYP, ηHFX+LYP | 30% | 0.771 | 0.719 | 0.051 | 0.697 | 0.869 | 0.920 |
4 | BLYP | EBLYP, ηBLYP | 30% | 0.774 | 0.721 | 0.053 | 0.709 | 0.136 | 0.503 |
5 | B3LYP | EB3LYP, ηB3LYP | 30% | 0.614 | 0.521 | 0.093 | 0.510 | 0.579 | 0.693 |
6 | M06 | EM06, ηM06 | 30% | 0.583 | 0.453 | 0.130 | 0.451 | 0.618 | 0.703 |
7 | M06-L | EM06-L, ηM06-L | 30% | 0.569 | 0.436 | 0.133 | 0.430 | 0.573 | 0.663 |
8 | M06-2X | EM06-2x, ηM06-2X | 30% | 0.588 | 0.462 | 0.126 | 0.459 | 0.631 | 0.714 |
![]() |
|||||||||
Electron-correlation (CORR) only | |||||||||
9 | CORR (HFX + LYP) | ECORR(HFX+LYP), ηCORR(HFX+LYP) | 30% | 0.715 | 0.658 | 0.056 | 0.625 | 0.794 | 0.796 |
10 | CORR (BLYP) | ECORR(BLYP), ηCORR(BLYP) | 30% | 0.723 | 0.676 | 0.048 | 0.596 | 0.771 | 0.771 |
11 | CORR (B3LYP) | ECORR(B3LYP), ηCORR(B3LYP) | 30% | 0.781 | 0.729 | 0.052 | 0.720 | 0.774 | 0.792 |
12 | CORR (M06) | ECORR(M06), ηCORR(M06) | 30% | 0.725 | 0.663 | 0.062 | 0.626 | 0.785 | 0.882 |
13 | CORR (M06-L) | ECORR(M06-L), ηCORR(M06-L) | 30% | 0.736 | 0.675 | 0.060 | 0.648 | 0.789 | 0.883 |
14 | CORR (M06-2X) | ECORR(M06-2X), ηCORR(M06-2X) | 30% | 0.723 | 0.656 | 0.067 | 0.626 | 0.826 | 0.905 |
Model s. no. | Method | Descriptor employed | Splitting employed | R2 | QLOO2 | R2 − QLOO2 | QLMO2 | QF32 | CCCEXT |
---|---|---|---|---|---|---|---|---|---|
Exchange (X) only | |||||||||
1 | DFT/B88 | EB88, ωB88 | 30% | 0.643 | 0.472 | 0.171 | 0.477 | 0.690 | 0.804 |
2 | DFT/HFX | EHFX, ωHFX | 30% | 0.778 | 0.720 | 0.059 | 0.696 | 0.723 | 0.854 |
![]() |
|||||||||
Exchange + correlation (X + C) | |||||||||
3 | HFX + LYP | EHFX+LYP, ωHFX+LYP | 30% | 0.688 | 0.631 | 0.057 | 0.606 | 0.827 | 0.891 |
4 | BLYP | EBLYP, ωBLYP | 30% | 0.595 | 0.526 | 0.069 | 0.499 | 0.812 | 0.880 |
5 | B3LYP | EB3LYP, ωB3LYP | 30% | 0.611 | 0.541 | 0.070 | 0.513 | 0.806 | 0.863 |
6 | M06 | EM06, ωM06 | 30% | 0.683 | 0.620 | 0.063 | 0.584 | 0.690 | 0.771 |
7 | M06-L | EM06-L, ωM06-L | 30% | 0.658 | 0.589 | 0.069 | 0.546 | 0.602 | 0.707 |
8 | M06-2X | EM06-2X, ωM06-2X | 30% | 0.702 | 0.644 | 0.057 | 0.619 | 0.762 | 0.795 |
![]() |
|||||||||
Electron-correlation (CORR) only | |||||||||
9 | CORR (HFX + LYP) | E(CORR,HFX+LYP), ω(CORR,HFX+LYP) | 30% | 0.727 | 0.663 | 0.064 | 0.630 | 0.595 | 0.791 |
10 | CORR (BLYP) | ECORR(BLYP), ωCORR(BLYP) | 30% | 0.768 | 0.695 | 0.074 | 0.639 | 0.632 | 0.716 |
11 | CORR (B3LYP) | ECORR(B3LYP), ωCORR(B3LYP) | 30% | 0.772 | 0.730 | 0.042 | 0.706 | 0.898 | 0.912 |
12 | CORR (M06) | ECORR(M06), ωCORR(M06) | 30% | 0.726 | 0.661 | 0.065 | 0.629 | 0.783 | 0.882 |
13 | CORR (M06-L) | ECORR(M06-L), ωCORR(M06-L) | 30% | 0.726 | 0.661 | 0.065 | 0.641 | 0.790 | 0.882 |
14 | CORR (M06-2X) | ECORR(M06-2X), ωCORR(M06-2X) | 30% | 0.709 | 0.640 | 0.070 | 0.606 | 0.815 | 0.896 |
It should be noted that the HF and DFT/HFX methods employing 100% HF exchange without any correlation functional are expected to yield exactly the same models since the expression for the energy in the two methods is exactly the same. However, in the present work, for a few molecules, the energy and other molecular descriptors computed using the two methods differs slightly as evident in the ESI Table S2a.† This difference is mainly due to the different algorithms used for the HF and DFT codes in the computational software. Furthermore, as evident in Tables 3–7 (entry 2) for the DFT/HFX method and in the ESI Table S10† for the HF method, though the model's parameters based on the same descriptors computed using the two methods differ, however, the overall reliability and predictivity of the models do not vary significantly.
The statistical performance of the various QSAR models developed with the descriptors computed using the X-only methods suggests that the quantum-mechanical exchange interactions between the electrons can be highly significant in the modeling of the mutagenic activities as evident in the present study on the nitrated-PAHs. It should, however, be noted that the descriptors computed through the X-only methods also includes the effect of kinetic motion of electrons, electron-nuclear Coulombic attraction etc. For example, the total energy computed using the X-only methods is the sum of the kinetic energy of electrons, potential energy due to electron–nuclear attraction, and the exchange energy due to quantum-mechanical interactions between the electrons of parallel spin, but neglecting some instantaneous electron–electron interactions which can also be highly significant as described below.
Comparing the statistical validation parameters in entries 3–8 of Tables 3–7 and the errors in the prediction (depicted in Fig. 1 and ESI Fig. S1†) for various QSAR models developed through the molecular descriptors computed with different X + C methods, it is clearly evident that for the TA100 mutagenicity, the HFX + LYP based models are the most robust, except in the case of model (entry 3 in Table 4) based on the LUMO energy. However, the model (entry 3 in Table 6) developed with the total energy and chemical hardness computed with HFX + LYP method outperforms all other models for the TA100 mutagenicity. Besides this, the statistical stability and reliability of the model (entry 3 in Table 3) based on the total energy and HOMO energy is also found to be comparable to that of the model based on the total energy and chemical hardness. In fact, for the TA98 mutagenicity, the HOMO energy based model (entry 3 in the ESI Table S5†) is also found to be highly reliable and predictive. The Williams plot and scatter plot of these reliable models observed are further provided in the ESI Fig. S2(A and B) and S3(A and B).†
However, the reliability of the models developed with the descriptors computed using new-generation meta XC functionals (M06, M06-L, M06-2X) is found to be less than that of the models developed using the widely used B3LYP functional as evident from the statistical parameters listed in Tables 3–7 (entries 5–8). In fact, for the TA100 mutagenicity, these functionals show low validation parameters in most of the models, though in case of the models based on the absolute electronegativity and electrophilicity index (entries 6–8 in Tables 5 and 7), reliable internal validation parameters are observed, however, these models are still less predictive as indicated by the external validation parameter (CCCEXT). Similar observations are also made in the case of TA98 mutagenicity. Further, among all the models developed with the LUMO energy computed using the X + C methods, only the BLYP and M06-L functionals show statistically reliable parameters. However, the TA100 mutagenicity model (entry 4 in Table 6) based on the chemical hardness computed using the BLYP functional is found to be unreliable.
Overall, the effect of the electron-correlation along with the exchange, including that from the hybrid XC functionals, seems to increase the external prediction ability of the models but at the same time decreasing the internal stability in a few models. Therefore, it would be interesting to see if the exclusion of some exchange while retaining mainly the effect of the electron-correlation can increase the internal stability of the models, as analyzed below.
![]() | ||
Fig. 3 Same as Fig. 2, but for the TA100 mutagenicity model based on ECORR(B3LYP) and ωCORR(B3LYP) descriptors, incorporating mainly the effect of electron-correlation (CORR) in the total-energy (E) and the electrophilicity index (ω), computed using the DFT employing B3LYP hybrid XC functional (see also ESI Fig. S2 and S3†). |
Besides this, the electron-correlation in the new-generation XC functionals such as M06, M06-L and M06-2X, is also observed to yield highly reliable models, particularly the models (entries 12–14 in Tables 3, 5 and 6) based on the HOMO energy, absolute electronegativity and chemical hardness, are found to be quite robust. Similar trend is observed in the case of TA98 mutagenicity, where the models developed with the CORR descriptors computed using these functionals show excellent internal as well as external reliability as also evident from the Williams plot and scatter plots depicted in the ESI Fig. S2(A and B) and S3(A and B).† Further, the models based on the electron-correlation contribution from the pure XC functionals, HFX + LYP and BLYP, are also found to be reliable though less predictive than those developed using the hybrid XC functionals. However, for the model based on the LUMO energy (entry 9 in Table 4), the CORR (HFX + LYP) is found to be the most reliable method. In fact, the models developed using the descriptor incorporating the electron-correlation through HFX + LYP are observed to be statistically more robust than those developed using the BLYP, clearly indicating the importance of the exact HF exchange.
Furthermore, as evident from the ESI Tables S5–S9,† in the case of TA98 mutagenicity, some of the models shows negative value for the internal and external validation parameters which can be mainly attributed to the very small data set employed in this work compared to a more reliable data-set used in our previous study,3 where such models based on the descriptors computed using the B3LYP are also found to highly reliable.
Further, as evident from the entries 5 and 11 in Tables 3–7 for the models based on the widely used XC functional B3LYP, it is observed that the models are more robust and predictive when mainly the effect of electron-correlation is included in the descriptor, suggesting the electron-correlation interactions to be significant while developing the externally predictive quantitative models. However, this is not always the case as evident from the models (entries 3 and 9 in Table 3) based on the HOMO energy computed through the HFX + LYP, where mainly retaining the effect of LYP correlation did not seem to improve the quality of the model (entry 9 in Table 3). Moreover, as discussed previously, the HOMO energy based model (entry 2 in Table 3) has more reliable internal validation when only the exact HF exchange (HFX) is included without any correlation, whereas the inclusion of LYP correlation to the HFX is observed to decrease the internal stability of the HOMO energy based model as evident in entry 3 of Table 3.
Further, for the models based on the HOMO energy, all the X-only as well as CORR methods are found to provide robust internal and external validation parameters, suggesting this descriptor to be an elite choice for modeling the mutagenic potential of compounds as had also been observed in our recent studies3,4 which though employ different composition of the training and prediction sets. However, among the models based on the LUMO energy, the methods without the HF exchange, that is, B88 and M06-L are observed to be the most reliable. From the robustness and reliability of the models developed in our present and previous studies,3,4 it is evident that the models based on the descriptors incorporating mainly the effect of electron-correlation, particularly from the total energy, energy of the HOMO, and electrophilicity index, can be highly reliable for developing externally predictive QSAR models for the TA100 and TA98 mutagenicity, irrespective of the data set distribution. Furthermore, from the Williams plots and scatter plots, depicted in the ESI Fig. S2(A and B) and S3(A and B),† it is evident that the models developed with the X-only and with the CORR descriptors have a generalized domain of applicability, and the least scattering between the predicted and experimental activity, suggesting these methods to be highly reliable. The reliability of the electron-correlation based descriptors is also evident from the quality of the consensus models listed in Table 8, which were proposed using the best models observed in the present study. It should be noted that a consensus model incorporates various molecular aspects of the compounds through different descriptors.
Consensus model s. no. | Descriptor employed | Splitting employed | RWCM2 | RMSETR | QF32 | CCCEXT | RMSEEXT |
---|---|---|---|---|---|---|---|
a RWCM2 represent the coefficient of determination obtained with weight consensus model strategy. | |||||||
TA100 mutagenicity | |||||||
1 | EHFX, EHOMOHFX, χHFX, ωHFX | 30% | 0.805 | 0.769 | 0.774 | 0.880 | 0.818 |
2 | EHFX+LYP, EHOMOHFX+LYP, χHFX+LYP, ηHFX+LYP | 30% | 0.793 | 0.836 | 0.862 | 0.917 | 0.644 |
3 | EM06-2X, χM06-2X, ωM06-2X | 30% | 0.747 | 0.912 | 0.810 | 0.861 | 0.750 |
4 | ECORR(B3LYP), EHOMOCORR(B3LYP), χCORR(B3LYP), ωCORR(B3LYP) | 30% | 0.792 | 0.832 | 0.886 | 0.898 | 0.605 |
5 | ECORR(M06-2X), EHOMOCORR(M06-2X), ωCORR(M06-2X) | 30% | 0.806 | 0.878 | 0.844 | 0.914 | 0.681 |
![]() |
|||||||
TA98 mutagenicity | |||||||
6 | EHFX, EHOMOHFX, χHFX, ωHFX | 30% | 0.959 | 0.474 | 0.929 | 0.965 | 0.546 |
7 | EHFX+LYP, EHOMOHFX+LYP, χHFX+LYP, ηHFX+LYP | 30% | 0.885 | 0.466 | 0.968 | 0.989 | 0.348 |
8 | ECORR(B3LYP), EHOMOCORR(B3LYP), ωCORR(B3LYP) | Activity sampling | 0.968 | 0.330 | 0.957 | 0.975 | 0.450 |
9 | ECORR(M06-L), EHOMOCORR(M06-L), ωCORR(M06-L) | Activity sampling | 0.936 | 0.529 | 0.959 | 0.980 | 0.403 |
(1) In modeling of the mutagenicity, the descriptors computed using the X-only methods such as DFT/HFX, incorporating an exact HF exchange, are observed to be highly reliable for the models based on the total energy, HOMO energy, absolute electronegativity and electrophilicity index, though the X + C methods of DFT employing the XC functionals such as HFX + LYP and B3LYP performs satisfactorily, while the BLYP and meta XC functionals are also observed to be reliable for the models based on the LUMO energy.
(2) Surprisingly, the external predictivity of the models increases when mainly the effect of the electron-correlation is included in the descriptors particularly when computed through the CORR (B3LYP), CORR (M06), CORR (M06-L), and CORR (M06-2X) methods.
(3) The amount of quantum-mechanical exchange interactions is found to be critical along with the electron-correlation since retaining the latter decreases the internal stability of a few models as observed in the models developed using the CORR (HFX + LYP) and CORR (BLYP) methods.
(4) Notably, the models based on the descriptors incorporating mainly the effect of electron-correlation from the hybrid XC functionals such as B3LYP, and new-generation meta XC functionals like M06, M06-L, M06-2X, are observed to be highly reliable.
From the above conclusions, it may be suggested that the dynamic electron–electron interactions, namely, the quantum-mechanical exchange and correlation, can be highly significant in the reliability and external predictivity of the QSAR models while modeling the biological activities.
Footnote |
† Electronic supplementary information (ESI) available: Tables S1, S2a–f, S3–S30 and Fig. S1, S2(A and B) and S3(A and B). See DOI: 10.1039/c4ra14262d |
This journal is © The Royal Society of Chemistry 2015 |