Role of exchange and correlation in the real external prediction of mutagenicity: performance of hybrid and meta-hybrid exchange–correlation functionals

Reenu; Vikas

doi:10.1039/C4RA14262D


DOI: 10.1039/C4RA14262D (Paper) RSC Adv., 2015, 5, 29238-29251


Reenu and Vikas*
Quantum Chemistry Group, Department of Chemistry & Centre of Advanced Studies in Chemistry, Panjab University, Chandigarh-160014, India. E-mail: qlabspu@pu.ac.in; qlabspu@yahoo.com; Tel: +91-172-2534408

Received 11th November 2014, Accepted 13th March 2015

First published on 16th March 2015

Abstract

Quantum-mechanical exchange and correlation interactions between electrons are crucial in determining molecular geometry and properties. Such electronic interactions can play a significant role in the reliability of a quantitative structure–activity relationship (QSAR), because QSARs describe the biological activities of chemicals as a function of molecular structure and are routinely based on quantum-mechanical molecular descriptors. In this work, we present a detailed analysis of the effect of quantum-mechanical exchange and correlation on the internal stability and external predictivity of QSAR models based on quantum-mechanical molecular descriptors, for the mutagenic activity of a set of 51 nitrated-polycyclic aromatic hydrocarbons (PAHs). For this, various molecular descriptors are computed using electronic structure methods such as the Hartree–Fock (HF) method and density functional theory (DFT) employing only exchange functionals (HFX, B88), pure exchange and correlation functionals (HFX + LYP, BLYP), and hybrid (B3LYP), meta (M06-L), and meta-hybrid (M06, M06-2X) exchange–correlation (XC) functionals. To further analyze the role of electron-correlation, QSAR models are also developed using descriptors incorporating mainly the effect of electron-correlation. The external predictivity of the developed models is assessed through state-of-the-art external validation parameters employing an external prediction set of compounds. A comparison of the quality of the models developed with descriptors computed using different electronic structure methods revealed that exchange interactions, along with electron-correlation, are quite critical in modeling mutagenicity. Notably, for most of the models, electron-correlation based descriptors are found to be highly reliable when computed using hybrid XC functionals, particularly B3LYP and M06-2X.

Introduction

An accurate estimation of the intricate instantaneous interactions between electrons, namely, quantum-mechanical exchange and correlation,^1,2 is important in computing the energy and geometry of atomic and molecular systems. Moreover, investigating such interactions can be advantageous in the quantitative modeling of biological activities, since these fundamental interactions influence molecular properties such as the ionization potential and electron affinity. In our recent studies,^3–6 a heuristic approach based on the Hartree–Fock (HF)^7,8 method and density functional theory (DFT)^8,9 revealed that the effect of electron-correlation, described through electron-correlation based quantum-mechanical molecular descriptors, can be highly significant for the external predictivity of quantitative models developed for the biological activities and physico-chemical properties of environmentally important compounds. For example, while developing externally predictive quantitative structure–activity relationships (QSARs) for the mutagenic activity of nitrated-polycyclic aromatic hydrocarbons (PAHs),^3,4 it was found that the electron-correlation energy, and descriptors incorporating mainly the effect of electron-correlation on molecular descriptors such as the HOMO energy and electrophilicity, are more reliable than the corresponding whole descriptors such as the total energy.
Similarly, descriptors based on the effect of electron-correlation on the total energy and molecular polarizability were also found to be highly reliable in single-parameter quantitative models for various physico-chemical properties, such as the aqueous solubility, subcooled liquid vapor pressure, and n-octanol/water and n-octanol/air partition coefficients, of polychlorinated-dibenzo-p-dioxins (PCDDs) and dibenzo-furans (PCDFs),⁶ and also for the supercooled vapor pressure of polychlorinated-naphthalenes.⁵ In this study, we present a detailed analysis of the role of quantum-mechanical exchange and correlation interactions in the real external predictivity of QSARs. It should be noted that the “real” external predictivity of a quantitative model should be determined using an external prediction set of compounds not used in the model development.^10,11

The HF method incorporates exchange interactions exactly but neglects a significant part of the dynamic electron correlation,^7,8 whereas DFT accounts for exchange and correlation (XC) through an XC density functional whose exact form is still unknown. The most widely used DFT XC functional, B3LYP, which builds on the work of Becke¹² and of Lee, Yang and Parr,¹³ is a hybrid of local and non-local exchange and correlation. Enormous efforts have been made in recent years and in the past decade to find XC density functionals that yield accurate energies, geometries and thermochemical properties not only for covalent systems but also for non-covalently interacting systems.^14–16 The literature shows that many widely applicable functionals based on the generalized-gradient approximation (GGA), which uses the gradient of the electron density, are separately accurate for molecules,^17,18 solids,¹⁹ interfaces,²⁰ and even low-dimensional systems,²¹ but no GGA functional is simultaneously accurate for all of these systems.²² On the other hand, the more recent meta-GGA functionals, which additionally depend on the spin kinetic energy density, overcome many of the limitations of GGA functionals at almost the same computational cost.^23,24 However, the reliability of hybrid GGA functionals in describing energies can be sensitive to the amount of exact HF exchange included in the functional, because quantum-mechanical exchange interactions are necessary for an accurate description of atomic and molecular structure.²⁵ The performance of various density functionals, with and without the incorporation of exact exchange, has previously been analyzed in detail in several studies.^25–27

The present work compares the exchange and correlation contributions of a few relevant and widely used pure, hybrid, meta, and meta-hybrid GGA functionals, as well as of exact HF exchange, to the real external predictivity of QSARs based on quantum-mechanical molecular descriptors, namely, the total energy of a molecule (E), the energies of the highest occupied and lowest unoccupied molecular orbitals (E_HOMO and E_LUMO), the absolute electronegativity (χ), the chemical hardness (η), and the electrophilicity index (ω).^28–30 For the present study, QSAR models are developed and analyzed for two biological activities, namely, the base-pair and frame-shift mutation activity of 51 nitrated-PAHs. The DFT XC functionals analyzed in the present study are widely used in QSAR modeling, as discussed in the next section.

Theoretical and computational details

As listed in Table 1, the electronic structure methods employed in the present study to compute the molecular descriptors include: (1) X-only methods, incorporating only exchange, namely, DFT with pure HF exchange (HFX) and with Becke exchange (B88),^8,31 without any correlation functional; (2) X + C methods, incorporating both exchange and correlation, namely, DFT with the pure XC functionals HFX + LYP and BLYP,¹³ the hybrid XC functional B3LYP,^32,33 the meta XC functional M06-L,³⁴ and the meta-hybrid XC functionals M06³⁵ and M06-2X;³⁵ and (3) the CORR methods, in which mainly the effect of electron-correlation on the molecular descriptor is considered. As in our previous studies,^3–6,36 the effect of electron-correlation (CORR) on a molecular descriptor (D) is estimated using a heuristic approach as


D_CORR = D_DFT − D_HF/X-only	(1)

where D_DFT is the descriptor computed at the DFT level using an XC functional, D_X-only is the descriptor computed with an X-only DFT method (HFX or B88 exchange without any correlation), and D_HF is the descriptor computed at the HF level. Note, however, that the effect of electron-correlation determined through eqn (1) with a hybrid XC functional still includes some exchange interactions. The difference between the same descriptor computed using different electronic structure methods arises from the effect of exchange and correlation on the kinetic energy of the electrons, the electron–nuclear potential energy, and the inter-electronic repulsion energy between electrons of the same (parallel) and of different (antiparallel) spin. To capture the effect of electron-correlation with the B3LYP, M06, M06-L and M06-2X functionals, D_HF is used as the reference, whereas for the HFX + LYP and BLYP functionals, D_X-only is used, as


D_{CORR(HFX+LYP)} = D_HFX+LYP − D_HFX,	(2)


D_CORR(BLYP) = D_BLYP − D_B88.	(3)
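The bookkeeping in eqn (1)–(3) amounts to choosing, for each XC functional, which reference descriptor to subtract. The following is a minimal Python sketch, assuming the descriptor values for one molecule are already available per method; the method labels, the `REFERENCE` mapping and the function name are hypothetical and are not part of any program used in this work.

```python
from typing import Dict

# Reference method subtracted for each XC functional, following eqns (1)-(3).
# The string labels are hypothetical keys chosen for this sketch.
REFERENCE = {
    "B3LYP": "HF",     # eqn (1): reference is D_HF
    "M06": "HF",
    "M06-L": "HF",
    "M06-2X": "HF",
    "HFX+LYP": "HFX",  # eqn (2): reference is D_X-only with HFX exchange
    "BLYP": "B88",     # eqn (3): reference is D_X-only with B88 exchange
}


def corr_descriptor(values: Dict[str, float], functional: str) -> float:
    """Return D_CORR = D_DFT - D_ref for one molecule and one descriptor.

    `values` maps a method label to the descriptor value (e.g. E_HOMO)
    computed with that method for the same molecule.
    """
    reference = REFERENCE[functional]
    return values[functional] - values[reference]
```

For example, with hypothetical HOMO energies `{"HF": -0.30, "B3LYP": -0.22}`, `corr_descriptor(values, "B3LYP")` returns roughly 0.08. Keeping the reference choice in a single table mirrors the text: only the pure HFX + LYP and BLYP functionals use an exchange-only reference.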

Table 1 Comparison of the exchange (X)-only, exchange–correlation (X + C) and correlation (CORR)-only, quantum-mechanical methods in terms of the quantum-mechanical exchange and correlation interactions

S. no.	Method	Exchange: electron-density gradients	Exchange: LSD^a/Slater	Exchange: HF (%)	Correlation
a Local spin density. b Perdew–Burke–Ernzerhof. c Van Voorhis and Scuseria.
X-only methods
1	HFX	—	—	100%	No
2	B88	B88	Yes	0%	No

X + C methods
3	HFX + LYP	—	—	100%	From LYP functional (ref. 12 and 13)
4	BLYP	B88	Yes	0%	From LYP functional (ref. 12 and 13)
5	B3LYP	B88	Yes	20%	From B3LYP functional (ref. 12 and 13)
6	M06-L	PBE^b and spin kinetic energy density	Yes	0%	M05 correlation functional augmented by VS^c terms, treating opposite- and parallel-spin correlations differently (ref. 34 and references therein)
7	M06	As in M06-L		27%	As in M06-L (ref. 35)
8	M06-2X	As in M06-L		54%	As in M06-L (ref. 35)

CORR only methods
9	CORR (HFX + LYP)	As compensated in HFX + LYP through eqn (2)			From LYP functional
10	CORR (BLYP)	As compensated in BLYP through eqn (3)			From LYP functional
11	CORR (B3LYP)	As compensated in B3LYP through eqn (1)			From B3LYP functional
12	CORR (M06)	As compensated in M06 through eqn (1)			From M06 functional
13	CORR (M06-L)	As compensated in M06-L through eqn (1)			From M06-L functional
14	CORR (M06-2X)	As compensated in M06-2X through eqn (1)			From M06-2X functional

It should further be noted that the descriptors D_CORR computed using different X-only and X + C methods differ not merely because of the exchange and/or correlation contribution, but also because different XC functionals yield different electron densities for the same molecular structure. Thus, the difference in eqn (1)–(3) for D_CORR computed using an XC functional such as BLYP actually represents the change in the descriptor arising from the difference between the electron densities obtained with the BLYP (X + C) functional and the B88 (X-only) functional.

Furthermore, estimating the electron-correlation contribution to the molecular descriptors, particularly for biologically relevant compounds, with advanced ab initio methods such as configuration interaction³⁷ and coupled cluster theory⁷ demands considerable computational resources and time. The strategy employed in this study and in our previous works^3–6,36 provides a computationally less expensive, though approximate, way to compare the roles of quantum-mechanical exchange and correlation in the descriptors used to develop externally predictive quantitative models. It also allows the effect of varying the amount of exact HF exchange on the predictivity of the QSARs to be analyzed. For example, M06 and M06-2X incorporate 27% and 54% HF exchange, respectively, whereas M06-L incorporates no exact HF exchange, as also illustrated in Table 1.

For the computation of the various descriptors, the geometry of each of the 51 nitrated-PAHs, listed in the ESI Tables S1 and S2a–f,† is optimized at the HF and DFT levels of theory employing the 6-311G(d,p) Gaussian basis set, followed by a harmonic frequency analysis to ensure that each optimized geometry corresponds to a true minimum on the potential energy surface. The quantitative models discussed in the present study are based on molecular descriptors computed with the same basis set, i.e., 6-311G(d,p), for all the electronic structure methods, though for a few models we also analyzed the role of polarization and diffuse functions using the 6-311G and 6-311++G(d,p) basis sets. Excluding the polarization functions (d,p) not only changes the numerical values of the molecular descriptors but can also significantly affect the statistical validation of the models; in contrast, no significant change is observed in the statistical parameters of the models when diffuse functions (indicated by ++) are also included, although these are computationally more expensive. All the quantum-mechanical calculations are performed with the Gaussian 03³⁸ suite of quantum-chemistry programs, except for the computations using the M06, M06-L and M06-2X functionals, for which the SCF-MO package ORCA³⁹ was employed.

Data base

The data set comprising 51 nitrated-PAHs with mutagenic potential is taken from the literature.⁴⁰ These compounds exhibit base-pair mutagenic potency in the TA100 strain of Salmonella typhimurium. In addition, 18 compounds of this data set also show mutagenic potency in the TA98 strain of Salmonella typhimurium,⁴¹ which corresponds to the frame-shift mutation potency of the compounds. Such genotoxic behavior has hazardous effects on all life forms, and some of these compounds have been associated with genetic disorders that can lead to cystic fibrosis, sickle cell anemia, cancer, Crohn's disease, Tay–Sachs disease, etc.^42,43 The data for the TA100 and TA98 mutagenic activity of the sets of 51 and 18 nitrated-PAHs, respectively, along with the computed quantum-mechanical descriptors described in the previous sections, are provided in the ESI Tables S1 and S2a–f,† respectively. Because the data set chosen for TA100 mutagenicity in this study is sufficiently large and more reliable than that used for TA98 mutagenicity, the TA100 models are discussed in detail in the main text of the article, whereas the TA98 models are provided in the ESI.† TA98 models developed on a larger data set are presented in our previous study.³ The QSAR models for the two types of mutagenic potency of nitrated-PAHs are developed using the statistical procedures described in the next section.

It should further be noted that for a few compounds, for example 6-nitrobenzo[a]pyrene, the value of the chemical hardness computed through the CORR (HFX + LYP) and CORR (BLYP) methods is exactly zero, as evident in the ESI Table S2e.† Since the chemical hardness (η) and the electrophilicity index (ω) are related to each other through the absolute electronegativity (χ) as,


ω = χ²/(2η)	(4)

hence, the electrophilicity index of these compounds is undefined. Therefore, the data sets used to develop the models with the electrophilicity index computed through the CORR (HFX + LYP) and CORR (BLYP) methods comprise 44 and 41 compounds, respectively, which are smaller than the data set of 51 compounds employed for all the other methods, as indicated in the ESI Table S3.†
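Eqn (4) makes the loss of compounds explicit: whenever η is exactly zero, ω cannot be evaluated. The following minimal Python sketch assumes the Koopmans-type definitions χ = −(E_HOMO + E_LUMO)/2 and η = (E_LUMO − E_HOMO)/2 commonly used with these descriptors (refs. 28–30); this section does not restate them, and some authors omit the factor 1/2 in η. It also assumes, as the exclusion of compounds with η_CORR = 0 implies, that ω for the CORR methods is evaluated from χ_CORR and η_CORR. All function names are hypothetical.

```python
from typing import Optional, Tuple


def electrophilicity(chi: float, eta: float) -> Optional[float]:
    """Eqn (4): omega = chi**2 / (2*eta); returns None when eta is exactly zero."""
    if eta == 0.0:
        return None  # undefined: such compounds are dropped from the omega data set
    return chi ** 2 / (2.0 * eta)


def reactivity_descriptors(e_homo: float, e_lumo: float) -> Tuple[float, float, Optional[float]]:
    """Return (chi, eta, omega) from frontier-orbital energies (assumed Koopmans-type forms)."""
    chi = -(e_homo + e_lumo) / 2.0
    eta = (e_lumo - e_homo) / 2.0
    return chi, eta, electrophilicity(chi, eta)


def corr_electrophilicity(chi_dft: float, eta_dft: float,
                          chi_ref: float, eta_ref: float) -> Optional[float]:
    """omega_CORR evaluated from CORR-type chi and eta (DFT value minus reference value)."""
    return electrophilicity(chi_dft - chi_ref, eta_dft - eta_ref)
```

The exact-zero test mirrors the text, where η_CORR vanishes exactly for some compounds; a small tolerance could be used instead if near-zero hardness values were also to be treated as unreliable.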

Model development

A reliable QSAR model must be internally robust and externally predictive to be of practical use. In the present work, QSAR models are developed using multi-linear regression (MLR). To test the external predictivity of the models, the whole data set is divided into two subsets: an internal training set of compounds, used to build the model, and an external prediction set of compounds, on which the predictive capability of the developed model is tested. This splitting is mainly performed through the activity sampling and random splitting methods. In the activity sampling (or ordered response) method, the compounds are arranged in decreasing order of their experimental biological activities and alternate compounds are assigned to the training and prediction sets, whereas in the random splitting method the two sets are constituted by selecting compounds randomly, for example, 30% for the prediction set (for details on the splitting, see ESI Tables S3 and S4† for the compounds exhibiting mutagenic activity in the TA100 and TA98 strains, respectively). Further, the applicability domain of each model is examined through the Williams plot⁴⁴ to ensure that the models under consideration have no structural outliers (compounds with a leverage greater than the warning leverage) and/or response outliers (compounds with standardized residuals larger than three standard deviation units). Such outliers can erroneously influence the quality of a model.
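The two splitting schemes and the leverage used in the Williams plot can be sketched as follows. This is a minimal, stdlib-only Python illustration under stated assumptions: the ordered-response split sends every k-th ranked compound (k = round(1/fraction)) to the prediction set, which reduces to strict alternation for a 50% split; the warning leverage is taken as the commonly used h* = 3p′/n, with p′ the number of model parameters (descriptors plus intercept) and n the number of training compounds. The function names are hypothetical, and QSARINS may differ in details.

```python
import random


def activity_sampling_split(activities, test_fraction=0.3):
    """Ordered-response split: rank by decreasing activity, then send every
    k-th ranked compound (k = round(1/test_fraction)) to the prediction set."""
    order = sorted(range(len(activities)), key=lambda i: activities[i], reverse=True)
    k = max(1, round(1.0 / test_fraction))
    pred = {idx for pos, idx in enumerate(order) if pos % k == k - 1}
    train = [i for i in range(len(activities)) if i not in pred]
    return train, sorted(pred)


def random_split(n_compounds, test_fraction=0.3, seed=0):
    """Random split with a fixed seed so that the partition is reproducible."""
    idx = list(range(n_compounds))
    random.Random(seed).shuffle(idx)
    n_pred = round(test_fraction * n_compounds)
    return sorted(idx[n_pred:]), sorted(idx[:n_pred])


def _inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination (assumes full rank)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def leverages(train_rows, query_rows=None):
    """h_i = x_i^T (X^T X)^-1 x_i, where X is the training descriptor matrix with an
    intercept column; query_rows (e.g. the prediction set) default to the training rows."""
    X = [[1.0] + list(r) for r in train_rows]
    p = len(X[0])
    inv = _inverse([[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)])
    Q = X if query_rows is None else [[1.0] + list(r) for r in query_rows]
    return [sum(q[a] * inv[a][b] * q[b] for a in range(p) for b in range(p)) for q in Q]


def warning_leverage(n_train, n_descriptors):
    """h* = 3 p'/n with p' = n_descriptors + 1 (intercept included)."""
    return 3.0 * (n_descriptors + 1) / n_train
```

In a Williams plot, a compound would then be flagged as a structural outlier when its leverage exceeds `warning_leverage(n_train, 2)` for a two-descriptor model, and as a response outlier when its standardized residual exceeds 3 in absolute value.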

Further, the stability and reliability of the developed QSAR models are assessed through the internal and external statistical validation parameters listed in Table 2. The coefficient of determination (R²), the leave-one-out cross-validated R² (Q_LOO²), and the cross-validated concordance correlation coefficient (CCC_CV) are employed to determine the internal robustness of a model. In addition, more rigorous procedures such as leave-many-out cross-validation (Q_LMO²), Y-scrambling (Q_Yscr²) and randomization (Q_Yrand²) are employed for internal validation, with 1000 iterations in each procedure and with 30% of the chemicals left out of the training set at a time in the Q_LMO² procedure. The robustness of a model is supported if Q_LMO² is similar to R² and Q_LOO², whereas low values of the Y-scrambling parameters indicate minimal chance correlation in the proposed models. Furthermore, a small difference between R² and Q_LOO² indicates similar performance of the model in fitting and in internal prediction.
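For an MLR model, the internal validation quantities R² and Q_LOO² follow directly from the RSS, PRESS and TSS definitions in Table 2. The following stdlib-only Python sketch fits an ordinary least-squares model with an intercept and computes these quantities by brute-force refitting; the function names are hypothetical and this is not the QSARINS implementation. For brevity, `y_scrambled_r2` reports the mean R² of the scrambled-response models, whereas Q_Yscr² as used in this work applies cross-validation to each scrambled model.

```python
import random


def _solve(a, b):
    """Solve a.x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def fit_mlr(x_rows, y):
    """Ordinary least squares with intercept via the normal equations; returns [b0, b1, ...]."""
    X = [[1.0] + list(r) for r in x_rows]
    p = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    return _solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def r2(x_rows, y):
    """R^2 = 1 - RSS/TSS for the model fitted on all rows."""
    coef = fit_mlr(x_rows, y)
    ybar = sum(y) / len(y)
    rss = sum((yi - predict(coef, r)) ** 2 for r, yi in zip(x_rows, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - rss / tss


def q2_loo(x_rows, y):
    """Q^2_LOO = 1 - PRESS/TSS, refitting the model with each compound left out."""
    ybar = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(x_rows[:i] + x_rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, x_rows[i])) ** 2
    return 1.0 - press / sum((yi - ybar) ** 2 for yi in y)


def y_scrambled_r2(x_rows, y, iterations=1000, seed=0):
    """Mean R^2 over models fitted to randomly permuted responses (chance correlation)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iterations):
        ys = list(y)
        rng.shuffle(ys)
        total += r2(x_rows, ys)
    return total / iterations
```

Because a leave-one-out residual can never be smaller in magnitude than the ordinary residual for least squares, Q_LOO² from this sketch never exceeds R², which is why a small R² − Q_LOO² gap is read as a sign of limited over-fitting.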

Table 2 Internal and external statistical validation parameters employed for assessing the robustness and external predictivity of the QSAR models

S. no.	Parameter	Significance
1	R² = 1 − RSS/TSS = 1 − Σ_{i=1}^{n}(y_i − ŷ_i)²/Σ_{i=1}^{n}(y_i − ȳ)², where y_i is the experimental value of the training set chemicals, ȳ being the mean value, and ŷ_i is the predicted value from the model. For a complete data set containing n objects, RSS represents the residual sum of squares, and TSS represents the total sum of squares	Provides information regarding the goodness-of-fit of the model (ref. 48)
2	Q_LOO² = 1 − PRESS/TSS = 1 − Σ_{i=1}^{n}(y_i − ŷ_i/i)²/Σ_{i=1}^{n}(y_i − ȳ)², where ŷ_i/i is the predicted value of the activity when the i^th element is excluded from the model. Excluding more than one element gives the Q_LMO² parameter. PRESS represents the predictive error sum of squares for the n objects in the complete data set	Useful in determining whether over-fitting occurs in a model (ref. 48)
3	Q_F1² = 1 − Σ_{i=1}^{n_EXT}(y_i − ŷ_i)²/Σ_{i=1}^{n_EXT}(y_i − ȳ_TR)², where ȳ_TR is the mean value of the activity for the training set having n_TR objects, and n_EXT is the number of objects in the external prediction set	Judges the model's performance for new chemicals, with the TSS calculated using the mean of the training set (ȳ_TR) (ref. 44 and 45)
4	Q_F2² = 1 − Σ_{i=1}^{n_EXT}(y_i − ŷ_i)²/Σ_{i=1}^{n_EXT}(y_i − ȳ_EXT)², where ȳ_EXT is the mean value of the activity for the external prediction set having n_EXT objects	Judges the model's performance for new chemicals, with the TSS calculated using the mean of the external set (ȳ_EXT) (ref. 46)
5	Q_F3² = 1 − [Σ_{i=1}^{n_EXT}(y_i − ŷ_i)²/n_EXT]/[Σ_{i=1}^{n_TR}(y_i − ȳ_TR)²/n_TR]	Judges the model's performance for new chemicals and is independent of the size and distribution of the data set (ref. 47)
6	CCC = 2Σ_{i=1}^{n}(y_i − ȳ)(ŷ_i − ŷ̄)/[Σ_{i=1}^{n}(y_i − ȳ)² + Σ_{i=1}^{n}(ŷ_i − ŷ̄)² + n(ȳ − ŷ̄)²], where ŷ̄ is the mean of the predicted values	Determines the agreement between the experimental and the predicted activity from a model (ref. 48 and 49)
7	r_m² = r²(1 − √(r² − r₀²)), r̄_m² = (r_m² + r′_m²)/2 and Δr_m² = r_m² − r′_m², where r² and r₀² are the coefficients of determination with and without the intercept of the regression line, respectively. r_m² is computed with the experimental values on the ordinate axis, whereas r′_m² is computed with the experimental values on the abscissa	Provides information regarding overestimation in the prediction due to a wide response range (ref. 50)
8	RMSE = √[Σ_{i=1}^{n}(y_i − ŷ_i)²/n]	Represents the root mean square error in the cross-validation (CV) by the leave-one-out (LOO) method and in the external validation (EXT) (ref. 52)
9	MAE = Σ_{i=1}^{n}|y_i − ŷ_i|/n	Represents the mean absolute error in the cross-validation (CV) and in the external validation (EXT) (ref. 52)

On the other hand, to assess the real external predictivity of the models, the state-of-the-art external validation parameters listed in Table 2 are employed, which include the predictive squared correlation coefficients Q_F1²,^44,45 Q_F2²,⁴⁶ and Q_F3²,⁴⁷ CCC_EXT,^48,49 and the r_m² metrics⁵⁰ based parameters, namely, the average r̄_m² and the differential Δr_m², with threshold values similar to those employed by Chirico and Gramatica⁵¹ and in our previous studies.^3–6 Among these external validation parameters, CCC_EXT is the most stringent, since it determines the degree of agreement between the observed and predicted activity.^48,49 The statistical significance of the models is further analyzed through the mean absolute error (MAE)⁵² and the root mean square error (RMSE)⁵² in internal validation (MAE_CV, RMSE_CV) and in external validation (MAE_EXT, RMSE_EXT), as depicted in Fig. 1 and ESI Tables S11–S30.† To check for descriptor collinearity, the QUIK rule (Q Under Influence of K) with a threshold ΔK value of 0.5 is also employed.⁵³ Finally, the degree of scatter between the experimental (observed) and predicted mutagenicity from the developed models is analyzed through scatter plots. The splitting of the data set, model development and validation were performed with the QSARINS^54,55 software. All the parameters employed to check the internal and external validation of the various models are collected in the ESI Tables S11–S30,† whereas only the key parameters, provided in Tables 3–7, are discussed in the next section.
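The external validation formulas of Table 2 can be written compactly. The following stdlib-only Python sketch assumes that `y_ext` and `yhat_ext` hold the experimental and predicted activities of the prediction-set compounds; the function names are hypothetical, r₀² is taken from a least-squares fit through the origin, and the square root in r_m² is applied to |r² − r₀²| to guard against small negative differences. It illustrates the formulas only and is not the QSARINS code.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def _press(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat))


def q2_f1(y_ext, yhat_ext, ybar_tr):
    """Q^2_F1: TSS of the external set taken about the training-set mean."""
    return 1.0 - _press(y_ext, yhat_ext) / sum((y - ybar_tr) ** 2 for y in y_ext)


def q2_f2(y_ext, yhat_ext):
    """Q^2_F2: TSS of the external set taken about its own mean."""
    m = _mean(y_ext)
    return 1.0 - _press(y_ext, yhat_ext) / sum((y - m) ** 2 for y in y_ext)


def q2_f3(y_ext, yhat_ext, y_tr):
    """Q^2_F3: mean squared external error over the training-set variance."""
    m = _mean(y_tr)
    tss_tr = sum((y - m) ** 2 for y in y_tr) / len(y_tr)
    return 1.0 - (_press(y_ext, yhat_ext) / len(y_ext)) / tss_tr


def ccc(y, yhat):
    """Concordance correlation coefficient between observed and predicted values."""
    my, mf = _mean(y), _mean(yhat)
    num = 2.0 * sum((a - my) * (b - mf) for a, b in zip(y, yhat))
    den = (sum((a - my) ** 2 for a in y) + sum((b - mf) ** 2 for b in yhat)
           + len(y) * (my - mf) ** 2)
    return num / den


def rmse(y, yhat):
    return math.sqrt(_press(y, yhat) / len(y))


def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def _r2_pearson(y, yhat):
    my, mf = _mean(y), _mean(yhat)
    sxy = sum((a - my) * (b - mf) for a, b in zip(y, yhat))
    return sxy * sxy / (sum((a - my) ** 2 for a in y) * sum((b - mf) ** 2 for b in yhat))


def _r0_2(ordinate, abscissa):
    """Coefficient of determination of the through-origin fit ordinate = k * abscissa."""
    k = sum(o * a for o, a in zip(ordinate, abscissa)) / sum(a * a for a in abscissa)
    m = _mean(ordinate)
    rss = sum((o - k * a) ** 2 for o, a in zip(ordinate, abscissa))
    return 1.0 - rss / sum((o - m) ** 2 for o in ordinate)


def rm2_metrics(y, yhat):
    """Return (average r_m^2, delta r_m^2 = r_m^2 - r'_m^2) as defined in Table 2."""
    r2 = _r2_pearson(y, yhat)
    rm2 = r2 * (1.0 - math.sqrt(abs(r2 - _r0_2(y, yhat))))        # experimental on ordinate
    rm2_prime = r2 * (1.0 - math.sqrt(abs(r2 - _r0_2(yhat, y))))  # experimental on abscissa
    return (rm2 + rm2_prime) / 2.0, rm2 - rm2_prime
```

A systematic offset between predictions and observations illustrates why CCC_EXT is the stricter criterion: for observed values [1, 2, 3] and predictions [2, 3, 4], the squared correlation is 1, while `ccc` gives 4/7.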


	Fig. 1 Comparison of the mean absolute error (MAE) and the root mean square error (RMSE) in the external (EXT) predictivity of various models, for the TA100 mutagenicity of nitrated-PAHs, based on the total energy (E), energy of the HOMO and the LUMO, absolute electronegativity (χ), chemical hardness (η) and electrophilicity index (ω) computed through exchange-only methods (HFX, B88), exchange–correlation (XC) methods (HFX + LYP, BLYP, B3LYP, M06, M06-L, M06-2X), and also based on the descriptors incorporating mainly the effect of electron correlation (CORR) from the respective XC methods (for the TA98 mutagenicity models, see ESI Fig. S1†).

Table 3 Comparison of the key internal and external validation parameters for the models based on the total energy (E) and energy of the HOMO (E_HOMO) computed with the exchange (X) only, exchange + correlation (X + C) methods, and with the CORR-only method incorporating mainly the effect of electron-correlation in the descriptors, for modeling the TA100 mutagenicity of nitrated-PAHs. For the TA98 mutagenicity, see ESI Table S5

Model s. no.	Method	Descriptor employed	Splitting employed	R²	Q_LOO²	R² − Q_LOO²	Q_LMO²	Q_F3²	CCC_EXT
Exchange (X) only
1	DFT/B88	E_B88, E^HOMO_B88	30%	0.652	0.559	0.093	0.525	0.673	0.808
2	DFT/HFX	E_HFX, E^HOMO_HFX	30%	0.792	0.732	0.061	0.719	0.786	0.886

Exchange + correlation (X + C)
3	HFX + LYP	E_HFX+LYP, E^HOMO_HFX+LYP	30%	0.765	0.718	0.047	0.698	0.864	0.920
4	BLYP	E_BLYP, E^HOMO_BLYP	30%	0.738	0.685	0.053	0.664	0.638	0.769
5	B3LYP	E_B3LYP, E^HOMO_B3LYP	30%	0.675	0.616	0.059	0.596	0.651	0.772
6	M06	E_M06, E^HOMO_M06	30%	0.660	0.588	0.072	0.557	0.697	0.790
7	M06-L	E_M06-L, E^HOMO_M06-L	30%	0.634	0.551	0.083	0.561	0.685	0.769
8	M06-2X	E_M06-2X, E^HOMO_M06-2X	30%	0.656	0.583	0.073	0.520	0.707	0.795

Electron-correlation (CORR) only
9	CORR (HFX + LYP)	E_{CORR(HFX+LYP)}, E^HOMO_{CORR(HFX+LYP)}	30%	0.730	0.675	0.055	0.659	0.834	0.843
10	CORR (BLYP)	E_CORR(BLYP), E^HOMO_CORR(BLYP)	30%	0.724	0.675	0.055	0.602	0.784	0.783
11	CORR (B3LYP)	E_CORR(B3LYP), E^HOMO_CORR(B3LYP)	30%	0.779	0.727	0.052	0.714	0.862	0.874
12	CORR (M06)	E_CORR(M06), E^HOMO_CORR(M06)	30%	0.778	0.723	0.054	0.695	0.806	0.889
13	CORR (M06-L)	E_CORR(M06-L), E^HOMO_CORR(M06-L)	30%	0.774	0.722	0.052	0.694	0.757	0.857
14	CORR (M06-2X)	E_CORR(M06-2X), E^HOMO_CORR(M06-2X)	30%	0.759	0.702	0.058	0.679	0.862	0.926

It should further be noted that the models developed with the X-only, X + C and CORR-only methods differ in the distribution of the compounds between the training and external prediction sets. The splitting is performed separately for the descriptors computed with each electronic structure method so that the best possible models are obtained; consequently, the outliers (the excluded compounds) also differ among the splittings used for the various methods, as indicated in the ESI Tables S3 and S4.† We also employed the same splitting for all the electronic structure methods considered in this work; however, no significant variation in the statistical parameters was observed, and the models remained as robust and predictive as those using different splittings.

Results and discussion

In this work, only two-descriptor QSAR models are discussed, so that the models remain easy to understand, since increasing the number of descriptors in a model not only makes it harder to interpret but can also artificially inflate its statistical validation parameters. Tables 3–7 compare the key statistical validation parameters of the various two-descriptor QSAR models for TA100 mutagenicity, developed using the molecular descriptors computed with the X-only, X + C and CORR quantum-mechanical methods described in the previous sections. The corresponding data for the TA98 mutagenicity models are provided in the ESI Tables S5–S9.† The performance of the different quantum-mechanical methods employed to compute the descriptors is further analyzed in terms of the errors (MAE_EXT and RMSE_EXT) in the predictions of the different models, as presented in Fig. 1 and ESI Fig. S1† for TA100 and TA98 mutagenicity, respectively. The detailed performance of the various two-descriptor QSAR models developed using different quantum-mechanical methods is presented below.

Exchange (X) only methods

Quantum-mechanical exchange interactions, also known as Fermi correlation, are purely quantum-mechanical in origin and arise between electrons of the same spin, preventing them from occupying the same position in space even if they belong to different orbitals.^1,2 As evident from entries 1 and 2 in Tables 3–7 and ESI Tables S5–S9† for TA100 and TA98 mutagenicity, respectively, the exchange interactions appear to play a significant role, as indicated by the robust statistical parameters of the QSAR models developed in the present work using descriptors computed through DFT with only an exchange functional, such as exact HF exchange (HFX) or Becke (B88) exchange. The statistical parameters listed in the tables show that the models developed with descriptors computed using the DFT/HFX method are highly robust for both TA100 and TA98 activity, except for the TA100 model (entry 2 in Table 4) based on the total energy and the LUMO energy. In contrast, the models based on descriptors computed using DFT/B88 show reliable parameters only for the TA100 model based on the LUMO energy (entry 1 in Table 4), though for TA98 activity they show robust parameters for all the models analyzed in the present work. As mentioned previously, however, the TA100 models in the present work are more reliable, since they are developed using a data set that is sufficiently large and is widely used in the literature for comparing models of the mutagenicity of PAHs.^40,56

Table 4 Same as Table 3 but for the models based on the total electronic energy (E) and energy of the LUMO (E_LUMO). For the TA98 mutagenicity, see ESI Table S6

Model s. no.	Method	Descriptor employed	Splitting employed	R²	Q_LOO²	R² − Q_LOO²	Q_LMO²	Q_F3²	CCC_EXT
Exchange (X) only
1	DFT/B88	E_B88, E^LUMO_B88	30%	0.728	0.640	0.085	0.614	0.722	0.849
2	DFT/HFX	E_HFX, E^LUMO_HFX	30%	0.627	0.545	0.082	0.508	0.646	0.789

Exchange + correlation (X + C)
3	HFX + LYP	E_HFX+LYP, E^LUMO_HFX+LYP	30%	0.578	0.502	0.075	0.472	0.721	0.790
4	BLYP	E_BLYP, E^LUMO_BLYP	30%	0.668	0.606	0.062	0.576	0.817	0.880
5	B3LYP	E_B3LYP, E^LUMO_B3LYP	30%	0.647	0.584	0.064	0.573	0.828	0.886
6	M06	E_M06, E^LUMO_M06	30%	0.695	0.636	0.060	0.617	0.775	0.836
7	M06-L	E_M06-L, E^LUMO_M06-L	30%	0.705	0.646	0.059	0.632	0.797	0.854
8	M06-2X	E_M06-2X, E^LUMO_M06-2X	30%	0.683	0.620	0.063	0.588	0.744	0.809

Electron-correlation (CORR) only
9	CORR (HFX + LYP)	E_{CORR(HFX+LYP)}, E^LUMO_{CORR(HFX+LYP)}	30%	0.717	0.655	0.062	0.642	0.779	0.787
10	CORR (BLYP)	E_CORR(BLYP), E^LUMO_CORR(BLYP)	30%	0.710	0.647	0.063	0.637	0.727	0.757
11	CORR (B3LYP)	E_CORR(B3LYP), E^LUMO_CORR(B3LYP)	30%	0.697	0.603	0.094	0.607	0.683	0.696
12	CORR (M06)	E_CORR(M06), E^LUMO_CORR(M06)	30%	0.643	0.546	0.097	0.535	0.636	0.744
13	CORR (M06-L)	E_CORR(M06-L), E^LUMO_CORR(M06-L)	30%	0.656	0.562	0.094	0.529	0.616	0.725
14	CORR (M06-2X)	E_CORR(M06-2X), E^LUMO_CORR(M06-2X)	30%	0.659	0.576	0.083	0.549	0.712	0.810

Furthermore, among the models based on descriptors computed using X-only methods, the DFT/HFX HOMO energy model (entry 2 in Table 3), based on E_HFX and E^HOMO_HFX, outperforms the rest of the models for both activities. Moreover, this model has a more generalized applicability domain and the least scatter between the experimental and predicted activity, as evident from the Williams plot and scatter plot shown in Fig. 2 for TA100 mutagenicity. Apart from the HOMO energy based model, the model (entry 2 in Table 5 and ESI Table S7†) based on the DFT/HFX computed total energy and absolute electronegativity is also found to be robust for both types of mutagenicity. Besides this, the electrophilicity index based model (entry 2 in Table 7) is also found to be reliable. The Williams and scatter plots of these best performing models developed with the X-only methods are provided in the ESI Fig. S2(A and B) and S3(A and B).† Further, for TA100 activity, the model (entry 2 in Table 6) developed with the total energy and chemical hardness performs satisfactorily only when the descriptors are computed with the DFT/HFX method, whereas the external predictivity of the same model developed with descriptors computed using the DFT/B88 method (entry 1 in Table 6) is comparatively less reliable. Among the TA98 models, however, both X-only methods give robust parameters for the models based on chemical hardness (see entries 1 and 2 in the ESI Table S8†), although this may be due to the small data set used for the TA98 models, as explained in the previous section.


	Fig. 2 (a) Williams plot: standardized residuals versus leverage (h) for the TA100 mutagenicity model based on the total energy and the HOMO energy computed using the DFT/HFX method. The training and prediction set chemicals, represented with open (yellow) and filled (blue) circles, respectively, were obtained using the 30% random splitting method. The encircled values are the ID numbers of the compounds, provided in the ESI Table S1 (for other best models, see also ESI Fig. S2†). The vertical (solid) line indicates the warning leverage (h*). (b) Scatter plot: experimental versus predicted logTA100 mutagenicity for the model specified in (a) (for other best models, see also the ESI Fig. S3†).
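The Williams plot in Fig. 2 combines two per-compound quantities: the leverage h and the standardized residual. A minimal NumPy sketch of how they can be obtained is given below, assuming an ordinary least-squares model with an intercept, leverages taken from the hat matrix of the training set, and the commonly used warning leverage h* = 3(p + 1)/n for p descriptors and n training compounds. The function name and the standardization by the residual standard error are our own choices for illustration, not the internals of QSARINS.

```python
import numpy as np


def williams_plot_data(X_train, y_train, X_query=None, y_query=None):
    """Leverages and standardized residuals for an OLS model with intercept.

    Assumptions (ours): leverages come from the hat matrix of the training
    design matrix including the intercept column, and residuals are
    standardized by the training residual standard error.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    n, p = X_train.shape                                # p = number of descriptors
    A = np.column_stack([np.ones(n), X_train])          # add intercept column
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)  # OLS fit
    xtx_inv = np.linalg.inv(A.T @ A)
    resid = y_train - A @ coef
    s = np.sqrt(resid @ resid / (n - p - 1))            # residual standard error
    out = {
        "coef": coef,
        # h_i = a_i^T (A^T A)^{-1} a_i, i.e. the diagonal of the hat matrix
        "h_train": np.einsum("ij,jk,ik->i", A, xtx_inv, A),
        "std_resid_train": resid / s,
        "h_star": 3.0 * (p + 1) / n,                    # warning leverage h*
    }
    if X_query is not None:
        # Prediction-set compounds are projected onto the training hat matrix
        Q = np.column_stack([np.ones(len(X_query)), np.asarray(X_query, dtype=float)])
        out["h_query"] = np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)
        if y_query is not None:
            out["std_resid_query"] = (np.asarray(y_query, dtype=float) - Q @ coef) / s
    return out
```

In a plot of std_resid against h, compounds to the right of the vertical line h = h* fall outside the structural applicability domain, and standardized residuals beyond about ±3 are commonly flagged as response outliers.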

Table 5 Same as Table 3 but for the models based on the total electronic energy (E) and absolute electronegativity (χ). For the TA98 mutagenicity, see ESI Table S7

Model s. no.	Method	Descriptor employed	Splitting employed	R²	Q_LOO²	R² − Q_LOO²	Q_LMO²	Q_F3²	CCC_EXT
Exchange (X) only
1	DFT/B88	E_B88, χ_B88	30%	0.716	0.640	0.076	0.621	0.686	0.831
2	DFT/HFX	E_HFX, χ_HFX	30%	0.796	0.735	0.062	0.718	0.762	0.877

Exchange + correlation (X + C)
3	HFX + LYP	E_HFX+LYP, χ_HFX+LYP	30%	0.734	0.681	0.052	0.662	0.845	0.907
4	BLYP	E_BLYP, χ_BLYP	30%	0.717	0.662	0.055	0.640	0.736	0.823
5	B3LYP	E_B3LYP, χ_B3LYP	30%	0.692	0.636	0.056	0.605	0.726	0.827
6	M06	E_M06, χ_M06	30%	0.704	0.645	0.059	0.624	0.756	0.836
7	M06-L	E_M06-L, χ_M06-L	30%	0.679	0.616	0.064	0.596	0.752	0.825
8	M06-2X	E_M06-2X, χ_M06-2X	30%	0.698	0.639	0.059	0.608	0.768	0.842

Electron-correlation (CORR) only
9	CORR (HFX + LYP)	E_CORR(HFX+LYP), χ_CORR(HFX+LYP)	30%	0.723	0.660	0.063	0.661	0.800	0.818
10	CORR (BLYP)	E_CORR(BLYP), χ_CORR(BLYP)	30%	0.726	0.676	0.051	0.654	0.787	0.789
11	CORR (B3LYP)	E_CORR(B3LYP), χ_CORR(B3LYP)	30%	0.739	0.689	0.050	0.671	0.850	0.862
12	CORR (M06)	E_CORR(M06), χ_CORR(M06)	30%	0.724	0.662	0.062	0.622	0.787	0.883
13	CORR (M06-L)	E_CORR(M06-L), χ_CORR(M06-L)	30%	0.735	0.675	0.060	0.642	0.784	0.880
14	CORR (M06-2X)	E_CORR(M06-2X), χ_CORR(M06-2X)	30%	0.723	0.656	0.067	0.622	0.827	0.905

Table 6 Same as Table 3 but for the models based on the total energy (E) and chemical hardness (η). For the TA98 mutagenicity, see ESI Table S8

Model s. no.	Method	Descriptor employed	Splitting employed	R²	Q_LOO²	R² − Q_LOO²	Q_LMO²	Q_F3²	CCC_EXT
Exchange (X) only
1	DFT/B88	E_B88, η_B88	30%	0.538	0.391	0.148	0.377	0.630	0.746
2	DFT/HFX	E_HFX, η_HFX	30%	0.745	0.667	0.078	0.639	0.767	0.870

Exchange + correlation (X + C)
3	HFX + LYP	E_HFX+LYP, η_HFX+LYP	30%	0.771	0.719	0.051	0.697	0.869	0.920
4	BLYP	E_BLYP, η_BLYP	30%	0.774	0.721	0.053	0.709	0.136	0.503
5	B3LYP	E_B3LYP, η_B3LYP	30%	0.614	0.521	0.093	0.510	0.579	0.693
6	M06	E_M06, η_M06	30%	0.583	0.453	0.130	0.451	0.618	0.703
7	M06-L	E_M06-L, η_M06-L	30%	0.569	0.436	0.133	0.430	0.573	0.663
8	M06-2X	E_M06-2X, η_M06-2X	30%	0.588	0.462	0.126	0.459	0.631	0.714

Electron-correlation (CORR) only
9	CORR (HFX + LYP)	E_CORR(HFX+LYP), η_CORR(HFX+LYP)	30%	0.715	0.658	0.056	0.625	0.794	0.796
10	CORR (BLYP)	E_CORR(BLYP), η_CORR(BLYP)	30%	0.723	0.676	0.048	0.596	0.771	0.771
11	CORR (B3LYP)	E_CORR(B3LYP), η_CORR(B3LYP)	30%	0.781	0.729	0.052	0.720	0.774	0.792
12	CORR (M06)	E_CORR(M06), η_CORR(M06)	30%	0.725	0.663	0.062	0.626	0.785	0.882
13	CORR (M06-L)	E_CORR(M06-L), η_CORR(M06-L)	30%	0.736	0.675	0.060	0.648	0.789	0.883
14	CORR (M06-2X)	E_CORR(M06-2X), η_CORR(M06-2X)	30%	0.723	0.656	0.067	0.626	0.826	0.905

Table 7 Same as Table 3 but for the models based on the total electronic energy (E) and electrophilicity index (ω). For the TA98 mutagenicity, see ESI Table S9

Model s. no.	Method	Descriptor employed	Splitting employed	R²	Q_LOO²	R² − Q_LOO²	Q_LMO²	Q_F3²	CCC_EXT
Exchange (X) only
1	DFT/B88	E_B88, ω_B88	30%	0.643	0.472	0.171	0.477	0.690	0.804
2	DFT/HFX	E_HFX, ω_HFX	30%	0.778	0.720	0.059	0.696	0.723	0.854

Exchange + correlation (X + C)
3	HFX + LYP	E_HFX+LYP, ω_HFX+LYP	30%	0.688	0.631	0.057	0.606	0.827	0.891
4	BLYP	E_BLYP, ω_BLYP	30%	0.595	0.526	0.069	0.499	0.812	0.880
5	B3LYP	E_B3LYP, ω_B3LYP	30%	0.611	0.541	0.070	0.513	0.806	0.863
6	M06	E_M06, ω_M06	30%	0.683	0.620	0.063	0.584	0.690	0.771
7	M06-L	E_M06-L, ω_M06-L	30%	0.658	0.589	0.069	0.546	0.602	0.707
8	M06-2X	E_M06-2X, ω_M06-2X	30%	0.702	0.644	0.057	0.619	0.762	0.795

Electron-correlation (CORR) only
9	CORR (HFX + LYP)	E_CORR(HFX+LYP), ω_CORR(HFX+LYP)	30%	0.727	0.663	0.064	0.630	0.595	0.791
10	CORR (BLYP)	E_CORR(BLYP), ω_CORR(BLYP)	30%	0.768	0.695	0.074	0.639	0.632	0.716
11	CORR (B3LYP)	E_CORR(B3LYP), ω_CORR(B3LYP)	30%	0.772	0.730	0.042	0.706	0.898	0.912
12	CORR (M06)	E_CORR(M06), ω_CORR(M06)	30%	0.726	0.661	0.065	0.629	0.783	0.882
13	CORR (M06-L)	E_CORR(M06-L), ω_CORR(M06-L)	30%	0.726	0.661	0.065	0.641	0.790	0.882
14	CORR (M06-2X)	E_CORR(M06-2X), ω_CORR(M06-2X)	30%	0.709	0.640	0.070	0.606	0.815	0.896

It should be noted that the HF and DFT/HFX methods, both employing 100% HF exchange without any correlation functional, are expected to yield exactly the same models, since the energy expressions of the two methods are identical. In the present work, however, the energy and other molecular descriptors computed using the two methods differ slightly for a few molecules, as evident in the ESI Table S2a.† This difference arises mainly from the different algorithms used in the HF and DFT codes of the computational software. Furthermore, as evident in Tables 3–7 (entry 2) for the DFT/HFX method and in the ESI Table S10† for the HF method, although the model parameters based on the same descriptors computed using the two methods differ, the overall reliability and predictivity of the models do not vary significantly.

The statistical performance of the various QSAR models developed with the descriptors computed using the X-only methods suggests that the quantum-mechanical exchange interactions between electrons can be highly significant in modeling the mutagenic activities, as evident in the present study on the nitrated-PAHs. It should, however, be noted that the descriptors computed through the X-only methods also include the effects of the kinetic energy of the electrons, the electron–nuclear Coulombic attraction, etc. For example, the total energy computed using the X-only methods is the sum of the kinetic energy of the electrons, the potential energy due to electron–nuclear attraction, the classical (mean-field) Coulomb repulsion between the electrons, the nuclear repulsion, and the exchange energy due to the quantum-mechanical interaction between electrons of parallel spin; it neglects, however, part of the instantaneous electron–electron interactions, which can also be highly significant, as described below.
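For the HF case, the energy contributions discussed above can be written explicitly in the standard spin-orbital form (as in, e.g., Szabo and Ostlund in the reference list; the notation here is ours, not the paper's). Besides the kinetic, electron–nuclear and exchange terms, the expression also contains the classical Coulomb repulsion between electrons and the nuclear repulsion, in atomic units:

```latex
E_{\mathrm{HF}} = \sum_{i=1}^{N} \langle \chi_i | \hat{h} | \chi_i \rangle
  + \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( J_{ij} - K_{ij} \right)
  + V_{NN},
\qquad
\hat{h} = -\frac{1}{2}\nabla^{2} - \sum_{A} \frac{Z_A}{|\mathbf{r} - \mathbf{R}_A|}
```

Here the χ_i are the N occupied spin orbitals, J_ij and K_ij are the Coulomb and exchange integrals, and K_ij vanishes unless χ_i and χ_j have parallel spin; the i = j terms cancel, removing the electron self-interaction. No term describes the correlated, instantaneous avoidance of electrons of opposite spin. In the DFT/B88 X-only calculations the exchange term is instead given by the B88 functional evaluated on the Kohn–Sham density.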

Exchange–correlation (X + C) methods

The aforementioned X-only methods do not account for a significant part of the dynamic electron correlation, namely, the Coulomb correlation arising from the instantaneous electrostatic interaction between a pair of electrons.^1,2 For example, the HF method allows two electrons of opposite spin to come closer to each other than they actually do, since it does not include the Coulomb correlation. DFT, on the other hand, accounts for both the exchange and the correlation interactions between the electrons via an XC functional. In the present study, QSAR models are therefore also developed using descriptors computed with DFT including a correlation functional in addition to the exchange functional, as in HFX + LYP and BLYP, which combine the LYP correlation functional with the exact HF exchange and the Becke (B88) exchange, respectively. Besides these, the hybrid XC functional B3LYP, the meta XC functional M06-L, and the meta-hybrid XC functionals M06 and M06-2X were also employed to investigate the role of both exchange and correlation in the external predictivity of the quantitative models for the mutagenicity.
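The X + C methods differ chiefly in how much exact (HF) exchange they contain. As a sketch, the dictionary below records the nominal exact-exchange fractions of the methods used here, and the function applies the three-parameter B3LYP mixing of Stephens et al. (reference list) with a0 = 0.20, ax = 0.72 and ac = 0.81. It assumes the component exchange and correlation energies have already been obtained from an electronic-structure code; implementations differ in which VWN parametrization they use, and the M06 functionals mix exact exchange with their own meta-GGA exchange and correlation forms, which are not reproduced here. The names are hypothetical.

```python
# Nominal fraction of exact (HF) exchange in each method of this study.
# These are the standard published values of each functional; the mapping
# itself is our annotation, not a table from the paper.
EXACT_EXCHANGE = {
    "DFT/B88": 0.00,
    "DFT/HFX": 1.00,
    "BLYP": 0.00,
    "HFX+LYP": 1.00,
    "B3LYP": 0.20,
    "M06-L": 0.00,
    "M06": 0.27,
    "M06-2X": 0.54,
}


def b3lyp_xc_energy(e_x_lsda, e_x_hf, de_x_b88, e_c_lyp, e_c_vwn,
                    a0=0.20, ax=0.72, ac=0.81):
    """Combine precomputed XC components (hartree) with the B3LYP weights.

    E_xc = (1 - a0) E_x^LSDA + a0 E_x^HF + ax dE_x^B88
           + ac E_c^LYP + (1 - ac) E_c^VWN
    where dE_x^B88 is the B88 gradient correction to LSDA exchange.
    """
    return ((1.0 - a0) * e_x_lsda + a0 * e_x_hf + ax * de_x_b88
            + ac * e_c_lyp + (1.0 - ac) * e_c_vwn)
```

Sorting EXACT_EXCHANGE by value groups the methods into those without exact exchange (DFT/B88, BLYP, M06-L), the hybrids (B3LYP, M06, M06-2X) and those with full exact exchange (DFT/HFX, HFX + LYP), which is a convenient axis along which to read Tables 3–7.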

Comparing the statistical validation parameters in entries 3–8 of Tables 3–7 and the prediction errors (depicted in Fig. 1 and ESI Fig. S1†) of the QSAR models developed with molecular descriptors computed by the different X + C methods, it is evident that, for the TA100 mutagenicity, the HFX + LYP based models are the most robust, except for the LUMO energy based model (entry 3 in Table 4). In particular, the model developed with the total energy and chemical hardness computed with the HFX + LYP method (entry 3 in Table 6) outperforms all other models for the TA100 mutagenicity. The statistical stability and reliability of the model based on the total energy and HOMO energy (entry 3 in Table 3) are also comparable to those of the chemical hardness based model. For the TA98 mutagenicity, the HOMO energy based model (entry 3 in the ESI Table S5†) is likewise highly reliable and predictive. The Williams and scatter plots of these models are provided in the ESI Fig. S2(A and B) and S3(A and B).†

However, the models developed with the descriptors computed using the new-generation meta XC functionals (M06, M06-L, M06-2X) are found to be less reliable than those developed using the widely used B3LYP functional, as evident from the statistical parameters listed in Tables 3–7 (entries 5–8). For the TA100 mutagenicity, these functionals give low validation parameters for most of the models; although the models based on the absolute electronegativity and the electrophilicity index (entries 6–8 in Tables 5 and 7) show reliable internal validation parameters, they remain less predictive, as indicated by the external validation parameter CCC_EXT. Similar observations are made for the TA98 mutagenicity. Further, among all the models developed with the LUMO energy computed using the X + C methods, only the BLYP and M06-L functionals give statistically reliable parameters. By contrast, the TA100 mutagenicity model based on the chemical hardness computed using the BLYP functional (entry 4 in Table 6) is unreliable: despite good internal validation (R² = 0.774, Q_LOO² = 0.721), its external validation parameters are poor (Q_F3² = 0.136, CCC_EXT = 0.503).

X-only versus X + C methods

Comparing the overall reliability of the QSAR models based on the descriptors computed through the X-only methods with those developed using the same descriptors computed through the X + C methods, it is found that, although including electron-correlation increases the external predictivity of most of the models, it does not significantly improve their internal stability and in some models even reduces their robustness. For example, for the model based on the total energy and the LUMO energy, adding LYP correlation to the HFX or Becke exchange leads to more reliable external validation parameters (Q_F3², CCC_EXT) but less reliable internal validation parameters (R² and Q_LOO²). This trend is, however, model dependent and is found to be opposite for the two exchange functionals, HFX and B88, for most of the models analyzed in the present study. Notably, the prediction errors (MAE_EXT and RMSE_EXT) are reduced significantly when a correlation functional is included, as can be seen for most of the models in Fig. 1.
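The external validation parameters used in this comparison can be computed from the observed and predicted activities of the prediction set. The sketch below assumes the standard definitions: Q_F3² as proposed by Consonni et al. (mean squared prediction error on the external set divided by the variance of the training responses) and CCC as Lin's concordance correlation coefficient, both in the reference list. The function name is hypothetical, and no claim is made that this reproduces QSARINS output digit for digit.

```python
import numpy as np


def external_metrics(y_train, y_ext, y_ext_pred):
    """External validation statistics for a prediction set.

    Q2_F3 divides the mean squared prediction error on the external set by
    the mean squared deviation of the training responses from their mean;
    CCC is Lin's concordance correlation coefficient between observed and
    predicted external values.
    """
    y_train = np.asarray(y_train, dtype=float)
    y = np.asarray(y_ext, dtype=float)
    yp = np.asarray(y_ext_pred, dtype=float)
    res = y - yp
    q2_f3 = 1.0 - np.mean(res ** 2) / np.mean((y_train - y_train.mean()) ** 2)
    yc, pc = y - y.mean(), yp - yp.mean()
    # CCC penalizes both scatter and any systematic shift between y and yp
    ccc = 2.0 * (yc @ pc) / (yc @ yc + pc @ pc + len(y) * (y.mean() - yp.mean()) ** 2)
    return {
        "Q2_F3": float(q2_f3),
        "CCC_EXT": float(ccc),
        "RMSE_EXT": float(np.sqrt(np.mean(res ** 2))),
        "MAE_EXT": float(np.mean(np.abs(res))),
    }
```

Unlike a plain Pearson correlation, CCC drops below 1 when predictions are systematically offset from the experimental values, which is why a model can combine a good internal fit with a low CCC_EXT.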

Overall, including electron-correlation along with exchange, also through the hybrid XC functionals, seems to increase the external predictive ability of the models while decreasing the internal stability of a few of them. It is therefore worth examining whether excluding part of the exchange while retaining mainly the effect of electron-correlation can increase the internal stability of the models, as analyzed below.
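Internal stability in Tables 3–7 is judged from R², Q_LOO² and their difference. For a multiple linear regression, Q_LOO² does not require refitting the model n times: the leave-one-out residual of compound i equals e_i/(1 − h_i), where e_i is its ordinary residual and h_i its leverage. The sketch below assumes an OLS model with an intercept and the usual definition Q_LOO² = 1 − PRESS/TSS; the function name is hypothetical, and Q_LMO² is not covered.

```python
import numpy as np


def internal_metrics(X, y):
    """R2, leave-one-out Q2 and their gap for an OLS model with intercept.

    Uses the exact closed form for OLS: the LOO residual of compound i is
    e_i / (1 - h_i), where h_i is its leverage, so no refitting is needed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    tss = np.sum((y - y.mean()) ** 2)                 # total sum of squares
    h = np.einsum("ij,jk,ik->i", A, np.linalg.inv(A.T @ A), A)
    press = np.sum((resid / (1.0 - h)) ** 2)          # predicted residual SS
    r2 = 1.0 - resid @ resid / tss
    q2_loo = 1.0 - press / tss
    return {"R2": r2, "Q2_LOO": q2_loo, "R2_minus_Q2_LOO": r2 - q2_loo}
```

Because every leave-one-out residual is at least as large in magnitude as the corresponding fitted residual, Q_LOO² never exceeds R²; a large R² − Q_LOO² gap, as for entry 1 in Table 6 (0.148), signals a fit that depends strongly on individual compounds.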

Correlation (CORR) only methods

As remarked in the introductory section, the electron-correlation based descriptors computed using the CORR method with the B3LYP hybrid XC functional were found to be highly reliable for modeling mutagenicity in our previous studies.^3,4 In the present work, we further analyze the quality of the QSAR models developed with the electron-correlation based descriptors computed through eqn (1)–(3) using different XC functionals, namely HFX + LYP, BLYP, B3LYP, M06, M06-L, and M06-2X. It is evident from entries 9–14 in Tables 3–7 and from Fig. 1 that most of the models developed using only the electron-correlation based descriptors, in particular those computed using the hybrid XC functionals, not only exhibit robust internal and external validation but also have quite low prediction errors. For example, CORR (B3LYP) outperforms the other methods for most of the models, except for the LUMO energy based model (entry 11 in Table 4). Using the electron-correlation based descriptors computed through B3LYP, the models based on the HOMO energy and on the electrophilicity index, each combined with the electron-correlation energy (entry 11 in Tables 3 and 7), are found to be the most robust. Moreover, the electrophilicity index based model with the descriptors E_CORR(B3LYP) and ω_CORR(B3LYP) is highly predictive, as indicated by its high CCC_EXT value (0.912). The Williams and scatter plots for this model are shown in Fig. 3.


	Fig. 3 Same as Fig. 2, but for the TA100 mutagenicity model based on the E_CORR(B3LYP) and ω_CORR(B3LYP) descriptors, which incorporate mainly the effect of electron-correlation (CORR) in the total energy (E) and the electrophilicity index (ω), computed using DFT with the B3LYP hybrid XC functional (see also ESI Fig. S2 and S3†).

Besides this, the electron-correlation contribution from the new-generation XC functionals M06, M06-L and M06-2X also yields highly reliable models; in particular, the models based on the HOMO energy, absolute electronegativity and chemical hardness (entries 12–14 in Tables 3, 5 and 6) are found to be quite robust. A similar trend is observed for the TA98 mutagenicity, where the models developed with the CORR descriptors computed using these functionals show excellent internal as well as external reliability, as also evident from the Williams and scatter plots in the ESI Fig. S2(A and B) and S3(A and B).† Further, the models based on the electron-correlation contribution from the pure XC functionals, HFX + LYP and BLYP, are also reliable, though less predictive than those developed using the hybrid XC functionals. For the LUMO energy based model (entry 9 in Table 4), however, CORR (HFX + LYP) is found to be the most reliable method. In fact, the models developed using the descriptors incorporating the electron-correlation through HFX + LYP are observed to be statistically more robust than those developed using BLYP, suggesting the importance of the exact HF exchange.
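The absolute electronegativity, chemical hardness and electrophilicity index discussed here are derived from the frontier-orbital energies. A minimal sketch is given below, assuming the common Koopmans-type working definitions (I ≈ −E_HOMO, A ≈ −E_LUMO, χ = (I + A)/2, η = (I − A)/2, ω = χ²/2η, following Parr and co-workers in the reference list). Conventions differ between papers, for example η is sometimes defined without the factor 1/2, so this is not necessarily the exact form used for Tables 5–7; the function name is hypothetical.

```python
def conceptual_dft(e_homo, e_lumo):
    """Global reactivity descriptors from frontier-orbital energies.

    Koopmans-type approximations: I = -e_homo, A = -e_lumo.
    chi = (I + A) / 2, eta = (I - A) / 2, omega = chi**2 / (2 * eta).
    Output units follow the input units (e.g. hartree or eV).
    """
    ionization, affinity = -e_homo, -e_lumo
    chi = 0.5 * (ionization + affinity)   # absolute electronegativity
    eta = 0.5 * (ionization - affinity)   # chemical hardness
    if eta <= 0:
        raise ValueError("LUMO must lie above HOMO for a positive hardness")
    omega = chi ** 2 / (2.0 * eta)        # electrophilicity index
    return chi, eta, omega
```

For instance, E_HOMO = −0.25 and E_LUMO = −0.05 hartree give χ = 0.15, η = 0.10 and ω = 0.1125 hartree. Since all three descriptors depend on the same two orbital energies, a change of method that shifts the HOMO–LUMO gap affects η and ω more strongly than χ.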

Furthermore, as evident from the ESI Tables S5–S9,† in the case of the TA98 mutagenicity some of the models show negative values for the internal and external validation parameters, which can be mainly attributed to the very small data set employed in this work compared with the more reliable data set used in our previous study,^3 where such models based on descriptors computed using B3LYP were found to be highly reliable.

CORR only versus X + C methods

The above discussion of the models developed using the X + C and CORR-only methods reveals some interesting trends. For example, the TA100 models based on the absolute electronegativity, chemical hardness and electrophilicity index computed using the M06, M06-L and M06-2X meta XC functionals are found to be highly robust and predictive when mainly the effect of electron-correlation is included in the descriptors (entries 12–14 in Tables 5–7), compared with the corresponding X + C models (entries 6–8). As noted above, the same holds for the TA98 mutagenicity, where the CORR descriptors computed using the meta XC functionals give models with excellent internal as well as external reliability.

Further, as evident from entries 5 and 11 in Tables 3–7 for the models based on the widely used XC functional B3LYP, the models are more robust and predictive when mainly the effect of electron-correlation is included in the descriptors, suggesting that electron-correlation interactions are significant for developing externally predictive quantitative models. This is not always the case, however, as evident from the HOMO energy based models computed through HFX + LYP (entries 3 and 9 in Table 3), where retaining mainly the effect of the LYP correlation does not seem to improve the quality of the model (entry 9 in Table 3). Moreover, as discussed previously, the HOMO energy based model (entry 2 in Table 3) has more reliable internal validation when only the exact HF exchange (HFX) is included without any correlation, whereas adding LYP correlation to the HFX decreases the internal stability of this model, as evident in entry 3 of Table 3.
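The B3LYP comparison can be checked directly against the tabulated values. The snippet below transcribes the TA100 entries 5 (B3LYP) and 11 (CORR (B3LYP)) of Tables 5–7 and computes, for each table, the change in each validation parameter when the CORR descriptors replace the X + C ones; Table 3 is not reproduced here and is therefore omitted. The variable and function names are ours.

```python
METRICS = ("R2", "Q2_LOO", "Q2_F3", "CCC_EXT")

# TA100 values transcribed from entries 5 (B3LYP) and 11 (CORR (B3LYP))
# of Tables 5-7 of this paper; Table 3 is not included.
TA100 = {
    "Table 5 (E, chi)": {"B3LYP": (0.692, 0.636, 0.726, 0.827),
                         "CORR(B3LYP)": (0.739, 0.689, 0.850, 0.862)},
    "Table 6 (E, eta)": {"B3LYP": (0.614, 0.521, 0.579, 0.693),
                         "CORR(B3LYP)": (0.781, 0.729, 0.774, 0.792)},
    "Table 7 (E, omega)": {"B3LYP": (0.611, 0.541, 0.806, 0.863),
                           "CORR(B3LYP)": (0.772, 0.730, 0.898, 0.912)},
}


def corr_gain(tables):
    """Change in each metric when CORR descriptors replace the X + C ones."""
    return {
        name: {m: round(rows["CORR(B3LYP)"][k] - rows["B3LYP"][k], 3)
               for k, m in enumerate(METRICS)}
        for name, rows in tables.items()
    }


for table, gains in corr_gain(TA100).items():
    print(table, gains)
```

All twelve differences are positive, with the largest gains for the chemical hardness model (e.g. +0.167 in R² and +0.208 in Q_LOO²), consistent with the statement that the B3LYP models become more robust and predictive when mainly the electron-correlation is retained.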

Overall role of the exchange and correlation

From the above discussion, it is clear that the descriptors computed using the purely exchange (X-only) methods, and the CORR descriptors incorporating mainly the effect of electron-correlation, are the most reliable for predicting the mutagenic potential of nitrated-PAHs. Interestingly, the X-only methods such as DFT/HFX, which include the exact HF exchange, perform satisfactorily for the majority of the models. Among the CORR methods, the performance of B3LYP is remarkable, although the new-generation meta XC functionals are also highly reliable. It should be noted, though, that the electron-correlation based descriptors computed using the hybrid XC functionals through eqn (1) still include some exchange interactions. By contrast, most of the models based on the CORR descriptors computed using the pure XC functionals HFX + LYP and BLYP underperform, except in external predictivity, compared with those developed using the corresponding X-only methods (DFT/HFX and DFT/B88). Therefore, the quantum-mechanical exchange may be quite critical, particularly for the internal stability of the models.

Further, for the models based on the HOMO energy, all the X-only as well as CORR methods provide robust internal and external validation parameters, suggesting that this descriptor is a particularly good choice for modeling the mutagenic potential of compounds, as also observed in our recent studies,^3,4 albeit with different compositions of the training and prediction sets. Among the models based on the LUMO energy, however, the methods without HF exchange, that is, B88 and M06-L, are observed to be the most reliable. From the robustness and reliability of the models developed in our present and previous studies,^3,4 it is evident that models based on the descriptors incorporating mainly the effect of electron-correlation, particularly the total energy, HOMO energy, and electrophilicity index, can be highly reliable for developing externally predictive QSAR models for the TA100 and TA98 mutagenicity, irrespective of the data set distribution. Furthermore, the Williams plots and scatter plots in the ESI Fig. S2(A and B) and S3(A and B)† show that the models developed with the X-only and with the CORR descriptors have a broad applicability domain and the least scatter between the predicted and experimental activities, suggesting these methods to be highly reliable. The reliability of the electron-correlation based descriptors is also evident from the quality of the consensus models listed in Table 8, which were constructed from the best models observed in the present study. It should be noted that a consensus model incorporates various molecular aspects of the compounds through different descriptors.
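A consensus model of the kind listed in Table 8 averages the predictions of several member models. The text does not state how the weights of the weighted consensus model (R_WCM²) are assigned, so the sketch below takes the weights as an input; using, for example, each member model's Q_LOO² is a hypothetical choice for illustration, not the procedure implemented in QSARINS.

```python
import numpy as np


def weighted_consensus(predictions, weights):
    """Weighted average of the predictions of several member models.

    predictions: array of shape (n_models, n_compounds)
    weights: one non-negative weight per model (hypothetical choice, e.g.
    each model's Q2_LOO); weights are normalized to sum to one.
    """
    P = np.asarray(predictions, dtype=float)
    w = np.asarray(weights, dtype=float)
    if P.shape[0] != w.size or np.any(w < 0) or w.sum() == 0:
        raise ValueError("need one non-negative weight per model, not all zero")
    return (w / w.sum()) @ P
```

With equal weights this reduces to the simple mean of the member predictions; unequal weights let the more stable member models dominate while the others still contribute information from their different descriptors.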

Table 8 Same as Table 3 but for the consensus models based on the best descriptors and methods identified in Tables 3–7 for the TA100 mutagenicity, and in the ESI Tables S5–S9 for the TA98 mutagenicity^a

Consensus model s. no.	Descriptor employed	Splitting employed	R_WCM²	RMSE_TR	Q_F3²	CCC_EXT	RMSE_EXT
a R_WCM² represents the coefficient of determination obtained with the weighted consensus model strategy.
TA100 mutagenicity
1	E_HFX, E^HOMO_HFX, χ_HFX, ω_HFX	30%	0.805	0.769	0.774	0.880	0.818
2	E_HFX+LYP, E^HOMO_HFX+LYP, χ_HFX+LYP, η_HFX+LYP	30%	0.793	0.836	0.862	0.917	0.644
3	E_M06-2X, χ_M06-2X, ω_M06-2X	30%	0.747	0.912	0.810	0.861	0.750
4	E_CORR(B3LYP), E^HOMO_CORR(B3LYP), χ_CORR(B3LYP), ω_CORR(B3LYP)	30%	0.792	0.832	0.886	0.898	0.605
5	E_CORR(M06-2X), E^HOMO_CORR(M06-2X), ω_CORR(M06-2X)	30%	0.806	0.878	0.844	0.914	0.681

TA98 mutagenicity
6	E_HFX, E^HOMO_HFX, χ_HFX, ω_HFX	30%	0.959	0.474	0.929	0.965	0.546
7	E_HFX+LYP, E^HOMO_HFX+LYP, χ_HFX+LYP, η_HFX+LYP	30%	0.885	0.466	0.968	0.989	0.348
8	E_CORR(B3LYP), E^HOMO_CORR(B3LYP), ω_CORR(B3LYP)	Activity sampling	0.968	0.330	0.957	0.975	0.450
9	E_CORR(M06-L), E^HOMO_CORR(M06-L), ω_CORR(M06-L)	Activity sampling	0.936	0.529	0.959	0.980	0.403
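Table 8 uses two splitting schemes: 30% random splitting and activity sampling. The sketch below shows both under stated assumptions: the random split draws round(0.3n) compounds for the prediction set, and activity sampling is taken to mean sorting the compounds by activity and sending every step-th one to the prediction set. The step and offset values are hypothetical parameters, not the settings used in this work.

```python
import numpy as np


def random_split(n, pred_frac=0.3, seed=0):
    """Randomly assign round(pred_frac * n) compounds to the prediction set."""
    rng = np.random.default_rng(seed)
    pred = np.sort(rng.choice(n, size=int(round(pred_frac * n)), replace=False))
    train = np.setdiff1d(np.arange(n), pred)
    return train, pred


def activity_sampling_split(y, step=3, offset=1):
    """Sort compounds by activity and send every `step`-th one to prediction.

    With offset >= 1 the least active compound always stays in training;
    step and offset are hypothetical parameters, not the paper's settings.
    """
    order = np.argsort(np.asarray(y, dtype=float), kind="stable")
    pred = np.sort(order[offset::step])
    train = np.setdiff1d(np.arange(len(order)), pred)
    return train, pred
```

For a set of 51 compounds, random_split(51) puts 15 compounds in the prediction set. Activity sampling guarantees that the prediction set spans the whole activity range, which matters most for small data sets such as the TA98 set, where a random draw can leave parts of the range unrepresented.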

Conclusions

Through quantum-chemical molecular descriptors computed using the HF method and widely used XC functionals of DFT, the present work has analyzed the role of quantum-mechanical exchange and electron-correlation in the external predictivity of QSAR models developed for the TA100 and TA98 mutagenic activity of nitrated-PAHs. From the internal stability and external predictivity of the models, the following conclusions can be drawn regarding the role of exchange and electron-correlation and the performance of the various XC functionals of DFT:

(1) In modeling the mutagenicity, the descriptors computed using X-only methods such as DFT/HFX, incorporating the exact HF exchange, are highly reliable for the models based on the total energy, HOMO energy, absolute electronegativity and electrophilicity index, while the X + C methods of DFT employing XC functionals such as HFX + LYP and B3LYP perform satisfactorily. For the models based on the LUMO energy, the BLYP and meta XC functionals are also observed to be reliable.

(2) Surprisingly, the external predictivity of the models increases when mainly the effect of the electron-correlation is included in the descriptors particularly when computed through the CORR (B3LYP), CORR (M06), CORR (M06-L), and CORR (M06-2X) methods.

(3) The amount of quantum-mechanical exchange is found to be critical along with the electron-correlation, since retaining mainly the electron-correlation decreases the internal stability of a few models, as observed for the models developed using the CORR (HFX + LYP) and CORR (BLYP) methods.

(4) Notably, the models based on the descriptors incorporating mainly the effect of electron-correlation from the hybrid XC functionals such as B3LYP, and new-generation meta XC functionals like M06, M06-L, M06-2X, are observed to be highly reliable.

From the above conclusions, it may be suggested that the dynamic electron–electron interactions, namely, the quantum-mechanical exchange and correlation, can significantly affect the reliability and external predictivity of QSAR models for biological activities.

Acknowledgements

The authors thank the University Grants Commission (UGC), India for financial support under the UGC-Major Research Project no. 42-313/2013(SR). Reenu also thanks UGC for a UGC-BSR fellowship. The authors are grateful to Prof. Paola Gramatica for providing QSARINS software, and also to the Department of Chemistry, Panjab University, Chandigarh, India for providing other computational software and resources.

References

1. S. Wilson, Electron Correlation in Molecules, Clarendon Press, Oxford, 1984.
2. P.-O. Löwdin, Int. J. Quantum Chem., 1995, 55, 77.
3. Vikas, Reenu and Chayawan, J. Mol. Graphics Modell., 2013, 42, 7.
4. Reenu and Vikas, Ecotoxicol. Environ. Saf., 2014, 101, 42.
5. Vikas and Chayawan, Chemosphere, 2014, 95, 448.
6. Vikas and Chayawan, Chemosphere, 2015, 118, 239.
7. A. Szabo and N. S. Ostlund, Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory, Macmillan, New York, 1982.
8. E. G. Lewars, Computational Chemistry: Introduction to the Theory and Applications of Molecular and Quantum Mechanics, Springer, Heidelberg, 2nd edn, 2011.
9. R. G. Parr and W. Yang, Density Functional Theory of Atoms and Molecules, Oxford University Press, New York, 1989.
10. R. Veerasamy, H. Rajak, A. Jain, S. Sivadasan, C. P. Varghese and R. K. Agrawal, Int. J. Drug Des. Discovery, 2011, 2, 511.
11. R. Guha and P. C. Jurs, J. Chem. Inf. Model., 2005, 45, 65.
12. A. D. Becke, J. Chem. Phys., 1993, 98, 1372.
13. C. Lee, W. Yang and R. G. Parr, Phys. Rev. B: Condens. Matter Mater. Phys., 1988, 37, 785.
14. L. Goerigk and S. Grimme, Phys. Chem. Chem. Phys., 2011, 13, 6670.
15. Y. Zhao and D. G. Truhlar, Acc. Chem. Res., 2008, 41, 157.
16. L. A. Burns, A. Vazquez-Mayagoitia, B. G. Sumpter and C. D. Sherrill, J. Chem. Phys., 2011, 134, 084107.
17. J. P. Perdew, K. Burke and M. Ernzerhof, Phys. Rev. Lett., 1996, 77, 3865.
18. A. Vela, J. C. Pacheco-Kato, J. L. Gázquez, J. M. del Campo and S. B. Trickey, J. Chem. Phys., 2012, 136, 144115.
19. J. P. Perdew, A. Ruzsinszky, G. I. Csonka, O. A. Vydrov, G. E. Scuseria, L. A. Constantin, X. Zhou and K. Burke, Phys. Rev. Lett., 2008, 100, 136406.
20. E. Fabiano, L. A. Constantin and F. Della Sala, Phys. Rev. B: Condens. Matter Mater. Phys., 2010, 82, 113104.
21. L. Chiodo, L. A. Constantin, E. Fabiano and F. Della Sala, Phys. Rev. Lett., 2012, 108, 126402.
22. P. Elliott, D. Lee, A. Cangi and K. Burke, Phys. Rev. Lett., 2008, 100, 256406.
23. J. M. D. del Campo, J. L. Gázquez, S. B. Trickey and A. Vela, Chem. Phys. Lett., 2012, 543, 179.
24. S. Luo, Y. Zhao and D. G. Truhlar, J. Phys. Chem. Lett., 2012, 3, 2975.
25. V. B. Oyeyemi, J. A. Keith, M. Pavone and E. A. Carter, J. Phys. Chem. Lett., 2012, 3, 289.
26. N. Mardirossian, D. S. Lambrecht, L. McCaslin and S. S. Xantheas, J. Chem. Theory Comput., 2013, 9, 1368.
27. R. Vijayaraj, V. Subramanian and P. K. Chattaraj, J. Chem. Theory Comput., 2009, 5, 2744.
28. G. Schüürmann, in Predicting Chemical Toxicity and Fate, ed. M. T. D. Cronin and D. J. Livingstone, CRC Press, Taylor and Francis Group, Boca Raton FL, 2004, pp. 85–149.
29. P. Geerlings, F. D. Proft and W. Langenaeker, Chem. Rev., 2003, 103, 1793.
30. R. G. Parr, L. Szentpály and S. Liu, J. Am. Chem. Soc., 1999, 121, 1922.
31. A. D. Becke, Phys. Rev. A, 1988, 38, 3098.
32. A. D. Becke, J. Chem. Phys., 1993, 98, 5648.
33. P. J. Stephens, F. J. Devlin, C. F. Chabalowski and M. J. Frisch, J. Phys. Chem., 1994, 98, 11623.
34. Y. Zhao and D. G. Truhlar, J. Chem. Phys., 2006, 125, 194101.
35. Y. Zhao and D. G. Truhlar, Theor. Chem. Acc., 2008, 120, 215.
36. Vikas and P. Sangwan, J. Phys. Org. Chem., 2014, 27, 565.
37. C. D. Sherrill and H. F. Schaefer III, Adv. Quantum Chem., 1999, 34, 143.
38. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, J. A. Montgomery Jr, T. Vreven, K. N. Kudin, J. C. Burant, J. M. Millam, S. S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G. A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador, J. J. Dannenberg, V. G. Zakrzewski, S. Dapprich, A. D. Daniels, M. C. Strain, O. Farkas, D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J. V. Ortiz, Q. Cui, A. G. Baboul, S. Clifford, J. Cioslowski, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham, C. Y. Peng, A. Nanayakkara, M. Challacombe, P. M. W. Gill, B. Johnson, W. Chen, M. W. Wong, C. Gonzalez and J. A. Pople, Gaussian 03, Revision D.01, Gaussian, Inc., Wallingford CT, 2004.
39. F. Neese, Wiley Interdiscip. Rev.: Comput. Mol. Sci., 2012, 2, 73.
40. P. Gramatica, P. Pilutti and E. Papa, SAR QSAR Environ. Res., 2007, 18, 169.
41. A. K. Debnath, R. L. L. de Compadre, G. Debnath, A. J. Shusterman and C. Hansch, J. Med. Chem., 1991, 34, 786.
42. H. Braat, M. P. Peppelenbosch and D. W. Hommes, Ann. N. Y. Acad. Sci., 2006, 1072, 135.
43. R. Lewis, Human Genetics: Concepts and Applications, McGraw-Hill Education, New York, 11th edn, 1997.
44. A. Tropsha, P. Gramatica and V. Gombar, QSAR Comb. Sci., 2003, 22, 69.
45. L. M. Shi, H. Fang, W. Tong, J. Wu, R. Perkins, R. M. Blair, W. S. Branham, S. L. Dial, C. L. Moland and D. M. Sheehan, J. Chem. Inf. Comput. Sci., 2001, 41, 186.
46. G. Schüürmann, R. Ebert, J. Chen, B. Wang and R. Kühne, J. Chem. Inf. Model., 2008, 48, 2140.
47. V. Consonni, D. Ballabio and R. Todeschini, J. Chem. Inf. Model., 2009, 49, 1669.
48. N. Chirico and P. Gramatica, J. Chem. Inf. Model., 2011, 51, 2320.
49. L. I. A. Lin, Biometrics, 1989, 45, 255.
50. P. K. Ojha, I. Mitra, R. N. Das and K. Roy, Chemom. Intell. Lab. Syst., 2011, 107, 194.
51. N. Chirico and P. Gramatica, J. Chem. Inf. Model., 2012, 52, 2044.
52. O. A. Aptula, N. G. Jeliazkova, T. W. Schultz and M. T. D. Cronin, QSAR Comb. Sci., 2005, 24, 385.
53. R. Todeschini, A. Maiocchi and V. Consonni, Chemom. Intell. Lab. Syst., 1999, 46, 13.
54. N. Chirico, E. Papa, S. Kovarich, S. Cassani and P. Gramatica, QSAR Res. Unit in Environ. Chem. and Ecotox., University of Insubria, Varese, Italy, 2012, http://www.qsar.it.
55. P. Gramatica, N. Chirico, E. Papa, S. Cassani and S. Kovarich, J. Comput. Chem., 2013, 34, 2121.
56. P. Gramatica, QSAR Comb. Sci., 2007, 26, 694.

Footnote

† Electronic supplementary information (ESI) available: Tables S1, S2a–f, S3–S30 and Fig. S1, S2(A and B) and S3(A and B). See DOI: 10.1039/c4ra14262d
