Scaffold and cell line based approaches for QSAR studies on anticancer agents

Shruti Satbhaiya* and O. P. Chourasia
Heterocyclic Research Laboratory, Department of Chemistry, Dr Hari Singh Gour Vishwavidyalaya, Sagar, M.P. 470003, India. E-mail: opcchem@gmail.com; satbhaiyashruti@gmail.com; Tel: +91-9009509089, +91-9098999717

Received 7th September 2015, Accepted 24th September 2015

First published on 25th September 2015


Abstract

Quantitative structure–activity relationship (QSAR) models were developed with a linear heuristic method to predict the reported in vitro anticancer activity of a range of compounds. Each compound was represented by several calculated structural descriptors. Most previous computational studies have targeted only a limited number of cell lines. Here, predictive models were built for 482 compounds with experimental data against 30 different cancer cell lines. Statistical analysis showed high correlation and cross-validation coefficients and provided a range of QSAR equations. Quantum chemical descriptors were found in 42 of the 46 models, electrostatic descriptors in 16, topological in 12, geometrical in 7, thermodynamic in 5 and constitutional in 7. Notably, in most cases three-descriptor models were adequate. Pancreatic cancer cell lines showed the best statistical values (average R2 = 0.87), followed by leukaemia (average R2 = 0.86).


1. Introduction

Cancer is a multifactorial disease of striking significance in the world today and ranks high among human diseases. It has become the second leading cause of death in the human population1,2 after cardiovascular diseases. The development of potent and selective anticancer agents is therefore urgently needed3 and remains a major challenge for medicinal chemistry research. Researchers have focused on the discovery of novel anticancer agents because the range of available anticancer drugs is too narrow to take advantage of new discoveries concerning tumourigenesis. Given the distinctive growth patterns of the various types of cancer,4 and because cancer cells acquire multiple-drug resistance, current anticancer chemotherapy still has serious limitations. Owing to a vast increase in the number of feasible molecular targets, the focus has shifted from target identification to target validation.5

Natural products are the main source of lead compounds for drug development, owing to the intrinsic biorelevance of the small heteroaromatic compounds they contain. They have shown unexpected biological properties and have become the basis for a large number of innovative medicinal agents.6 The number of leads obtained from these compounds is dramatically higher than the number resulting from high-throughput screens of combinatorial libraries.7–9 However, preparing libraries based on natural products requires sophisticated and laborious synthetic sequences, and the therapeutic development of promising leads from these libraries is significantly impeded by the problem of large-scale compound supply. Because interest in natural products has been renewed, both by the failure of alternative methods to provide many therapeutic lead compounds and by the pharmaceutical industry, these challenges are becoming increasingly pertinent.10

The pharmaceutical industry has to ensure the safety, quality and efficacy of a marketed drug by subjecting it to a range of analyses.11 Bringing a drug from discovery to market takes approximately 12 years,12 and the projected cost of a marketed drug is high.13 Because the process is so expensive and lengthy, failures in drug development are costly, and it is useful to predict them before the clinical stage in order to reduce development costs.14 In vitro, in vivo and in silico methods are all used to filter out potential failures during drug development. Quantitative structure–activity relationship (QSAR) modeling is an in silico method that can be used to understand drug action, design new compounds and screen chemical libraries.15–18 Combinatorial approaches are an influential tool for speeding up drug discovery, and they are being adopted in the search for cancer therapies with different mechanisms of action.19,20 QSAR has become crucial for the molecular interpretation of biological properties.21–26 It is the most important tool in analogue-based drug design and has been widely used to predict diverse properties such as carcinogenicity,27 ADME,28 stability,29 toxicity,30,31 retention time32 and other physicochemical properties, in addition to biological activity.33–36 Combined with pattern recognition techniques, QSAR makes it possible to predict theoretically which structures will have the desired property values. In lead optimization, the development of QSAR models using various physicochemical descriptors has been a vital task.37 The use of multiple QSAR models to derive a mechanistic picture can be illustrated by comparing the experimental data available on anticancer agents.
Computational methods also aid the rapid generation of new hypotheses, as well as the design and interpretation of hypothesis-driven experiments in the field of cancer research.

A number of quantum chemical descriptors (such as molecular orbital, charge and dipole moment descriptors), electrostatic descriptors (such as charge-based descriptors), geometrical descriptors (such as moments of inertia) and thermodynamic descriptors (such as entropy and vibrational frequencies) have been applied effectively to set up QSAR models for predicting the activities of compounds.38–40 A large number of cell lines are available for any given cancer type on which in vitro biological activity can be measured, but the measured activity differs depending on the cell line used for the assay. This makes it difficult for computational chemists to select experimental data for a single scaffold type from the pool of existing biological activity data, because in vitro anticancer data are available against many different cell lines. In the literature, QSAR studies are carried out mainly for a single cell line, which may not be a good approach. The present study considered all the available experimental data for many different cell lines to build predictive models, which should help medicinal chemists to design new and potent compounds more reliably. Analysis of the descriptors obtained in the models for all the cell lines may also reveal the significance of particular types of descriptor in modeling anticancer activity against a given cancer type.

2. Computational methods

2.1. Data set for analysis

Reported in vitro assay data for a total of 482 compounds, belonging to 16 different scaffolds and tested against 30 cell lines, were considered in the present investigation (Tables S1–S16 in the ESI). The inhibitory activities (IC50) against the different cell lines were converted to pIC50 according to the formula pIC50 = −log(IC50). The parent structures of all the scaffolds, with the number of compounds and the names of the cell lines, are shown in Fig. 1. Table 1 lists the scaffolds considered, the different cell lines and the number of molecules corresponding to each cell line.20,41–54
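The pIC50 conversion can be illustrated with a minimal Python sketch. It assumes that IC50 is first expressed in mol/L, the usual convention for pIC50, and that reported values come in M, mM, µM or nM; the source states only pIC50 = −log(IC50), so the unit table and the function name `pic50` are illustrative assumptions.

```python
import math

# Concentration units mapped to mol/L. The unit of the original IC50 values is
# an assumption here; the text gives only pIC50 = -log(IC50).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def pic50(ic50: float, unit: str = "uM") -> float:
    """Return pIC50 = -log10(IC50), with IC50 first converted to mol/L."""
    if ic50 <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50 * UNIT_TO_MOLAR[unit])


if __name__ == "__main__":
    for value, unit in [(1.0, "uM"), (50.0, "nM"), (12.5, "uM")]:
        print(f"IC50 = {value} {unit} -> pIC50 = {pic50(value, unit):.2f}")
```

Because the scale is logarithmic, a tenfold decrease in IC50 raises pIC50 by one unit, so larger pIC50 values correspond to more potent compounds.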
Fig. 1 Parent structures of the 16 scaffolds (S1–S16) represented by the 482 compounds with reported IC50 values. The number of compounds for each scaffold is given in parentheses, together with the cell lines against which cytotoxicity values were reported (see Tables S1–S16 in additional file A for the structures of all the compounds with their in vitro IC50 values against the various cell lines).
Table 1 Details of the scaffolds considered in this study, the cell lines against which their anticancer activity was reported and the number of molecules for each cell line
S. no. Scaffold name Cell line Cancer type No. of compounds Ref.
S1 Acridine P388 Leukemia 41 41
LLc Lung 41
JLc Leukemia 41
S2 Cantharidin HT-29 Colon 35 42
SW480 Colon 35
MCF-7 Breast 35
A2780 Ovarian 35
H460 Lung 35
A431 Skin 35
DU145 Prostate 35
BE2-C Neuronal 35
SJ-G2 Brain 35
S3 Chalcone ACHN Renal 19 43
Panc1 Pancreatic 19
Calu1 Lung 19
H460 Lung 19
HCT116 Colon 19
S4 Tetrahydropyrimidine MCF-7 Breast 23 44
S5 Isatin HCT116 Colon 32 45
MCF-7 Breast 32
S6 Isoflavone HCT116 Colon 23 46
S7 Nitroalkene HeLa Cervical 22 47
S8 Phenazine H69 Lung 18 48
S9 Podophyllotoxin HeLa Cervical 30 49
MCF-7 Breast 30
S10 Pyrazole HeLa Cervical 17 49
MCF-7 Breast 17
S11 Pyrazoline MCF-7 Breast 20 50
B16-F10 Melanoma 20
S12 Pyrimidine BEL-7402 Hepatocellular 37 20
S13 Quinazoline MCF-7 Breast 36 51
U251 CNS 36
SW480 Colon 36
H522 Lung 36
M14 Melanoma 36
SKOV3 Ovarian 36
DU145 Prostate 36
A498 Renal 36
S14 Quinoxaline MCF-7 Breast 22 52
H460 Lung 22
SF-268 CNS 22
S15 Semicarbazide L120 Leukemia 30 53
S16 Stilbene A549 Lung 69 54
MCF-7 Breast 69
HT-29 Colon 69
SKMEL-5 Melanoma 69
MLM Melanoma 69


2.2. Optimization

A total of 482 compounds belonging to 16 different chemical scaffolds were collected, along with their anticancer activity against 30 cancer cell lines (Fig. 1). All the structures were first optimized, and their vibrational frequencies calculated, with the semi-empirical AM1 method in AMPAC 5.0. Gaussian output files were obtained for each structure and served as input files for the CODESSA program, which was used both to calculate descriptors and to perform the regression analysis.

2.3. Calculating 2D descriptors and regression analysis

CODESSA (COmprehensive DEscriptors for Structural and Statistical Analysis) version 2.0 was used to calculate the 2D descriptors and to perform the regression analysis.55 Fig. 2 provides a schematic illustration of the workflow adopted in the current study to develop and validate the QSAR models. Initially, approximately 540 default descriptors were calculated; these were classified into the following groups: constitutional, topological, geometrical, electrostatic, quantum chemical and thermodynamic descriptors.37
Fig. 2 Flowchart of the methodology adopted for the development and validation of the QSAR models.

Two different schemes were used to develop statistically significant QSAR models. In the first scheme, 16 QSAR models were developed, one for each of the 16 scaffolds used in this investigation (scaffold-based QSAR models). In the second scheme, 30 QSAR models were developed, one for each of the 30 cancer cell lines for which IC50 values were available, by combining all the scaffolds tested against that cell line (cell line-based QSAR models). For all the models, the intercorrelation of the descriptors was also tested, and models containing highly intercorrelated descriptors were replaced and refined so that the descriptors employed in a given model were practically orthogonal to each other.
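The intercorrelation test can be sketched as a pairwise Pearson correlation screen. This is a minimal, hypothetical illustration: CODESSA performs its own intercorrelation check, and the cutoff |r| > 0.8, the function names and the descriptor values below are assumptions, not values taken from this study.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def intercorrelated_pairs(descriptors, threshold=0.8):
    """List descriptor pairs whose |r| exceeds `threshold` (0.8 is an assumed cutoff)."""
    flagged = []
    for (name_a, vals_a), (name_b, vals_b) in combinations(descriptors.items(), 2):
        r = pearson_r(vals_a, vals_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


if __name__ == "__main__":
    # Hypothetical descriptor values for five compounds (not data from the study).
    descriptors = {
        "D1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "D2": [2.1, 4.0, 6.2, 7.9, 10.1],  # almost a rescaled copy of D1
        "D3": [5.0, 1.0, 4.0, 2.0, 3.0],
    }
    print(intercorrelated_pairs(descriptors))
```

A flagged pair would be resolved by keeping only one of the two descriptors, so that the descriptors retained in a model are practically orthogonal.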

Using a large number of descriptors complicates interpretation and can reduce the predictive ability and statistical robustness of a model. We therefore systematically developed 3-, 4- and 5-descriptor models for all the sets of compounds, in order to find the minimum number of descriptors needed to describe activity, using the heuristic method, a multilinear regression method. This method was chosen for its speed: it usually produces correlations of comparable quality 2–5 times faster than other methods, and it places no restriction on the size of the data set. Compared with the four- and five-descriptor models, the three-descriptor models were found to be satisfactory for all sets of compounds. To assess the statistical quality of the models, the parameters R2, Rcv2, AE, S2, F and the t-test were used; these were obtained from correlations of approximately 540 descriptors (constitutional, geometrical, topological, electrostatic, thermodynamic, quantum chemical, etc.) in different combinations.56 The R2 value is a relative measure of the quality of fit, F is the Fisher ratio between the explained and unexplained variance of the activity, and the t-test reflects the significance of each parameter within the model. The effect of the number of descriptors on the correlation coefficient was examined for each set of molecules by building heuristic-method models with 1–10 descriptors.
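These statistics can be computed for any multilinear model with a short ordinary least-squares sketch. It is not CODESSA's implementation. The data are a hypothetical two-descriptor set, and S2 is taken here to be the residual variance (squared standard error), which is an assumption about the notation.

```python
import math

# Hypothetical two-descriptor data set (not from the study):
# activity = 1.5 + 2.0*x1 - 0.5*x2 plus a small fixed "noise" term.
EXAMPLE_X = [[0.1, 1.0], [0.4, 0.8], [0.5, 1.5], [0.9, 0.3],
             [1.2, 1.1], [1.5, 0.2], [1.8, 0.9], [2.0, 1.4]]
_NOISE = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03, 0.02, -0.01]
EXAMPLE_Y = [1.5 + 2.0 * x1 - 0.5 * x2 + e for (x1, x2), e in zip(EXAMPLE_X, _NOISE)]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: descriptors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mlr_statistics(X, y):
    """OLS fit of y = b0 + sum_j b_j * x_j; returns coefficients, R2, F, s2 and t values."""
    n, k = len(X), len(X[0])
    A = [[1.0] + list(row) for row in X]  # design matrix with an intercept column
    ata = [[sum(row[p] * row[q] for row in A) for q in range(k + 1)] for p in range(k + 1)]
    aty = [sum(row[p] * yi for row, yi in zip(A, y)) for p in range(k + 1)]
    inv = invert(ata)
    coef = [sum(inv[p][q] * aty[q] for q in range(k + 1)) for p in range(k + 1)]
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in A]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    s2 = ss_res / (n - k - 1)                # residual variance (squared standard error)
    return {
        "coef": coef,
        "R2": 1.0 - ss_res / ss_tot,
        "F": ((ss_tot - ss_res) / k) / s2,   # explained / unexplained variance
        "s2": s2,
        "t": [c / math.sqrt(s2 * inv[p][p]) for p, c in enumerate(coef)],
    }


if __name__ == "__main__":
    stats = mlr_statistics(EXAMPLE_X, EXAMPLE_Y)
    print("coefficients:", [round(c, 3) for c in stats["coef"]])
    print(f"R2 = {stats['R2']:.4f}, F = {stats['F']:.1f}, s2 = {stats['s2']:.5f}")
    print("t values:", [round(t, 1) for t in stats["t"]])
```

A descriptor whose |t| is small contributes little to the model and is a natural candidate for removal.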

3. Results & discussion

Regression for the QSAR analysis of the various models was performed using IC50 values as the dependent variable and the calculated descriptors as independent variables. It is useful to gain insight into the physical meaning of the correlations obtained from the regression analysis, because the magnitude of a descriptor could then be used as a guideline for improving the anticancer activity of molecules.

From the developed models, sixteen scaffold-based and thirty cell line-based models were selected on the basis of statistical and other parameters such as R2, Rcv2, S2, AE (average residual), Fisher's F value and the t-test. The relationship between the number of descriptors and the correlation was determined by correlating 1–10 descriptors individually, as shown in Fig. 4(a) and (b) for the cell line-based and scaffold-based models, respectively. Three-descriptor models were considered acceptable, because models with more than six descriptors may give high correlation values that are spurious (overfitted) and of little use for further prediction of biological activity.
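The trend in Fig. 4, where the correlation keeps rising as descriptors are added, can be reproduced with a greedy forward-selection sketch. This is a simplified stand-in, not the CODESSA heuristic algorithm: at each step it adds whichever descriptor raises R2 the most. The data below are random numbers with no real relationship to activity, which illustrates why a high R2 obtained with many descriptors can be spurious.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """Fitted R2 of an OLS regression of y on the given descriptor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(ata, aty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_selection(pool, y, max_descriptors=10):
    """Greedily add, one at a time, the descriptor that gives the largest R2."""
    chosen, history = [], []
    for _ in range(max_descriptors):
        candidates = [name for name in pool if name not in chosen]
        best = max(candidates,
                   key=lambda name: r_squared([pool[c] for c in chosen + [name]], y))
        chosen.append(best)
        history.append(r_squared([pool[c] for c in chosen], y))
    return chosen, history


if __name__ == "__main__":
    rng = random.Random(0)
    n_compounds, n_pool = 15, 20
    # Pure noise: neither the "activity" nor the "descriptors" carry any signal.
    activity = [rng.gauss(0.0, 1.0) for _ in range(n_compounds)]
    pool = {f"D{j}": [rng.gauss(0.0, 1.0) for _ in range(n_compounds)] for j in range(n_pool)}
    chosen, history = forward_selection(pool, activity)
    for k, r2 in enumerate(history, start=1):
        print(f"{k:2d} descriptors: R2 = {r2:.3f}")
```

Because the fitted R2 of nested least-squares models can never decrease when a descriptor is added, R2 alone cannot choose the model size; cross-validated statistics such as Rcv2 are needed for that.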


Fig. 3 (a) Experimental versus predicted IC50 values for the cell line-based QSAR models, with correlation and cross-validation coefficients, for the 15 statistically best models. (b) Experimental versus predicted IC50 values for the scaffold-based QSAR models, with correlation and cross-validation coefficients, for the 8 statistically best models.

Fig. 4 (a) The effect of the number of descriptors on the correlation coefficient on the basis of cell line-based QSAR models. (b) The effect of the number of descriptors on the correlation coefficient on the basis of scaffold-based QSAR models.

The compounds in each data set were divided into a training set and a test set. The models constructed from the training set compounds were used to predict the activity of the test set compounds. Low average residuals for both the training and test sets identify the models with a high potential to establish the correlation between structure and activity.
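A training/test split and the average residual (AE) can be sketched as follows. The source does not say how compounds were assigned to the two sets. The TR/TS counts in Table 2 place roughly a fifth of each data set in the test set, so the random 80:20 split, the seed and the function names here are assumptions.

```python
import random


def split_train_test(compound_ids, test_fraction=0.2, seed=42):
    """Randomly split compound IDs into (training, test) lists.

    Random assignment and the 20% test fraction are assumptions,
    not the published protocol.
    """
    ids = list(compound_ids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(test_fraction * len(ids)))
    return ids[n_test:], ids[:n_test]


def average_residual(experimental, predicted):
    """AE: mean absolute difference between experimental and calculated activity."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - p) for e, p in zip(experimental, predicted)) / len(experimental)


if __name__ == "__main__":
    train, test = split_train_test(range(20))
    print(f"{len(train)} training / {len(test)} test compounds")
    # Hypothetical experimental vs. predicted activities for four test compounds.
    print(f"AE = {average_residual([5.1, 4.8, 6.0, 5.5], [5.0, 5.0, 5.8, 5.6]):.2f}")
```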

The scaffold-based QSAR models, with their regression equations, cancer types and cell line names, are given in Table 2(a). Most of the scaffold-based models showed better statistical quality, with higher correlation coefficients, than the cell line-based models. An important reason for these high correlation coefficients is the smaller number of compounds in each scaffold set. The activity range of the compounds in three models (S2, S10 and S12) is narrow. In general, models built on compounds with a broad activity range show high correlation coefficients, whereas models built on compounds with a narrow activity range show lower values. Apart from these three models, the scaffold-based models with high correlation coefficients appear reasonable and can be used for further prediction.

Table 2 (a) Scaffold-based QSAR models: cell line (with cancer type in parentheses), regression summary (regression equation, correlation coefficient R2, cross-validation coefficient Rcv2 and average residual AE) and number of compounds in the training set (TR) and test set (TS). (b) Cell line-based QSAR models: cell line (with cancer type in parentheses), scaffolds involved, regression summary (regression equation, R2, Rcv2 and AE) and number of compounds in the training set (TR) and test set (TS).a
No. Cell line (type) Regression equation R2 Rcv2 AE F S2 No. of compounds: TR TS
a R2 is the square of the correlation coefficient, i.e. the fraction of the variance in activity explained by the model. Rcv2 is the cross-validated R2, a measure of the predictive quality of the QSAR model. AE is the average absolute difference between the experimental and calculated IC50 values. F is the Fisher statistic, the ratio between the explained and unexplained variance for a given number of degrees of freedom, which indicates the significance level of the QSAR model. S2 is the squared standard deviation of the regression (s²). TR and TS are the numbers of molecules in the training and test sets, respectively.
S1 P388 (leukemia) =−6.2155 × VE/T + 2.3164 × WPSA3Q + 3.3250 × LNMVF + 1.5252 0.75 0.67 0.35 26.74 0.145 31 9
S2 HT29 (colon) =1.8444 × RNB − 3.3083 × MaenAC + 0.13180 × PMIA + 6.1097 0.69 0.55 0.12 15.36 0.033 27 8
S3 ACHN (renal) =7.4622 × FPSA3z − 7.8674 × WNSA2z − 1.0224 × BI + 1.8722 0.98 0.95 0.05 105.92 0.001 15 4
S4 MCF7 (breast) =−2.0431 × PPSA3z + 2.7466 × ZXS/ZXR − 2.0527 × RNCGQ + 3.5395 0.89 0.73 0.09 29.10 0.009 17 5
S5 HCT116 (colon) =4.1330 × RNCl − 2.1896 × RNCSz − 1.2796 × FNSA2Q − 2.7626 0.77 0.66 0.21 21.31 0.028 23 8
S6 HCT116 (colon) =4.0785 × EMiNACC − 4.1004 × PNSA2Q − 1.4298 × EHBCAQ + 2.0450 0.88 0.82 0.14 31.98 0.018 18 5
S7 HeLa (cervical) =−9.3376 × PMIB − 2.0744 × EHDSAQ + 1.7527 × EMaNACC + 3.8717 0.85 0.75 0.14 19.85 0.036 15 4
S8 H69 (lung) =4.7221 × MaPCHz − 2.5135 × TE/#A-t + 2.6179 × MiERIC 0.96 0.93 0.12 76.93 0.019 14 4
S9 HeLa (cervical) =2.1519 × MiERIN + 3.4050 × MaREHN − 1.0293 × ABOC − 3.4442 0.93 0.90 0.12 84.32 0.074 22 6
S10 HeLa (cervical) =9.0910 × MaenACH − 6.2862 × HNMVF + 1.2038 × MaPPBO − 1.5070 0.96 0.90 0.15 50.38 0.051 14 3
S11 B16-F10 (melanoma) =−2.7430 × 1XGP + 5.4109 × DPSA2z − 6.9408 × EHDSAQ + 3.9304 0.93 0.88 0.11 41.90 0.012 15 5
S12 BEL-7402 (hepatocellular) =2.98993 × HLEG + 1.0598 × MaeeRN + 2.9834 × KSI3 − 4.0099 0.70 0.60 0.35 20.86 0.106 32 11
S13 M14 (melanoma) =4.4562 × MiERIN − 3.6455 × MiAOEP + 1.0863 × MiBON (0.1) + 1.3577 0.65 0.56 0.31 15.05 0.215 28 8
S14 SF-268 (CNS) =−1.3737 × MaTICC + 1.2771 × EPNSA3Q − 1.2524 × MiTICN + 4.8510 0.74 0.59 0.46 11.22 0.240 17 5
S15 L120 (leukemia) =1.712 × RNN − 4.0400 × MiERIC + 9.1240 × MIA − 4.5014 0.92 0.89 0.24 71.94 0.055 22 8
S16 A549 (lung) =−1.2148 × MaenACC + 5.4537 × FNSA1Q − 6.7738 × AIC1 0.48 0.38 0.24 12.46 0.087 49 17

No. Cell line (type) Scaffolds Regression equation R2 Rcv2 AE F S2 No. of compounds: TR TS
M1 A498 (renal) S13 =−3.3738 × MannRCN − 1.2453 × ASIC1 − 1.0807 × RNCSz + 4.6355 0.71 0.56 0.62 14.87 0.239 31 5
M2 A549 (lung) S16 =−1.3066 × MaenACC + 1.1275 × PNSA1Q − 7.9011 × PNSA2z + 3.3541 0.56 0.49 0.27 18.41 0.094 56 10
M3 A2780 (ovarian) S2 =5.3016 × MiNRIO + 6.1285 × HACA1Q − 1.222 × RPCSz − 3.6341 0.68 0.54 0.22 13.89 0.043 30 5
M4 ACHN (renal) S3 =7.4622 × FPSA3z − 7.8674 × WNSA2z − 1.0224 × BI + 1.8722 0.96 0.95 0.05 105.93 0.001 16 3
M5 A431 (skin) S2 =−7.2276 × YZS + 1.0111 × RI0 + 7.4112 × MiERIC + 2.5462 0.69 0.59 0.14 13.43 0.036 30 5
M6 B16-F10 (melanoma) S11 =−1.0251 × WPSA1Q − 1.7686 × MiTICS + 1.6039 × MiNACN + 2.7567 0.94 0.89 0.11 62.56 0.015 17 3
M7 BE2-C (neuronal) S2 =−5.0255 × XYS/XYR + 1.4971 × MiERIC − 6.7892 × RNO + 5.2368 0.72 0.63 0.16 16.67 0.030 29 7
M8 BEL-7402 (hepatocellular) S12 =2.6386 × HLEG − 6.0993 × WNSA2Q − 4.8693 × MiERIN − 2.2423 0.58 0.44 0.31 12.34 0.177 35 10
M9 Calu1 (lung) S3 =2.2933 × A1ERIC − 2.3063 × SIC1 − 3.0798 × YZS + 5.2315 0.93 0.80 0.12 41.25 0.023 16 3
M10 DU145 (prostate) S2, S13 =7.0109 × MiTICN − 7.8249 × PNSA1z − 2.7692 × CHaSz − 6.9257 0.43 0.32 0.45 11.60 0.307 57 15
M11 H69 (lung) S8 =−1.1403 × FNSA2z − 5.8440 × ERPCGQ − 5.2628 × MienACH + 3.4678 0.93 0.81 0.13 44.62 0.033 15 3
M12 H522 (lung) S13 =1.4711 × MiPCCz + 2.3775 × MiBOC(0.1) + 8.7939 × THCMD − 2.1601 0.73 0.63 0.24 18.26 0.201 28 8
M13 HCT116 (colon) S3, S5, S6 =2.7531 × RE/T + 3.3081 × LNMVF + 64.6887 × ACIC1 − 1.0055 0.59 0.49 0.29 24.10 0.119 60 14
M14 HeLa (cervical) S7, S9, S10 =−6.4020 × MaBON − 9.5518 × MienACN + 7.3732 × NN + 3.7878 0.76 0.71 0.40 43.55 0.377 56 11
M15 HT29 (colon) S2, S16 =−6.4490 × FNSA2Q + 1.3955 × PP/SD + 9.1032 × MaenAC − 1.6224 0.30 0.22 0.27 9.54 0.098 81 20
M16 JLc (leukemia) S1 =−7.0103 × EMaNACH + 1.5761 × HDSA2Q − 1.0051 × MaeeRC + 1.0507 0.86 0.82 0.33 44.61 0.069 30 11
M17 L120 (leukemia) S15 =−4.8986 × HDCA2Q + 1.7065 × WNSA2z − 1.4993 × MienANN + 6.9159 0.90 0.84 0.17 43.50 0.066 25 6
M18 LLc (lung) S1 =−1.5310 × ZXS/ZXR + 2.7870 × ERNCSQ − 6.4016 × MiBOC(0.1) + 7.9845 0.83 0.78 0.38 36.82 0.155 32 9
M19 M14 (melanoma) S13 =−8.1796 × NN − 5.0035 × RNBr + 1.1723 × MiERIC + 4.9692 0.81 0.70 0.25 28.09 0.156 30 6
M20 MCF7 (breast) S2, S4, S5, S9, S10, S11, S13, S14, S16 =6.4410 × MaeeRC − 3.4532 × ERPCSQ − 1.7867 × ASIC1 − 2.7106 0.46 0.44 0.55 52.87 0.663 231 45
M21 MLM (glioblastoma) S16 =−8.2245 × EFPSA1Q + 1.1671 × ANRIO − 4.4003 × EHDSAQ 0.48 0.40 0.28 14.37 0.124 53 13
M22 H460 (lung) S2, S3, S14 =−1.3004 × MiPC + 6.3227 × Ma1ERIC + 2.4755 × MaenACO 0.59 0.49 0.41 19.82 0.152 60 16
M23 P388 (leukemia) S1 =−3.4460 × WPSA1z + 6.8634 × MiTICN + 8.1021 × HDCA1Q − 9.9755 0.81 0.73 0.31 31.74 0.0751 32 9
M24 Panc1 (pancreatic) S3 =1.8296 × SIC2 + 3.3629 × LNMVF − 1.7681 × FPSA3Q − 3.0118 0.87 0.73 0.11 21.63 0.016 16 3
M25 SF-268 (CNS) S14 =2.4189 × ABOC − 3.9606 × ERNCSQ + 2.4054 × EMaNAC − 2.3761 0.74 0.56 0.27 9.38 0.171 18 4
M26 SJ-G2 (brain) S2 =5.1327 × CIC2 + 1.5429 × AVN + 2.0716 × MiRECN − 7.3156 0.77 0.60 0.13 18.67 0.034 26 9
M27 SKMEL-5 (melanoma) S16 =−1.2423 × HOMO1 + 4.6277 × MaTICH − 2.3144 × EFHDSA − 6.8356 0.51 0.45 0.22 15.52 0.111 51 15
M28 SKOV3 (ovarian) S13 =2.7606 × MiPCNz − 3.5240 × EE + eeRCC + 1.3856 × EFHDCAQ + 5.0366 0.76 0.66 0.28 20.69 0.141 29 7
M29 SW480 (colon) S2, S13 =7.0573 × MaASEN − 605367 × RNCl − 1.1217 × HDSAQ + 1.3202 0.69 0. 0.27 32.20 0.130 50 15
M30 U251 (CNS) S13 =−1.1620 × IOKSE + 5.3498 × EFHBSAQ − 1.5086 × RNN + 7.2762 0.76 0.69 0.31 24.27 0.154 29 7


All the QSAR models were validated by the leave-one-out (LOO) cross-validation method; a model is generally considered predictive when Rcv2 is greater than 0.5.57 The cell line-based QSAR models M4, M6, M9, M11, M16, M17, M18, M19, M23 and M24 show high statistical quality (avg. R2 = 0.93, Rcv2 = 0.89) and appear valuable for the present classes of compounds. Several other cell line-based models (M1, M3, M5, M7, M12, M14, M25, M26, M28, M29 and M30) show moderate statistical quality (avg. R2 = 0.71, Rcv2 = 0.69), and these models can also be used for prediction. However, some models (M2, M8, M10, M13, M15, M20, M21, M22 and M27) cannot be used for further prediction because of their poor statistical quality (avg. R2 = 0.58, Rcv2 = 0.45). The weaker results of these models are probably due to the larger number of compounds and the 3 to 5 different scaffolds combined in them. Increasing the number of descriptors in these narrow-activity-range models did little to improve their statistical quality. This suggests that the descriptors currently used are not well suited to developing structure–activity relationships for these models, and that additional descriptors need to be developed. By contrast, models involving a single scaffold generally provide good statistical quality. All the details of the cell line-based models are given in Table 2(b).
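The leave-one-out Rcv2 (often written q2) can be sketched as follows: each compound is removed in turn, the model is refitted on the remaining compounds, and the squared prediction errors are summed into PRESS, giving Rcv2 = 1 − PRESS/SS_total. The data set is hypothetical, the plain least-squares model is an assumption, and the 0.5 acceptance threshold is the criterion quoted above.

```python
# Hypothetical two-descriptor data set (not from the study):
# activity = 1.5 + 2.0*x1 - 0.5*x2 plus a small fixed "noise" term.
EXAMPLE_X = [[0.1, 1.0], [0.4, 0.8], [0.5, 1.5], [0.9, 0.3],
             [1.2, 1.1], [1.5, 0.2], [1.8, 0.9], [2.0, 1.4]]
_NOISE = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03, 0.02, -0.01]
EXAMPLE_Y = [1.5 + 2.0 * x1 - 0.5 * x2 + e for (x1, x2), e in zip(EXAMPLE_X, _NOISE)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ...] from the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(ata, aty)


def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def r2_fit(X, y):
    """Ordinary (fitted) R2 of the model trained on all compounds."""
    coef = fit_ols(X, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coef, x)) ** 2 for x, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def loo_q2(X, y):
    """Leave-one-out cross-validated R2: 1 - PRESS / SS_total."""
    n = len(y)
    mean_y = sum(y) / n
    press = 0.0
    for i in range(n):
        # Refit without compound i, then predict compound i.
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


if __name__ == "__main__":
    r2, q2 = r2_fit(EXAMPLE_X, EXAMPLE_Y), loo_q2(EXAMPLE_X, EXAMPLE_Y)
    print(f"R2 = {r2:.4f}, Rcv2 (LOO) = {q2:.4f}, passes Rcv2 > 0.5: {q2 > 0.5}")
```

For least-squares models Rcv2 cannot exceed the fitted R2, since each left-out residual is at least as large as the corresponding fitted residual; a large gap between the two is a warning sign of overfitting.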

The calculated and experimental biological activities, with residuals and descriptor values, for all the models are given in additional file A (Tables S18–S63). Fig. 3(a) and (b) show plots of the experimental versus calculated activity values for 15 cell line-based and 8 scaffold-based QSAR models; the remaining plots are given in additional file A (Fig. S1(a) and (b)). The plots and the average residuals show that the test set compounds lie closer to the line than the training set compounds.

A total of 109 descriptors were used in different combinations to develop all the QSAR models. Fig. 5 shows the percentage of each type of descriptor involved in the models. Quantum chemical descriptors dominate (approx. 62%), followed by electrostatic (13.6%), topological (9.6%), geometrical (5.5%), thermodynamic (4.5%) and constitutional (4.5%) descriptors. The intercorrelation of the descriptors in all the developed models was investigated, and the descriptors were found to be reasonably orthogonal. Among the quantum chemical descriptors, charge-based descriptors such as the maximum n–n repulsion for a C–N bond, the maximum e–n attraction for a C–C bond and the ESP-RPCG relative positive charge account for approximately 40 of the descriptors (approx. 37%), followed by valency-based and bond order-based descriptors (approx. 23% and 3%, respectively). This highlights the importance of charge-based, valency-based and bond order-based descriptors.


Fig. 5 The percentage of various descriptors involved in the QSAR models (see additional file A Table S17 for the details of all descriptors).

The cell lines of the different cancer types considered in the current study are presented in additional file A (Table S66). Seven of these cancer types have experimental data for more than one cell line, so the statistical quality of the models was compared across cancer types (additional file A, Table S66). Pancreatic (R2 = 0.87, Rcv2 = 0.73), leukemia (R2 = 0.86, Rcv2 = 0.80), renal (R2 = 0.85, Rcv2 = 0.76), cervical (R2 = 0.77, Rcv2 = 0.71), brain (R2 = 0.77, Rcv2 = 0.60), lung (R2 = 0.76, Rcv2 = 0.67) and CNS (R2 = 0.75, Rcv2 = 0.63) cancers have better statistical values than the other cancer types, such as colon, breast, ovarian, skin, prostate, neuronal, melanoma and hepatocellular (avg. R2 = 0.60, avg. Rcv2 = 0.51).
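The cancer-type averages can be checked against Table 2(b). The sketch below groups cell line-based models by cancer type and averages R2 and Rcv2. Which models enter each average is not stated in the text, so this grouping is a reconstruction covering only four cancer types; for these groups the rounded averages reproduce the values quoted above (for example, lung: R2 = 0.76, Rcv2 = 0.67).

```python
from statistics import mean

# (model, cancer type, R2, Rcv2) copied from Table 2(b).
CELL_LINE_MODELS = [
    ("M2", "lung", 0.56, 0.49),         # A549
    ("M9", "lung", 0.93, 0.80),         # Calu1
    ("M11", "lung", 0.93, 0.81),        # H69
    ("M12", "lung", 0.73, 0.63),        # H522
    ("M16", "leukemia", 0.86, 0.82),    # JLc
    ("M17", "leukemia", 0.90, 0.84),    # L120
    ("M18", "lung", 0.83, 0.78),        # LLc
    ("M22", "lung", 0.59, 0.49),        # H460
    ("M23", "leukemia", 0.81, 0.73),    # P388
    ("M24", "pancreatic", 0.87, 0.73),  # Panc1
    ("M26", "brain", 0.77, 0.60),       # SJ-G2
]


def average_by_cancer_type(models):
    """Average R2 and Rcv2 over the models belonging to each cancer type."""
    groups = {}
    for _model, cancer, r2, rcv2 in models:
        groups.setdefault(cancer, []).append((r2, rcv2))
    return {
        cancer: (round(mean(r2 for r2, _ in values), 2),
                 round(mean(q2 for _, q2 in values), 2))
        for cancer, values in groups.items()
    }


if __name__ == "__main__":
    for cancer, (r2, rcv2) in average_by_cancer_type(CELL_LINE_MODELS).items():
        print(f"{cancer:<11s} avg R2 = {r2:.2f}, avg Rcv2 = {rcv2:.2f}")
```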

4. Conclusion

Our aim in this investigation was to evaluate a series of structurally varied anticancer agents in order to explore the structure–activity relationships of these scaffolds. A total of 46 QSAR models (16 scaffold-based and 30 cell line-based) were built to assess the predictive power of QSAR models, with the number of descriptors varied from 1 to 10. This study reveals that three-descriptor models are satisfactory for further prediction, and that quantum chemical descriptors are the most important type of descriptor, followed by electrostatic, topological, geometrical, thermodynamic and constitutional descriptors. An analogue-based design approach is important for modeling anticancer compounds. Most of the developed models show high correlation coefficients (R2), high cross-validation coefficients (Rcv2) and low average residuals (AE). Cell lines of pancreatic cancer (average R2 = 0.87), followed by those of leukaemia (average R2 = 0.86), gave the best statistical values. Although the derived equations are of restricted validity because of the limited size of the training sets, these results may prove useful for predicting new anticancer agents with desired activities.

Conflict of interest

The authors have declared no conflict of interest.

Acknowledgements

SS and OPC thank UGC, New Delhi for financial assistance. The support from the Department of Chemistry, Dr H. S. Gour Central University Sagar M.P. India is also acknowledged.

References

  1. A. Frace, C. Loge, S. Gallet, N. Lebegue, P. Carato, P. Chavatte, P. Berthelot and D. Lesieur, J. Enzyme Inhib. Med. Chem., 2004, 19, 541.
  2. J. B. Gibbs, Cancer research, Science, 2000, 287, 1969.
  3. A. Kamal, Y. V. V. Srikanth, M. N. A. Khan, T. B. Shaik and M. Ashraf, Bioorg. Med. Chem. Lett., 2010, 20, 5229.
  4. R. J. Abdel-Jalil, E. Q. E. Momani, M. Hamad, W. Voelter, M. S. Mubarak, B. H. Smith and D. G. Peters, Monatsh. Chem., 2010, 141, 251.
  5. P. Hanumantharao and S. V. Sambasivarao, Bioorg. Med. Chem. Lett., 2005, 15, 3167.
  6. T. K. Olszewski and B. Boduszek, Tetrahedron, 2010, 66, 8661.
  7. D. J. Newman, G. M. Cragg and K. M. Snader, J. Nat. Prod., 2003, 66, 1022.
  8. R. Breinbauer, I. R. Vetter and H. Waldmann, Angew. Chem., Int. Ed., 2002, 41, 2879.
  9. G. M. Cragg, P. G. Grothaus and D. J. Newman, Chem. Rev., 2009, 109, 3012.
  10. A. L. Harvey, Drug Discovery Today, 2008, 13, 894.
  11. D. J. Snodin, Toxicol. Lett., 2002, 127, 161.
  12. S. Kraljevic, P. J. Stambrook and K. Pavelic, EMBO Rep., 2004, 5, 837.
  13. C. P. Adams and V. V. Brantner, Health Aff., 2006, 25, 420.
  14. Critical Path Opportunities Reports Challenges and Opportunities Report – March, 2004.
  15. C. W. Yap, Y. Xue, Z. R. Li and Y. Z. Chen, Curr. Top. Med. Chem., 2006, 6, 1593.
  16. R. V. Guido, G. Oliva and A. D. Andricopulo, Curr. Med. Chem., 2008, 15, 37.
  17. A. Schwaighofer, T. Schroeter, S. Mika and G. Blanchard, Comb. Chem. High Throughput Screening, 2009, 12, 453.
  18. L. G. Valerio, Toxicol. Appl. Pharmacol., 2009, 241, 356.
  19. A. Kamal, E. V. Bharathi, M. J. Ramaiah, D. Dastagiri, J. S. Reddy, A. Viswanath, S. N. C. V. L. Farheen Sultana, M. Pal-Bhadra, H. K. Srivastava, G. N. Sastry, A. Juvekar, S. Sen and S. Zingde, Bioorg. Med. Chem., 2010, 18, 526.
  20. F. Xie, H. Zhao, L. Zhao, L. Lou and Y. Hu, Bioorg. Med. Chem. Lett., 2009, 19, 275.
  21. Y. C. Martin, Perspect. Drug Discov. Des., 1998, 12, 3.
  22. U. Norinder, Perspect. Drug Discov. Des., 1998, 12, 25.
  23. D. J. Maddalena, Expert Opin. Ther. Pat., 1998, 8, 249.
  24. H. Kubinyi, Drug Discovery Today, 1997, 2, 538.
  25. C. Hansch and T. Fujita, J. Am. Chem. Soc., 1995, 606, 1.
  26. C. Hansch and A. Leo, J. Am. Chem. Soc., 1995, 1, 1.
  27. R. Benigni and A. Giuliani, Bioinformatics, 2003, 19, 1194.
  28. C. Hansch, A. Leo, S. B. Mekapati and A. Kurup, Bioorg. Med. Chem., 2004, 12, 3391.
  29. H. K. Srivastava, M. Chourasia, D. Kumar and G. N. Sastry, J. Chem. Inf. Model., 2011, 51, 558.
  30. M. T. Cronin and J. C. Dearden, Quant. Struct.–Act. Relat., 1995, 14, 117.
  31. M. Mwense, X. Z. Wang, F. V. Buontempo, N. Horan, A. Young and D. Osborn, SAR QSAR Environ. Res., 2006, 17, 53.
  32. M. Zhao, Z. Li, Y. Wu, Y. R. Tang, C. Wang, Z. Zhang and S. Peng, Eur. J. Med. Chem., 2007, 42, 955.
  33. A. S. Reddy, S. P. Pati, P. P. Kumar, H. N. Pradeep and G. N. Sastry, Curr. Protein Pept. Sci., 2007, 8, 329.
  34. F. A. Pasha, M. Muddassar and S. J. Cho, Chem. Biol. Drug Des., 2009, 73, 292.
  35. H. K. Srivastava, F. A. Pasha and P. P. Singh, Int. J. Quantum Chem., 2005, 103, 237.
  36. P. Srivani and G. Narahar Sastry, J. Mol. Graphics Modell., 2009, 27, 676.
  37. S. Janardhan, P. Srivani and G. Narahari Sastry, QSAR Comb. Sci., 2006, 25, 860.
  38. P. Sivaprakasam, A. Xie and R. J. Doerksen, Bioorg. Med. Chem., 2006, 14, 8210.
  39. J. C. Chen, Y. Shen, S. Y. Liao, L. M. Chen and K. C. Zheng, Int. J. Quantum Chem., 2007, 107, 1468.
  40. S. Zhang, L. Wei, K. Bastow, W. Zheng, A. Brossi, K. H. Lee and A. Tropsha, J. Comput.-Aided Mol. Des., 2007, 21, 97.
  41. R. Garg, W. A. Denny and C. Hansch, Bioorg. Med. Chem., 2000, 8, 1835.
  42. T. A. Hill, S. G. Stewart, S. P. Ackland, J. Gilbert, B. Sauer, J. A. Sakoff and A. McCluskey, Bioorg. Med. Chem., 2007, 15, 6126.
  43. B. P. Bandgar, S. S. Gawande, R. G. Bodade, J. V. Totre and C. N. Khobragade, Bioorg. Med. Chem., 2010, 18, 1364.
  44. R. K. Yadlapalli, O. P. Chourasia, K. Vemuri, M. Sritharan and R. S. Perali, Bioorg. Med. Chem. Lett., 2012, 22, 2708.
  45. M. Kumar, K. Ramasamy, V. Mani, R. K. Mishra, A. B. Abdul Majeed, E. D. Clercq and B. Narasimhan, Arabian J. Chem., 2014, 7, 396.
  46. J. Hyun, S. Y. Shin, K. M. So, Y. H. Lee and Y. Lim, Bioorg. Med. Chem. Lett., 2012, 22, 2664.
  47. R. Mohan, N. Rastogi, N. Irishi, N. Namboothiri, S. M. Mobin and D. Panda, Bioorg. Med. Chem., 2006, 14, 8073.
  48. J. C. Chen, L. Qian, W. J. Wu, L. M. Chen and K. C. Zheng, J. Mol. Struct.: THEOCHEM, 2005, 756, 167.
  49. I. V. Magedov, L. Frolova, M. Manpadi, U. D. Bhoga, H. Tang, N. M. Evdokimo, O. George, K. H. Georgiou, S. Renner, M. Getlic, T. L. Kinnibrugh, M. A. Fernandes, S. V. slambrouck, W. F. A. Steelant, C. B. Shuster, S. Rogelj, W. A. L. van Otterlo and A. Kornienko, J. Med. Chem., 2011, 23, 4234.
  50. H. H. Wang, K. M. Qiu, H. E. Cui, Y. S. Yang, Y. Luo, M. Xing, X. Y. Qui, L. F. Bai and H. L. Zhu, Bioorg. Med. Chem., 2013, 21, 448.
  51. V. M. Sharma, P. Prasanna, K. V. Adi Seshu, B. Renuka, C. V. Laxman Rao, G. Sunil Kumar, C. Prasad Narasimhulu, P. Aravind Babu, R. C. Puranik, D. Subramanyam, A. Venkateswarlu, S. Rajagopal, K. B. Sunil Kumar, C. Seshagiri Rao, N. V. S. Rao Mamidi, D. S. Deevi, R. Ajaykumar and R. Rajagopalan, Bioorg. Med. Chem. Lett., 2002, 12, 2303.
  52. B. Zarranz, A. Jaso, I. Aldana and A. Monge, Bioorg. Med. Chem., 2004, 12, 3711.
  53. S. Ren, R. Wang, K. Komatsu, P. B. Krause, Y. Zyrianov, C. E. McKenna, C. Csipke, Z. A. Tokes and E. J. Lien, J. Med. Chem., 2002, 45, 410.
  54. M. Cushman, D. Nagarathnam, D. Gopal, A. K. Chakraborti, C. M. Lin and E. Hamel, J. Med. Chem., 1991, 34, 2579.
  55. (a) CODESSA version 2.0, Semichem, 7204 Mullen, Shawnee, KS 66216 USA; (b) M. Karelson, V. S. Lobanov and A. R. Katritzky, Chem. Rev., 1996, 96, 1027.
  56. M. H. Bohari, H. K. Srivastava and G. N. Sastry, Org. Med. Chem. Lett., 2011, 1, 3.
  57. A. Golbraikh and A. Tropsha, J. Mol. Graphics Modell., 2002, 20, 269.

Footnote

Electronic supplementary information (ESI) available. See DOI: 10.1039/c5ra18295f

This journal is © The Royal Society of Chemistry 2015