Shruti Satbhaiya* and
O. P. Chourasia
Heterocyclic Research Laboratory, Department of Chemistry, Dr Hari Singh Gour Vishwavidyalaya, Sagar, M.P. 470003, India. E-mail: opcchem@gmail.com; satbhaiyashruti@gmail.com; Tel: +91-9009509089, +91-9098999717
First published on 25th September 2015
A quantitative structure–activity relationship (QSAR) was developed with a linear heuristic method to predict in vitro anticancer activity from the available experimental data. Each compound was represented by several calculated structural descriptors. Most earlier computational studies targeted too few cell lines. Here, predictive models were built for 482 compounds with experimental data against 30 different cancer cell lines. Statistical analysis showed high correlation and cross-validation coefficient values and provided a range of QSAR equations. Quantum chemical descriptors were found in 42 out of 46 models, electrostatic in 16, topological in 12, geometrical in 7, thermodynamic in 5 and constitutional in 7. In most cases, three-descriptor models were adequate. Pancreatic cancer cell lines showed the best statistical values (average R2 = 0.87), followed by leukaemia (average R2 = 0.86).
Natural products are the main source of lead compounds for drug development because of the intrinsic biological relevance of the small heteroaromatic compounds they contain. They have shown unexpected biological properties and have become the basis for a large number of innovative medicinal agents.6 The collection of these compounds is dramatically larger than that resulting from high-throughput screens of combinatorial libraries.7–9 Preparing libraries based on natural products requires sophisticated and laborious synthetic sequences. In addition, the therapeutic development of promising leads from these libraries is significantly impeded by the problem of large-scale compound supply. Because interest in natural products has been renewed, both by the failure of alternative methods to provide many therapeutic lead compounds and within the pharmaceutical industry, these challenges are becoming increasingly pertinent.10
The pharmaceutical industry has to ensure the safety, quality, and efficacy of a marketed drug by subjecting it to a range of analyses.11 Bringing a drug through discovery to a finished product takes a long time, approximately 12 years,12 and the projected cost of a marketed drug is high.13 A process this expensive and lengthy carries a high risk that drug development will fail. It is therefore useful to predict such failures before the clinical stage in order to reduce drug development costs.14 To filter out potential failures during the drug development stage, various in vitro, in vivo and in silico methods are used. Quantitative structure–activity relationship (QSAR) modeling is an example of an in silico method, which can be used to understand drug action, design new compounds, and screen chemical libraries.15–18 Combinatorial approaches are an influential tool for speeding up drug discovery, and such methods are being adopted to find anticancer agents with different mechanisms of action.19,20 QSAR has become crucial for the molecular interpretation of biological properties.21–26 This technique is the most important tool used in analogue-based drug design and, apart from biological activity, has been broadly used for the calculation of assorted properties such as carcinogenicity,27 ADME,28 stability,29 toxicity,30,31 retention time32 and other physicochemical properties.33–36 Combined with pattern recognition techniques, the QSAR method makes possible the theoretical prediction of structures with desired property values. In lead optimization, the development of QSAR models using various physicochemical descriptors has been a vital task.37 The use of multiple QSAR models to derive a mechanistic approach can be illustrated by a comparison of the experimental data available on anticancer agents.
Computational methods also aid the rapid generation of new hypotheses, as well as the design and interpretation of hypothesis-driven experiments in the field of cancer research.
A number of quantum chemical descriptors (such as molecular orbital, charge and dipole moment descriptors), electrostatic descriptors (such as charge-based descriptors), geometrical descriptors (such as moment of inertia) and thermodynamic descriptors (such as entropy and vibrational frequency) have been effectively applied to set up QSAR models for predicting the activities of compounds.38–40 For any given cancer type there are a large number of cell lines on which in vitro biological activity can be measured, but the results differ depending on the cell line used for the assay. As a result, it becomes complicated for computational chemists to select experimental data from the pool of existing biological activity for a single scaffold type. In vitro experimental data for anticancer activity are available against many different cell lines. In the literature, QSAR studies are mainly carried out for one particular cell line, which may not be a good approach. This study considered all the available experimental data for many different cell lines to build predictive models, which should help medicinal chemists to design new and potent compounds more reliably. Analysis of the descriptors obtained for the models against all the cell lines may suggest the significance of a particular type of descriptor in modeling anticancer activity against a given cancer type.
Fig. 1 The 482 compounds with IC50 values, grouped into different scaffolds (S1–S16). The number of compounds in each scaffold is given in parentheses, together with the cell lines against which the cytotoxicity values were reported (see Tables S1–S16† in additional file A for the structures of all the compounds with their in vitro IC50 values against various cell lines).
| S. no. | Scaffold name | Cell line | Cancer type | No. of compounds | Ref. |
|---|---|---|---|---|---|
| S1 | Acridine | P388 | Leukemia | 41 | 41 |
| | | LLc | Lung | 41 | |
| | | JLc | Leukemia | 41 | |
| S2 | Cantharidine | HT-29 | Colon | 35 | 42 |
| | | SW480 | Colon | 35 | |
| | | MCF-7 | Breast | 35 | |
| | | A2780 | Ovarian | 35 | |
| | | H460 | Lung | 35 | |
| | | A431 | Skin | 35 | |
| | | DU145 | Prostate | 35 | |
| | | BE2-C | Neuronal | 35 | |
| | | SJ-G2 | Brain | 35 | |
| S3 | Chalcone | ACHN | Renal | 19 | 43 |
| | | Panc1 | Pancreatic | 19 | |
| | | Calu1 | Lung | 19 | |
| | | H460 | Lung | 19 | |
| | | HCT116 | Colon | 19 | |
| S4 | Tetrahydropyrimidine | MCF-7 | Breast | 23 | 44 |
| S5 | Isatin | HCT116 | Colon | 32 | 45 |
| | | MCF-7 | Breast | 32 | |
| S6 | Isoflavone | HCT116 | Colon | 23 | 46 |
| S7 | Nitroalkene | HeLa | Cervical | 22 | 47 |
| S8 | Phenazine | H69 | Lung | 18 | 48 |
| S9 | Podophyllotoxin | HeLa | Cervical | 30 | 49 |
| | | MCF-7 | Breast | 30 | |
| S10 | Pyrazole | HeLa | Cervical | 17 | 49 |
| | | MCF-7 | Breast | 17 | |
| S11 | Pyrazoline | MCF-7 | Breast | 20 | 50 |
| | | B16-F10 | Melanoma | 20 | |
| S12 | Pyrimidine | BEL-7402 | Hepatocellular | 37 | 20 |
| S13 | Quinazoline | MCF-7 | Breast | 36 | 51 |
| | | U251 | CNS | 36 | |
| | | SW480 | Colon | 36 | |
| | | H522 | Lung | 36 | |
| | | M14 | Melanoma | 36 | |
| | | SKOV3 | Ovarian | 36 | |
| | | DU145 | Prostate | 36 | |
| | | A498 | Renal | 36 | |
| S14 | Quinoxaline | MCF-7 | Breast | 22 | 52 |
| | | H460 | Lung | 22 | |
| | | SF-268 | CNS | 22 | |
| S15 | Semicarbazide | L120 | Leukemia | 30 | 53 |
| S16 | Stilbene | A549 | Lung | 69 | 54 |
| | | MCF-7 | Breast | 69 | |
| | | HT-29 | Colon | 69 | |
| | | SKMEL-5 | Melanoma | 69 | |
| | | MLM | Melanoma | 69 | |
Fig. 2 Flowchart of the methodology adopted for the development and validation of the QSAR models.
Two different schemes were chosen to develop statistically significant QSAR models. In the first scheme, 16 QSAR models were developed for the 16 scaffolds used in this investigation (i.e. scaffold-based QSAR models), whereas in the second scheme, 30 different QSAR models were developed based on the availability of IC50 values against 30 cancer cell lines by combining all the scaffolds (i.e. cell line-based QSAR models). For all the models, the intercorrelation of the descriptors was also tested. Models containing highly intercorrelated descriptors were then replaced and refined so that the descriptors employed in any given model were practically orthogonal to each other.
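The intercorrelation test described above can be sketched as a pairwise Pearson correlation check. This is only an illustration under stated assumptions: the 0.8 cut-off, the function name `intercorrelated_pairs` and the synthetic descriptor values are hypothetical, and the paper does not state which threshold or software routine was used.

```python
import numpy as np

def intercorrelated_pairs(X, names, threshold=0.8):
    """Return descriptor pairs whose |Pearson r| exceeds the threshold.

    X has one row per compound and one column per descriptor.
    The 0.8 default is an assumed cut-off, not a value from the paper.
    """
    corr = np.corrcoef(X, rowvar=False)  # descriptor-by-descriptor matrix
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic example: d3 is almost a scaled copy of d1, d2 is independent.
rng = np.random.default_rng(0)
d1 = rng.normal(size=30)
d2 = rng.normal(size=30)
d3 = 2.0 * d1 + 0.01 * rng.normal(size=30)
X = np.column_stack([d1, d2, d3])

flagged = intercorrelated_pairs(X, ["d1", "d2", "d3"])
print(flagged)  # pairs that would need to be replaced in a model
```

A model whose descriptors produce an empty list under such a check would count as "practically orthogonal" in the sense used here.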
A large number of descriptors can obscure interpretation and reduce the predictive ability and statistical robustness of a model. We therefore developed 3, 4 and 5 descriptor-based models for all the sets of compounds to find the minimum number of descriptors needed to define the activity, using a heuristic method, which is a multilinear regression method. This method was preferred because of its high speed: it usually produces correlations 2–5 times faster than other methods of comparable quality, and it places no restriction on the size of the data set. In comparison with the four and five descriptor-based models, the three descriptor-based models were found to be satisfactory for all sets of compounds. The statistical quality of the models was assessed with parameters such as R2, Rcv2, AE, S2, F and the t-test, which were obtained from the correlation of approximately 540 descriptors (constitutional, geometrical, topological, electrostatic, thermodynamic, quantum chemical, etc.) in different combinations.56 The R2 value is a relative measure of the quality of fit, F represents the F-ratio between the variance of the calculated and experimental activity, and the t-test reflects the significance of each parameter within the model. The effect of the number of descriptors on the correlation coefficient was examined on the set of molecules using the heuristic method at 1–10 descriptors.
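To make the statistical parameters concrete, the following minimal sketch fits a three-descriptor multilinear model by ordinary least squares and reports R2, S2, F and the t values. It is not the heuristic descriptor-selection procedure itself; the function name `fit_mlr`, the synthetic data and the coefficient values are assumptions made for illustration, and S2 is taken here as the residual variance.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit of y on the columns of X plus an intercept."""
    n, k = X.shape
    A = np.column_stack([X, np.ones(n)])           # last column = intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)                  # unexplained sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))    # total sum of squares
    dof = n - k - 1                                # residual degrees of freedom
    r2 = 1.0 - ss_res / ss_tot
    s2 = ss_res / dof                              # residual variance
    f_stat = (r2 / k) / ((1.0 - r2) / dof)         # explained vs unexplained
    std_err = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
    return {"coef": coef, "r2": r2, "s2": s2, "F": f_stat, "t": coef / std_err}

# Synthetic activity built from three hypothetical descriptors plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=30)

model = fit_mlr(X, y)
print(model["r2"], model["F"], model["t"])
```

The returned coefficients play the role of the regression equations listed in Table 2, with the last entry being the intercept.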
Among the developed models, sixteen scaffold-based and thirty cell line-based models were selected on the basis of several statistical and other parameters such as R2, Rcv2, S2, AE (average residual) values, Fisher's values (F test) and a t-test. The relationship between the number of descriptors and the correlation values was determined for all models by correlating 1–10 descriptors individually, as shown in Fig. 4(a) and (b) for the cell line-based and scaffold-based models respectively. Among all the models, three-descriptor models were accepted as giving the best correlation, because models with more than six descriptors may give high correlation values that are spurious and not useful for the further prediction of biological activities.
For every model, the compounds were separated into a training set and a test set. The models, which were constructed using the training set compounds, were used to determine the activity of the test set compounds. Low average residual values for both the training and test sets indicate which models have a high potential to establish the correlation between structure and activity.
Most of the scaffold-based QSAR models, along with their regression equations, cancer types and cell line names, are given in Table 2(a). Most of the scaffold-based QSAR models showed superior statistical quality, with higher correlation coefficient values than the cell line-based models. The lower number of compounds is an important reason for the high correlation coefficients of these models. The range of activity of the compounds in three models (S2, S10 and S12) is poor. In comparison, models containing compounds with a broad activity range show high correlation coefficients, while those with a narrow activity range show lower correlation coefficient values. Apart from these models, all the scaffold-based models with high correlation coefficient values seem rational and can be used for further prediction.
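The risk of spurious correlation with many descriptors can be illustrated numerically: for nested least-squares models, R2 cannot decrease as descriptors are added, even when the descriptors are pure noise. The sketch below uses entirely synthetic random data (20 hypothetical compounds, 10 random descriptors) and is not a reproduction of Fig. 4.

```python
import numpy as np

def r_squared(X, y):
    """R2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))

rng = np.random.default_rng(2)
n_compounds = 20
noise_descriptors = rng.normal(size=(n_compounds, 10))  # unrelated to activity
activity = rng.normal(size=n_compounds)

# R2 for models using the first 1, 2, ..., 10 noise descriptors.
r2_by_count = [r_squared(noise_descriptors[:, :k], activity)
               for k in range(1, 11)]
for k, r2 in enumerate(r2_by_count, start=1):
    print(k, round(r2, 3))
```

Because the fitted R2 rises with descriptor count regardless of relevance, a cross-validated statistic is the more informative guide when choosing between three, four and five descriptors.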
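A minimal sketch of this training/test procedure follows, assuming a random 30/10 split and synthetic data; the paper does not state how the compounds were assigned to each set, so the split, the helper names and the numbers are hypothetical. AE is computed as the mean absolute difference between experimental and calculated values.

```python
import numpy as np

def average_absolute_residual(y_exp, y_calc):
    """AE: mean absolute difference between experimental and calculated values."""
    return float(np.mean(np.abs(np.asarray(y_exp) - np.asarray(y_calc))))

rng = np.random.default_rng(3)
n = 40
X = rng.normal(size=(n, 3))                       # three hypothetical descriptors
y = X @ np.array([1.0, -0.5, 2.0]) + 4.0 + rng.normal(scale=0.1, size=n)

# Assumed random split: 10 test compounds, 30 training compounds.
order = rng.permutation(n)
test_idx, train_idx = order[:10], order[10:]

# Fit on the training set only.
A_train = np.column_stack([X[train_idx], np.ones(len(train_idx))])
coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)

def predict(X_part):
    """Apply the training-set regression equation to new descriptor rows."""
    return np.column_stack([X_part, np.ones(len(X_part))]) @ coef

ae_train = average_absolute_residual(y[train_idx], predict(X[train_idx]))
ae_test = average_absolute_residual(y[test_idx], predict(X[test_idx]))
print(ae_train, ae_test)
```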
| No. | Cell lines (type) | Regression equation | R2 | Rcv2 | AE | F | S2 | # comp (TR) | # comp (TS) |
|---|---|---|---|---|---|---|---|---|---|
| S1 | P388 (leukemia) | =−6.2155 × VE/T + 2.3164 × WPSA3Q + 3.3250 × LNMVF + 1.5252 | 0.75 | 0.67 | 0.35 | 26.74 | 0.145 | 31 | 9 |
| S2 | HT29 (colon) | =1.8444 × RNB − 3.3083 × MaenAC + 0.13180 × PMIA + 6.1097 | 0.69 | 0.55 | 0.12 | 15.36 | 0.033 | 27 | 8 |
| S3 | ACHN (renal) | =7.4622 × FPSA3z − 7.8674 × WNSA2z − 1.0224 × BI + 1.8722 | 0.98 | 0.95 | 0.05 | 105.92 | 0.001 | 15 | 4 |
| S4 | MCF7 (breast) | =−2.0431 × PPSA3z + 2.7466 × ZXS/ZXR − 2.0527 × RNCGQ + 3.5395 | 0.89 | 0.73 | 0.09 | 29.10 | 0.009 | 17 | 5 |
| S5 | HCT116 (colon) | =4.1330 × RNCl − 2.1896 × RNCSz − 1.2796 × FNSA2Q − 2.7626 | 0.77 | 0.66 | 0.21 | 21.31 | 0.028 | 23 | 8 |
| S6 | HCT116 (colon) | =4.0785 × EMiNACC − 4.1004 × PNSA2Q − 1.4298 × EHBCAQ + 2.0450 | 0.88 | 0.82 | 0.14 | 31.98 | 0.018 | 18 | 5 |
| S7 | HeLa (cervical) | =−9.3376 × PMIB − 2.0744 × EHDSAQ + 1.7527 × EMaNACC + 3.8717 | 0.85 | 0.75 | 0.14 | 19.85 | 0.036 | 15 | 4 |
| S8 | H69 (lung) | =4.7221 × MaPCHz − 2.5135 × TE/#A-t + 2.6179 × MiERIC | 0.96 | 0.93 | 0.12 | 76.93 | 0.019 | 14 | 4 |
| S9 | HeLa (cervical) | =2.1519 × MiERIN + 3.4050 × MaREHN − 1.0293 × ABOC − 3.4442 | 0.93 | 0.90 | 0.12 | 84.32 | 0.074 | 22 | 6 |
| S10 | HeLa (cervical) | =9.0910 × MaenACH − 6.2862 × HNMVF + 1.2038 × MaPPBO − 1.5070 | 0.96 | 0.90 | 0.15 | 50.38 | 0.051 | 14 | 3 |
| S11 | B16-F10 (melanoma) | =−2.7430 × 1XGP + 5.4109 × DPSA2z − 6.9408 × EHDSAQ + 3.9304 | 0.93 | 0.88 | 0.11 | 41.90 | 0.012 | 15 | 5 |
| S12 | BEL-7402 (hepatocellular) | =2.98993 × HLEG + 1.0598 × MaeeRN + 2.9834 × KSI3 − 4.0099 | 0.70 | 0.60 | 0.35 | 20.86 | 0.106 | 32 | 11 |
| S13 | M14 (melanoma) | =4.4562 × MiERIN − 3.6455 × MiAOEP + 1.0863 × MiBON (0.1) + 1.3577 | 0.65 | 0.56 | 0.31 | 15.05 | 0.215 | 28 | 8 |
| S14 | SF-268 (CNS) | =−1.3737 × MaTICC + 1.2771 × EPNSA3Q − 1.2524 × MiTICN + 4.8510 | 0.74 | 0.59 | 0.46 | 11.22 | 0.240 | 17 | 5 |
| S15 | L120 (leukemia) | =1.712 × RNN − 4.0400 × MiERIC + 9.1240 × MIA − 4.5014 | 0.92 | 0.89 | 0.24 | 71.94 | 0.055 | 22 | 8 |
| S16 | A549 (lung) | =−1.2148 × MaenACC + 5.4537 × FNSA1Q − 6.7738 × AIC1 | 0.48 | 0.38 | 0.24 | 12.46 | 0.087 | 49 | 17 |

a R2 is the square of the correlation coefficient and represents the statistical significance of the model. Rcv2 is the cross-validated R2, a measure of the quality of the QSAR model. AE is the average of the absolute difference between the experimental and calculated IC50 values. F is the Fisher statistic, the ratio between the explained and unexplained variance for a given number of degrees of freedom, thereby indicating a factual correlation or the significance level for the QSAR models. S2 is the standard deviation. TR is the number of molecules in the training set and TS is the number of molecules in the test set.
| No. | Cell line (type) | Scf. | Regression equation | R2 | Rcv2 | AE | F | S2 | # of comp. (TR) | # of comp. (TS) |
|---|---|---|---|---|---|---|---|---|---|---|
| M1 | A498 (renal) | S13 | =−3.3738 × MannRCN − 1.2453 × ASIC1 − 1.0807 × RNCSz + 4.6355 | 0.71 | 0.56 | 0.62 | 14.87 | 0.239 | 31 | 5 |
| M2 | A549 (lung) | S16 | =−1.3066 × MaenACC + 1.1275 × PNSA1Q − 7.9011 × PNSA2z + 3.3541 | 0.56 | 0.49 | 0.27 | 18.41 | 0.094 | 56 | 10 |
| M3 | A2780 (ovarian) | S2 | =5.3016 × MiNRIO + 6.1285 × HACA1Q − 1.222 × RPCSz − 3.6341 | 0.68 | 0.54 | 0.22 | 13.89 | 0.043 | 30 | 5 |
| M4 | ACHN (renal) | S3 | =7.4622 × FPSA3z − 7.8674 × WNSA2z − 1.0224 × BI + 1.8722 | 0.96 | 0.95 | 0.05 | 105.93 | 0.001 | 16 | 3 |
| M5 | A431 (skin) | S2 | =−7.2276 × YZS + 1.0111 × RI0 + 7.4112 × MiERIC + 2.5462 | 0.69 | 0.59 | 0.14 | 13.43 | 0.036 | 30 | 5 |
| M6 | B16-F10 (melanoma) | S11 | =−1.0251 × WPSA1Q − 1.7686 × MiTICS + 1.6039 × MiNACN + 2.7567 | 0.94 | 0.89 | 0.11 | 62.56 | 0.015 | 17 | 3 |
| M7 | BE2-C (neuronal) | S2 | =−5.0255 × XYS/XYR + 1.4971 × MiERIC − 6.7892 × RNO + 5.2368 | 0.72 | 0.63 | 0.16 | 16.67 | 0.030 | 29 | 7 |
| M8 | BEL-7402 (hepatocellular) | S12 | =2.6386 × HLEG − 6.0993 × WNSA2Q − 4.8693 × MiERIN − 2.2423 | 0.58 | 0.44 | 0.31 | 12.34 | 0.177 | 35 | 10 |
| M9 | Calu1 (lung) | S3 | =2.2933 × A1ERIC − 2.3063 × SIC1 − 3.0798 × YZS + 5.2315 | 0.93 | 0.80 | 0.12 | 41.25 | 0.023 | 16 | 3 |
| M10 | DU145 (prostate) | S3, S13 | =7.0109 × MiTICN − 7.8249 × PNSA1z − 2.7692 × CHaSz − 6.9257 | 0.43 | 0.32 | 0.45 | 11.60 | 0.307 | 57 | 15 |
| M11 | H69 (lung) | S8 | =−1.1403 × FNSA2z − 5.8440 × ERPCGQ − 5.2628 × MienACH + 3.4678 | 0.93 | 0.81 | 0.13 | 44.62 | 0.033 | 15 | 3 |
| M12 | H522 (lung) | S13 | =1.4711 × MiPCCz + 2.3775 × MiBOC(0.1) + 8.7939 × THCMD − 2.1601 | 0.73 | 0.63 | 0.24 | 18.26 | 0.201 | 28 | 8 |
| M13 | HCT116 (colon) | S3, S5, S6 | =2.7531 × RE/T + 3.3081 × LNMVF + 64.6887 × ACIC1 − 1.0055 | 0.59 | 0.49 | 0.29 | 24.10 | 0.119 | 60 | 14 |
| M14 | HeLa (cervical) | S7, S9, S10 | =−6.4020 × MaBON − 9.5518 × MienACN + 7.3732 × NN + 3.7878 | 0.76 | 0.71 | 0.40 | 43.55 | 0.377 | 56 | 11 |
| M15 | HT29 (colon) | S2, S16 | =−6.4490 × FNSA2Q + 1.3955 × PP/SD + 9.1032 × MaenAC − 1.6224 | 0.30 | 0.22 | 0.27 | 9.54 | 0.098 | 81 | 20 |
| M16 | JLc (leukemia) | S1 | =−7.0103 × EMaNACH + 1.5761 × HDSA2Q − 1.0051 × MaeeRC + 1.0507 | 0.86 | 0.82 | 0.33 | 44.61 | 0.069 | 30 | 11 |
| M17 | L120 (leukemia) | S15 | =−4.8986 × HDCA2Q + 1.7065 × WNSA2z − 1.4993 × MienANN + 6.9159 | 0.90 | 0.84 | 0.17 | 43.50 | 0.066 | 25 | 6 |
| M18 | LLc (lung) | S1 | =−1.5310 × ZXS/ZXR + 2.7870 × ERNCSQ − 6.4016 × MiBOC(0.1) + 7.9845 | 0.83 | 0.78 | 0.38 | 36.82 | 0.155 | 32 | 9 |
| M19 | M14 (melanoma) | S13 | =−8.1796 × NN − 5.0035 × RNBr + 1.1723 × MiERIC + 4.9692 | 0.81 | 0.70 | 0.25 | 28.09 | 0.156 | 30 | 6 |
| M20 | MCF7 (breast) | S2, S4, S5, S9, S10, S11, S13, S14, S16 | =6.4410 × MaeeRC − 3.4532 × ERPCSQ − 1.7867 × ASIC1 − 2.7106 | 0.46 | 0.44 | 0.55 | 52.87 | 0.663 | 231 | 45 |
| M21 | MLM (glioblastoma) | S16 | =−8.2245 × EFPSA1Q + 1.1671 × ANRIO − 4.4003 × EHDSAQ | 0.48 | 0.40 | 0.28 | 14.37 | 0.124 | 53 | 13 |
| M22 | H460 (lung) | S2, S3, S14 | =−1.3004 × MiPC + 6.3227 × Ma1ERIC + 2.4755 × MaenACO | 0.59 | 0.49 | 0.41 | 19.82 | 0.152 | 60 | 16 |
| M23 | P388 (leukemia) | S1 | =−3.4460 × WPSA1z + 6.8634 × MiTICN + 8.1021 × HDCA1Q − 9.9755 | 0.81 | 0.73 | 0.31 | 31.74 | 0.0751 | 32 | 9 |
| M24 | Panc1 (pancreatic) | S3 | =1.8296 × SIC2 + 3.3629 × LNMVF − 1.7681 × FPSA3Q − 3.0118 | 0.87 | 0.73 | 0.11 | 21.63 | 0.016 | 16 | 3 |
| M25 | SF-268 (CNS) | S14 | =2.4189 × ABOC − 3.9606 × ERNCSQ + 2.4054 × EMaNAC − 2.3761 | 0.74 | 0.56 | 0.27 | 9.38 | 0.171 | 18 | 4 |
| M26 | SJ-G2 (brain) | S2 | =5.1327 × CIC2 + 1.5429 × AVN + 2.0716 × MiRECN − 7.3156 | 0.77 | 0.60 | 0.13 | 18.67 | 0.034 | 26 | 9 |
| M27 | SKMEL-5 (melanoma) | S16 | =−1.2423 × HOMO1 + 4.6277 × MaTICH − 2.3144 × EFHDSA − 6.8356 | 0.51 | 0.45 | 0.22 | 15.52 | 0.111 | 51 | 15 |
| M28 | SKOV3 (ovarian) | S13 | =2.7606 × MiPCNz − 3.5240 × EE + eeRCC + 1.3856 × EFHDCAQ + 5.0366 | 0.76 | 0.66 | 0.28 | 20.69 | 0.141 | 29 | 7 |
| M29 | SW480 (colon) | S2 | =7.0573 × MaASEN − 605367 × RNCl − 1.1217 × HDSAQ + 1.3202 | 0.69 | 0. | 0.27 | 32.20 | 0.130 | 50 | 15 |
| M30 | U251 (CNS) | S13 | =−1.1620 × IOKSE + 5.3498 × EFHBSAQ − 1.5086 × RNN + 7.2762 | 0.76 | 0.69 | 0.31 | 24.27 | 0.154 | 29 | 7 |
All QSAR models were cross validated by the leave-one-out method; for a model to be considered valid, Rcv2 should be greater than 0.5.57 The regression summary for the cell line-based QSAR models M4, M6, M9, M11, M16, M17, M18, M19, M23 and M24 shows high statistical quality (avg. R2 = 0.93, Rcv2 = 0.89), and these models appear valuable for the existing class of compounds. A further group of cell line-based models (M1, M3, M5, M7, M12, M14, M25, M26, M28, M29 and M30) shows moderate statistical quality (avg. R2 = 0.71 and Rcv2 = 0.69), and these models can also be used for prediction. However, some models (M2, M8, M10, M13, M15, M20, M21, M22 and M27) cannot be used for further prediction because of their poor statistical quality (avg. R2 = 0.58, Rcv2 = 0.45). The poor results obtained from these models are probably due to the larger number of compounds and the 3 to 5 different scaffolds combined in them. Increasing the number of descriptors in models with a narrow activity range does not effectively improve their statistical quality. This suggests that the currently used descriptors are not suitable for developing the structure–activity relationship for these models, and that additional descriptors need to be developed. By contrast, models built on a single scaffold show good statistical quality. All the details for the cell line-based models are given in Table 2(b).
The calculated and experimental biological activities, with residuals and descriptor values for all models, are given in additional file A (Tables S18–S63†). Fig. 3(a) and (b) show the plots of experimental against calculated activity values for 15 cell line-based and 8 scaffold-based QSAR models. The remaining plots are given in additional file A (Fig. S1(a) and (b)†). According to the plots, the average residuals indicate that the test set compounds lie closer to the line than the training set compounds.
A total of 109 descriptors were used in different combinations for the development of all the QSAR models. Fig. 5 illustrates the percentage of each type of descriptor involved in the models. This figure shows the importance of quantum chemical descriptors (approx. 62%) followed by electrostatic (13.6%), topological (9.6%), geometrical (5.5%), and thermodynamic and constitutional (4.5% each). The intercorrelation of the descriptors was investigated for all the developed models and shows that the descriptors are reasonably orthogonal. Among the quantum chemical descriptors, charge-based descriptors such as Max n–n repulsion for a C–N bond, Max e–n attraction for a C–C bond and ESP-RPCG relative positive charge are present in approximately 40 (approx. 37%) models. These are followed by valency-based descriptors and bond order-based descriptors, which are present in approx. 23% and 3% respectively. This indicates the importance of charge-based, valency-based and bond order-based descriptors.
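Leave-one-out cross-validation can be sketched as follows: each compound is left out in turn, the model is refitted on the rest, and the squared prediction errors are summed (PRESS). The formula Rcv2 = 1 − PRESS/SS_tot used here is a common convention and is assumed, not confirmed, to match the one used by the authors' software; the data are synthetic.

```python
import numpy as np

def fitted_r2(X, y):
    """Conventional R2 of a least-squares fit with an intercept."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))

def loo_r2cv(X, y):
    """Leave-one-out cross-validated R2, assumed as 1 - PRESS / SS_tot."""
    n = len(y)
    A = np.column_stack([X, np.ones(n)])
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                   # drop compound i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += float((y[i] - A[i] @ coef) ** 2)  # error on the left-out compound
    return 1.0 - press / float(np.sum((y - y.mean()) ** 2))

# Synthetic three-descriptor data set of 25 hypothetical compounds.
rng = np.random.default_rng(4)
X = rng.normal(size=(25, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 1.0 + rng.normal(scale=0.3, size=25)

r2 = fitted_r2(X, y)
r2cv = loo_r2cv(X, y)
print(r2, r2cv, r2cv > 0.5)  # the 0.5 threshold is the one quoted in the text
```

Under this definition Rcv2 never exceeds the fitted R2, which is consistent with every row of Table 2.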
Fig. 5 The percentage of various descriptors involved in the QSAR models (see additional file A Table S17† for the details of all descriptors).
The cell lines of the different cancer types considered in the current study are presented in additional file A (Table S66†). Among them, 7 cancer types have experimental data for more than one cell line. A comparison of the statistical significance of the various types of cancer was therefore carried out and is also presented in additional file A (Table S66†). Pancreatic cancer (R2 = 0.87, Rcv2 = 0.73), leukemia (R2 = 0.86, Rcv2 = 0.80), renal (R2 = 0.85, Rcv2 = 0.76), cervical (R2 = 0.77, Rcv2 = 0.71), brain (R2 = 0.77, Rcv2 = 0.60), lung (R2 = 0.76, Rcv2 = 0.67) and CNS (R2 = 0.75, Rcv2 = 0.63) cancers have better statistical values than other types of cancer such as colon, breast, ovarian, skin, prostate, neuronal, melanoma and hepatocellular (avg. R2 = 0.60, avg. Rcv2 = 0.51).
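As a rough check on how such per-cancer-type averages might arise, the sketch below takes the R2 values of the cell line-based models in Table 2(b) for five cancer types and computes their means. That the published averages were obtained exactly this way is an assumption; the grouping is read off the cancer-type labels in the table, and only the types whose two-decimal average reproduces the quoted value are included.

```python
# R2 values copied from Table 2(b), grouped by the cancer type given there.
r2_by_cancer_type = {
    "pancreatic": {"M24": 0.87},
    "leukemia": {"M16": 0.86, "M17": 0.90, "M23": 0.81},
    "brain": {"M26": 0.77},
    "lung": {"M2": 0.56, "M9": 0.93, "M11": 0.93,
             "M12": 0.73, "M18": 0.83, "M22": 0.59},
    "CNS": {"M25": 0.74, "M30": 0.76},
}

# Plain arithmetic mean of the model R2 values for each cancer type.
average_r2 = {
    cancer: sum(models.values()) / len(models)
    for cancer, models in r2_by_cancer_type.items()
}

for cancer, value in average_r2.items():
    print(f"{cancer}: {value:.2f}")
```

Rounded to two decimals, these means agree with the R2 values quoted above for the same five cancer types.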
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c5ra18295f
This journal is © The Royal Society of Chemistry 2015