Scaffold and cell line based approaches for QSAR studies on anticancer agents

Shruti Satbhaiya; O. P. Chourasia

doi:10.1039/C5RA18295F


DOI: 10.1039/C5RA18295F (Paper) RSC Adv., 2015, 5, 84810-84820

Shruti Satbhaiya* and O. P. Chourasia
Heterocyclic Research Laboratory, Department of Chemistry, Dr Hari Singh Gour Vishwavidyalaya, Sagar, M.P. 470003, India. E-mail: opcchem@gmail.com; satbhaiyashruti@gmail.com; Tel: +91-9009509089, +91-9098999717

Received 7th September 2015, Accepted 24th September 2015

First published on 25th September 2015

Abstract

A quantitative structure–activity relationship (QSAR) was developed, using a linear heuristic method, to predict in vitro anticancer activity from available experimental data. Each compound was represented by several calculated structural descriptors. Most earlier computational studies targeted only a small number of cell lines. Here, predictive models were built for 482 compounds with experimental data against 30 different cancer cell lines. Statistical analysis gave high correlation and cross-validation coefficients for many of the models and provided a range of QSAR equations. Quantum chemical descriptors were found in 42 out of 46 models, electrostatic in 16, topological in 12, geometrical in 7, thermodynamic in 5 and constitutional in 7. In most cases, three-descriptor models were sufficient. Pancreatic cancer cell lines showed the best statistical values (average R² = 0.87), followed by leukaemia (average R² = 0.86).

1. Introduction

Cancer is a multifactorial disease of striking significance in the world today and ranks high among human diseases. It has become the second leading cause of death in the human population^1,2 after cardiovascular diseases. The development of potent and precise anticancer agents is therefore urgently needed³ and remains a major challenge for medicinal chemistry research. Researchers have turned their attention to the discovery of novel anticancer agents because the available range of anticancer drugs is too limited to take advantage of new discoveries concerning tumorigenesis. Current anticancer chemotherapy still suffers from the distinctive growth patterns of the various types of cancer⁴ and from the acquisition of multiple-drug resistance by cancer cells. Owing to a vast increase in the number of feasible molecular targets, the focus has shifted from target identification to target validation.⁵

Natural products are the main source of lead compounds for drug development because of their intrinsic biological relevance and their content of small hetero-aromatic compounds. They have shown unexpected biological properties and have become the basis for a large number of innovative medicinal agents.⁶ The collection of these compounds is dramatically higher than that resulting from high-throughput screens of combinatorial libraries.^7–9 However, the preparation of libraries based on natural products requires sophisticated and laborious synthetic sequences. In addition, the therapeutic development of promising leads from these libraries is significantly impeded by the problem of large-scale compound supply. Because interest in natural products has been renewed, both within the pharmaceutical industry and by the failure of alternative methods to provide many therapeutic lead compounds, these challenges are becoming increasingly pertinent.¹⁰

The pharmaceutical industry has to ensure the safety, quality and efficacy of a marketed drug by subjecting it to a range of analyses.¹¹ Taking a drug from discovery to a finished product takes a long time, approximately 12 years,¹² and the projected cost of a marketed drug is high.¹³ This expensive and lengthy process carries a considerable risk of failure. It is therefore useful to predict such failures before the clinical stage in order to reduce drug development costs.¹⁴ To filter out potential failures during drug development, in vitro, in vivo and in silico methods are used. Quantitative structure–activity relationship (QSAR) modeling is an in silico method that can be used to understand drug action, design new compounds and screen chemical libraries.^15–18 Combinatorial approaches are an influential tool for speeding up drug discovery, and they are being adopted in the search for cancer treatments with different mechanisms of action.^19,20 QSAR has become crucial for the molecular interpretation of biological properties.^21–26 It is one of the most important tools in analogue-based drug design and has been widely used to predict assorted properties such as carcinogenicity,²⁷ ADME,²⁸ stability,²⁹ toxicity,^30,31 retention time³² and other physicochemical properties, in addition to biological activity.^33–36 Combined with pattern recognition techniques, the QSAR method makes possible the theoretical prediction of structures with desired property values. In lead optimization, the development of QSAR models using various physicochemical descriptors has been a vital task.³⁷ The use of multiple QSAR models to derive a mechanistic interpretation can be illustrated by a comparison of the experimental data available for anticancer agents. Computational methods also aid the rapid generation of new hypotheses, as well as the design and interpretation of hypothesis-driven experiments in the field of cancer research.

A number of quantum chemical descriptors (such as molecular orbital energies, charges and dipole moments), electrostatic descriptors (such as charge-based descriptors), geometrical descriptors (such as moments of inertia) and thermodynamic descriptors (such as entropy and vibrational frequency) have been applied effectively to set up QSAR models for predicting the activities of compounds.^38–40 A large number of cell lines are available for any given cancer type on which in vitro biological activity can be assayed, but the measured results differ depending on the cell line used. As a result, it is difficult for computational chemists to select experimental data from the pool of existing biological activity for a single scaffold type. In vitro experimental data for anticancer activity are available against many different cell lines, yet QSAR studies in the literature are mainly carried out for one particular cell line, which may not be a good approach. This study considered all the available experimental data for many different cell lines to build predictive models, and should help medicinal chemists to design new and potent compounds more reliably. Analysis of the descriptors obtained in the models for all the cell lines may indicate the significance of a particular type of descriptor in modeling anticancer activity against a given cancer type.

2. Computational methods

2.1. Data set for analysis

Reported in vitro assays for 16 different scaffolds against 30 cell lines, covering a total of 482 compounds, were considered for the present investigation (Tables S1–S16 in the ESI†). The inhibitory activities (IC₅₀) against the different cell lines were converted to pIC₅₀ according to the formula pIC₅₀ = −log(IC₅₀). The parent structures of all the scaffolds, with the number of compounds and the names of the cell lines, are shown in Fig. 1. Table 1 lists the scaffolds considered, the different cell lines and the number of molecules tested against each cell line.^20,41–54
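As an illustration of the conversion above, the following minimal sketch computes pIC₅₀ from an IC₅₀ value. It assumes a base-10 logarithm and IC₅₀ expressed in mol/L, neither of which is stated explicitly in the text; the function name and the unit table are hypothetical.

```python
import math

# Conversion factors to mol/L; the unit labels are illustrative assumptions.
TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}

def pic50_from_ic50(ic50, unit="uM"):
    """Return pIC50 = -log10(IC50 expressed in mol/L)."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50 * TO_MOLAR[unit])

# A more potent compound (smaller IC50) gets a larger pIC50.
print(round(pic50_from_ic50(1.0, "uM"), 2))
print(round(pic50_from_ic50(50.0, "nM"), 2))
```

Working on the logarithmic scale keeps the dependent variable of the regression in a compact range even when IC₅₀ values span several orders of magnitude.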


	Fig. 1 The 482 compounds with IC₅₀ values, grouped into the different scaffolds (S1–S16). The number of compounds in each scaffold is given in parentheses, together with the cell lines against which the cytotoxicity values were reported (see Tables S1–S16† in additional file A for the structures of all the compounds with their in vitro IC₅₀ values against the various cell lines).

Table 1 Details of the scaffolds considered in this study and the cell lines against which their anticancer activity was reported, along with the number of molecules for each cell line

S. no.	Scaffold name	Cell lines	Cancer type	No. of compounds	Ref.
S1	Acridine	P388	Leukemia	41	41
		LLc	Lung	41
		JLc	Leukemia	41
S2	Cantharidine	HT-29	Colon	35	42
		SW480	Colon	35
		MCF-7	Breast	35
		A2780	Ovarian	35
		H460	Lung	35
		A431	Skin	35
		DU145	Prostate	35
		BE2-C	Neuronal	35
		SJ-G2	Brain	35
S3	Chalcone	ACHN	Renal	19	43
		Panc1	Pancreatic	19
		Calu1	Lung	19
		H460	Lung	19
		HCT116	Colon	19
S4	Tetrahydropyrimidine	MCF-7	Breast	23	44
S5	Isatin	HCT116	Colon	32	45
		MCF-7	Breast	32
S6	Isoflavone	HCT116	Colon	23	46
S7	Nitroalkene	HeLa	Cervical	22	47
S8	Phenazine	H69	Lung	18	48
S9	Podophyllotoxin	HeLa	Cervical	30	49
		MCF-7	Breast	30
S10	Pyrazole	HeLa	Cervical	17	49
		MCF-7	Breast	17
S11	Pyrazoline	MCF-7	Breast	20	50
		B16-F10	Melanoma	20
S12	Pyrimidine	BEL-7402	Hepatocellular	37	20
S13	Quinazoline	MCF-7	Breast	36	51
		U251	CNS	36
		SW480	Colon	36
		H522	Lung	36
		M14	Melanoma	36
		SKOV3	Ovarian	36
		DU145	Prostate	36
		A498	Renal	36
S14	Quinoxaline	MCF-7	Breast	22	52
		H460	Lung	22
		SF-268	CNS	22
S15	Semicarbazide	L120	Leukemia	30	53
S16	Stilbene	A549	Lung	69	54
		MCF-7	Breast	69
		HT-29	Colon	69
		SKMEL-5	Melanoma	69
		MLM	Melanoma	69

2.2. Optimization

A total of 482 compounds belonging to 16 different chemical scaffolds were collected along with their anticancer activity against 30 cancer cell lines (Fig. 1). All the structures were first optimized, and their vibrational frequencies calculated, with the semi-empirical AM1 procedure in AMPAC 5.0. Gaussian output files were obtained for each structure and served as input files for the CODESSA program, which was used to calculate descriptors as well as for regression analysis.

2.3. Calculating 2D descriptors and regression analysis

CODESSA (COmprehensive DEscriptors for Structural and Statistical Analysis) version 2.0 was used to calculate the 2D descriptors as well as for regression analysis.⁵⁵ Fig. 2 provides a schematic illustration of the workflow adopted in the current study to develop and validate the various QSAR models. Initially, approximately 540 default descriptors were calculated, and these descriptors were then classified into the following groups: constitutional, topological, geometrical, electrostatic, quantum-chemical and thermodynamic descriptors.³⁷


	Fig. 2 Flowchart of the methodology adopted for the development and validation of the QSAR models.

Two different schemes were chosen to develop statistically significant QSAR models. In the first scheme, 16 QSAR models were developed for the 16 scaffolds used in this investigation (i.e. scaffold-based QSAR models), whereas in the second scheme, 30 different QSAR models were developed based on the availability of IC₅₀ values against 30 cancer cell lines by combining all the scaffolds (i.e. cell line-based QSAR models). For all the models, the intercorrelation of the descriptors was also tested. Models containing highly intercorrelated descriptors were replaced and refined so that the descriptors employed in a given model were practically orthogonal to each other.
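The intercorrelation test can be pictured with a short sketch that computes pairwise Pearson correlations between descriptor columns and flags highly correlated pairs. The descriptor names, the toy values and the 0.8 cut-off are illustrative assumptions; the text does not state which threshold was applied.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def intercorrelated_pairs(descriptors, threshold=0.8):
    """Return (name_a, name_b, r) for descriptor pairs with |r| above threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(descriptors.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Toy descriptor table (hypothetical values): D2 is nearly proportional to D1.
toy = {
    "D1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "D2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "D3": [0.5, -0.2, 0.9, -0.4, 0.1],
}
print(intercorrelated_pairs(toy))
```

A flagged pair would prompt replacing one of the two descriptors before the model is accepted.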

A large number of descriptors makes a model harder to interpret and reduces its predictive ability and statistical robustness. We therefore developed 3, 4 and 5 descriptor-based models for all the sets of compounds, to find the minimum number of descriptors that defines the activity, using the heuristic method, which is a multilinear regression method. The heuristic method was chosen for its speed: it usually produces correlations 2–5 times faster than other methods of comparable quality, and it places no restriction on the size of the data set. In comparison with the four and five descriptor-based models, the three descriptor-based models were found to be satisfactory for all sets of compounds. To assess the statistical quality of the models, parameters such as R², R_cv², AE, S², F and the t-test are essential; these were obtained from the correlation of approximately 540 descriptors (constitutional, geometrical, topological, electrostatic, thermodynamic and quantum chemical) in different combinations.⁵⁶ The R² value is a relative measure of the quality of fit, F represents the F-ratio between the variance of the calculated and experimental activity, and the t-test reflects the significance of a parameter within the model. The effect of the number of descriptors on the correlation coefficient was examined for each set of molecules using the heuristic method with 1–10 descriptors.
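The statistical parameters listed above can be illustrated with a small, self-contained sketch that fits a three-descriptor multilinear model by ordinary least squares and reports R², a leave-one-out R_cv², F and S². It uses textbook definitions and toy data; CODESSA's heuristic descriptor selection is not reproduced here, and its exact definitions of these statistics may differ.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit(X, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve(ata, aty)

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def regression_summary(X, y):
    n, k = len(y), len(X[0])
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    coef = fit(X, y)
    sse = sum((yi - predict(coef, r)) ** 2 for r, yi in zip(X, y))
    # Leave-one-out: refit without compound i, then predict compound i.
    press = 0.0
    for i in range(n):
        c = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(c, X[i])) ** 2
    r2 = 1 - sse / sst
    return {
        "R2": r2,
        "R2cv": 1 - press / sst,
        "F": (r2 / k) / ((1 - r2) / (n - k - 1)),
        "s2": sse / (n - k - 1),
    }

# Toy data (hypothetical): 10 compounds, 3 descriptors, pIC50-like response.
X = [[0.1, 1.2, 3.0], [0.4, 0.8, 2.5], [0.9, 1.5, 1.0], [1.3, 0.3, 2.2],
     [1.8, 1.1, 0.5], [2.2, 0.6, 1.7], [2.7, 1.9, 2.9], [3.1, 0.2, 0.8],
     [3.6, 1.4, 1.9], [4.0, 0.9, 0.3]]
y = [1.80, 2.07, 1.28, 2.97, 2.02, 2.99, 2.38, 3.54, 2.98, 3.16]

for name, value in regression_summary(X, y).items():
    print(name, round(value, 3))
```

Because each left-out compound is predicted by a model that never saw it, R_cv² can never exceed R², and a large gap between the two is a warning sign of overfitting.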

3. Results & discussion

Regression was carried out for the QSAR analysis of the various models, with the IC₅₀ values as dependent variables and the calculated descriptors as independent variables. It is useful to gain insight into the physical meaning of the correlations obtained from the regression analysis, because the magnitude of a descriptor can then serve as a guideline for improving the anticancer activity of molecules.

Among the developed models, sixteen scaffold-based and thirty cell line-based models were selected on the basis of several statistical and other parameters such as R², R_cv², S², AE (average residual) values, Fisher's values (F test) and a t-test. The relationship between the number of descriptors and the correlation values was determined for all models by correlating 1–10 descriptors individually, as shown in Fig. 4(a) and (b) for the cell line-based and scaffold-based models respectively. Among all the models, three-descriptor models were considered acceptable, because models with more than six descriptors may give high correlation values that are spurious and not useful for the further prediction of biological activities.
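The risk of spurious correlation with many descriptors can be seen in a minimal sketch that tracks R² as descriptors are added one at a time. It uses Gram-Schmidt orthogonalization, which gives the same R² as refitting nested least-squares models; the six compounds and five descriptor columns are arbitrary toy numbers with no chemical meaning.

```python
from math import sqrt

def r2_by_descriptor_count(columns, y):
    """R² of nested least-squares models using the first 1, 2, ... columns."""
    n = len(y)
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    sst = sum(v * v for v in yc)
    basis, explained, out = [], 0.0, []
    for col in columns:
        mean = sum(col) / n
        v = [c - mean for c in col]  # centring removes the intercept part
        for q in basis:              # remove what earlier descriptors explain
            d = sum(a * b for a, b in zip(v, q))
            v = [a - d * b for a, b in zip(v, q)]
        norm = sqrt(sum(a * a for a in v))
        if norm > 1e-12:             # skip descriptors that add nothing new
            q = [a / norm for a in v]
            basis.append(q)
            explained += sum(a * b for a, b in zip(q, yc)) ** 2
        out.append(explained / sst)
    return out

# Toy activities and descriptor columns (hypothetical values).
y = [5.1, 4.3, 6.0, 5.5, 4.8, 6.4]
columns = [
    [0.2, 0.9, 0.4, 0.7, 0.1, 0.5],
    [1.5, 0.3, 0.8, 1.9, 1.1, 0.6],
    [3.0, 2.1, 2.7, 2.2, 3.4, 2.9],
    [0.7, 0.2, 0.9, 0.4, 0.8, 0.1],
    [1.0, 1.6, 1.2, 1.9, 1.3, 1.7],
]
for count, r2 in enumerate(r2_by_descriptor_count(columns, y), start=1):
    print(count, round(r2, 3))
```

R² can only rise as descriptors are added, whether or not they carry real information, so a high R² obtained with many descriptors on a small data set says little about predictive value.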


	Fig. 3 (a) Plots between experimental and predicted IC₅₀ values for cell line-based QSAR models with correlation coefficients and cross validation coefficients for 15 high quality statistical models. (b) Plots between experimental and predicted IC₅₀ values for scaffold-based QSAR models with correlation coefficients and cross validation coefficients for 8 high quality statistical models.


	Fig. 4 (a) The effect of the number of descriptors on the correlation coefficient on the basis of cell line-based QSAR models. (b) The effect of the number of descriptors on the correlation coefficient on the basis of scaffold-based QSAR models.

For every model, the compounds were divided into a training set and a test set. The models were constructed using the training set compounds and were then used to predict the activity of the test set compounds. Low average residual values for both the training and the test set indicate models with a high potential to establish the correlation between structure and activity.
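The training/test procedure and the average residual (AE) can be sketched as follows. The splitting rule (rank by activity, send every fourth compound to the test set), the single-descriptor line fit and the data are all illustrative assumptions; the text does not specify how the sets were chosen.

```python
def split_every_kth(compounds, k=4):
    """Rank (descriptor, activity) pairs by activity; every k-th goes to the test set."""
    ranked = sorted(compounds, key=lambda c: c[1])
    train = [c for i, c in enumerate(ranked) if (i + 1) % k != 0]
    test = [c for i, c in enumerate(ranked) if (i + 1) % k == 0]
    return train, test

def fit_line(points):
    """Least-squares slope and intercept for a one-descriptor model."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

def average_residual(points, slope, intercept):
    """AE: mean absolute difference between experimental and calculated activity."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)

# Toy (descriptor, pIC50) pairs; hypothetical values.
compounds = [(0.5, 4.30), (1.0, 4.44), (1.5, 4.77), (2.0, 4.97),
             (2.5, 5.33), (3.0, 5.46), (3.5, 5.76), (4.0, 6.06),
             (4.5, 6.18), (5.0, 6.53), (5.5, 6.73), (6.0, 7.04)]

train, test = split_every_kth(compounds)
slope, intercept = fit_line(train)  # model built on the training set only
print("AE train:", round(average_residual(train, slope, intercept), 3))
print("AE test: ", round(average_residual(test, slope, intercept), 3))
```

Ranking by activity before splitting is one simple way to make the test set span the same activity range as the training set.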

Most of the scaffold-based QSAR models, along with their regression equations, cancer types and cell line names, are given in Table 2(a). Most of the scaffold-based QSAR models showed superior statistical quality, with higher correlation coefficient values than the cell line-based models. The smaller number of compounds is an important reason for the high correlation coefficients of these models. The range of activity of the compounds in three models (S2, S10 and S12) is poor. In general, models built on compounds with a broad activity range show high correlation coefficients, while those built on compounds with a narrow activity range show lower correlation coefficient values. Apart from these, all the scaffold-based models with high correlation coefficient values seem reasonable and can be used for further prediction.

Table 2 (a) Cell line with the types of cancer in parenthesis, the scaffolds involved, regression summary (regression equation, correlation coefficient R², cross validation coefficient R_cv² and average residual) and the number of compounds (training set, TR, and test set, TS) in various scaffold-based QSAR models. (b) The cell line with the type of cancer in parenthesis, the scaffolds involved, regression summary (regression equation, correlation coefficient R², cross validation coefficient R_cv², average residual AE) and the number of compounds (training set, TR, test set, TS) in various cell line-based QSAR models^a

a R² is the square of the correlation coefficient and represents the statistical significance of the model. R_cv² is the cross-validated R², a measure of the quality of the QSAR model. AE is the average of the absolute differences between the experimental and calculated IC₅₀ values. F is the Fisher statistic, the ratio between the explained and unexplained variance for a given number of degrees of freedom, thereby indicating a factual correlation or the significance level of the QSAR models. S² is the standard deviation. TR is the number of molecules in the training set and TS is the number of molecules in the test set.
(a) Scaffold-based QSAR models
No.	Cell line (type)	Regression equation	R²	R_cv²	AE	F	S²	TR	TS
S1	P388 (leukemia)	=−6.2155 × VE/T + 2.3164 × WPSA3Q + 3.3250 × LNMVF + 1.5252	0.75	0.67	0.35	26.74	0.145	31	9
S2	HT29 (colon)	=1.8444 × RNB − 3.3083 × MaenAC + 0.13180 × PMIA + 6.1097	0.69	0.55	0.12	15.36	0.033	27	8
S3	ACHN (renal)	=7.4622 × FPSA3z − 7.8674 × WNSA2z − 1.0224 × BI + 1.8722	0.98	0.95	0.05	105.92	0.001	15	4
S4	MCF7 (breast)	=−2.0431 × PPSA3z + 2.7466 × ZXS/ZXR − 2.0527 × RNCGQ + 3.5395	0.89	0.73	0.09	29.10	0.009	17	5
S5	HCT116 (colon)	=4.1330 × RNCl − 2.1896 × RNCSz − 1.2796 × FNSA2Q − 2.7626	0.77	0.66	0.21	21.31	0.028	23	8
S6	HCT116 (colon)	=4.0785 × EMiNACC − 4.1004 × PNSA2Q − 1.4298 × EHBCAQ + 2.0450	0.88	0.82	0.14	31.98	0.018	18	5
S7	HeLa (cervical)	=−9.3376 × PMIB − 2.0744 × EHDSAQ + 1.7527 × EMaNACC + 3.8717	0.85	0.75	0.14	19.85	0.036	15	4
S8	H69 (lung)	=4.7221 × MaPCHz − 2.5135 × TE/#A-t + 2.6179 × MiERIC	0.96	0.93	0.12	76.93	0.019	14	4
S9	HeLa (cervical)	=2.1519 × MiERIN + 3.4050 × MaREHN − 1.0293 × ABOC − 3.4442	0.93	0.90	0.12	84.32	0.074	22	6
S10	HeLa (cervical)	=9.0910 × MaenACH − 6.2862 × HNMVF + 1.2038 × MaPPBO − 1.5070	0.96	0.90	0.15	50.38	0.051	14	3
S11	B16-F10 (melanoma)	=−2.7430 × 1XGP + 5.4109 × DPSA2z − 6.9408 × EHDSAQ + 3.9304	0.93	0.88	0.11	41.90	0.012	15	5
S12	BEL-7402 (hepatocellular)	=2.98993 × HLEG + 1.0598 × MaeeRN + 2.9834 × KSI3 − 4.0099	0.70	0.60	0.35	20.86	0.106	32	11
S13	M14 (melanoma)	=4.4562 × MiERIN − 3.6455 × MiAOEP + 1.0863 × MiBON (0.1) + 1.3577	0.65	0.56	0.31	15.05	0.215	28	8
S14	SF-268 (CNS)	=−1.3737 × MaTICC + 1.2771 × EPNSA3Q − 1.2524 × MiTICN + 4.8510	0.74	0.59	0.46	11.22	0.240	17	5
S15	L120 (leukemia)	=1.712 × RNN − 4.0400 × MiERIC + 9.1240 × MIA − 4.5014	0.92	0.89	0.24	71.94	0.055	22	8
S16	A549 (lung)	=−1.2148 × MaenACC + 5.4537 × FNSA1Q − 6.7738 × AIC1	0.48	0.38	0.24	12.46	0.087	49	17

(b) Cell line-based QSAR models
No.	Cell line (type)	Scf.	Regression equation	R²	R_cv²	AE	F	S²	TR	TS
M1	A498 (renal)	S13	=−3.3738 × MannRCN − 1.2453 × ASIC1 − 1.0807 × RNCSz + 4.6355	0.71	0.56	0.62	14.87	0.239	31	5
M2	A549 (lung)	S16	=−1.3066 × MaenACC + 1.1275 × PNSA1Q − 7.9011 × PNSA2z + 3.3541	0.56	0.49	0.27	18.41	0.094	56	10
M3	A2780 (ovarian)	S2	=5.3016 × MiNRIO + 6.1285 × HACA1Q − 1.222 × RPCSz − 3.6341	0.68	0.54	0.22	13.89	0.043	30	5
M4	ACHN (renal)	S3	=7.4622 × FPSA3z − 7.8674 × WNSA2z − 1.0224 × BI + 1.8722	0.96	0.95	0.05	105.93	0.001	16	3
M5	A431 (skin)	S2	=−7.2276 × YZS + 1.0111 × RI0 + 7.4112 × MiERIC + 2.5462	0.69	0.59	0.14	13.43	0.036	30	5
M6	B16-F10 (melanoma)	S11	=−1.0251 × WPSA1Q − 1.7686 × MiTICS + 1.6039 × MiNACN + 2.7567	0.94	0.89	0.11	62.56	0.015	17	3
M7	BE2-C (neuronal)	S2	=−5.0255 × XYS/XYR + 1.4971 × MiERIC − 6.7892 × RNO + 5.2368	0.72	0.63	0.16	16.67	0.030	29	7
M8	BEL-7402 (hepatocellular)	S12	=2.6386 × HLEG − 6.0993 × WNSA2Q − 4.8693 × MiERIN − 2.2423	0.58	0.44	0.31	12.34	0.177	35	10
M9	Calu1 (lung)	S3	=2.2933 × A1ERIC − 2.3063 × SIC1 − 3.0798 × YZS + 5.2315	0.93	0.80	0.12	41.25	0.023	16	3
M10	DU145 (prostate)	S3, S13	=7.0109 × MiTICN − 7.8249 × PNSA1z − 2.7692 × CHaSz − 6.9257	0.43	0.32	0.45	11.60	0.307	57	15
M11	H69 (lung)	S8	=−1.1403 × FNSA2z − 5.8440 × ERPCGQ − 5.2628 × MienACH + 3.4678	0.93	0.81	0.13	44.62	0.033	15	3
M12	H522 (lung)	S13	=1.4711 × MiPCCz + 2.3775 × MiBOC(0.1) + 8.7939 × THCMD − 2.1601	0.73	0.63	0.24	18.26	0.201	28	8
M13	HCT116 (colon)	S3, S5, S6	=2.7531 × RE/T + 3.3081 × LNMVF + 64.6887 × ACIC1 − 1.0055	0.59	0.49	0.29	24.10	0.119	60	14
M14	HeLa (cervical)	S7, S9, S10	=−6.4020 × MaBON − 9.5518 × MienACN + 7.3732 × NN + 3.7878	0.76	0.71	0.40	43.55	0.377	56	11
M15	HT29 (colon)	S2, S16	=−6.4490 × FNSA2Q + 1.3955 × PP/SD + 9.1032 × MaenAC − 1.6224	0.30	0.22	0.27	9.54	0.098	81	20
M16	JLc (leukemia)	S1	=−7.0103 × EMaNACH + 1.5761 × HDSA2Q − 1.0051 × MaeeRC + 1.0507	0.86	0.82	0.33	44.61	0.069	30	11
M17	L120 (leukemia)	S15	=−4.8986 × HDCA2Q + 1.7065 × WNSA2z − 1.4993 × MienANN + 6.9159	0.90	0.84	0.17	43.50	0.066	25	6
M18	LLc (lung)	S1	=−1.5310 × ZXS/ZXR + 2.7870 × ERNCSQ − 6.4016 × MiBOC(0.1) + 7.9845	0.83	0.78	0.38	36.82	0.155	32	9
M19	M14 (melanoma)	S13	=−8.1796 × NN − 5.0035 × RNBr + 1.1723 × MiERIC + 4.9692	0.81	0.70	0.25	28.09	0.156	30	6
M20	MCF7 (breast)	S2, S4, S5, S9, S10, S11, S13, S14, S16	=6.4410 × MaeeRC − 3.4532 × ERPCSQ − 1.7867 × ASIC1 − 2.7106	0.46	0.44	0.55	52.87	0.663	231	45
M21	MLM (glioblastoma)	S16	=−8.2245 × EFPSA1Q + 1.1671 × ANRIO − 4.4003 × EHDSAQ	0.48	0.40	0.28	14.37	0.124	53	13
M22	H460 (lung)	S2, S3, S14	=−1.3004 × MiPC + 6.3227 × Ma1ERIC + 2.4755 × MaenACO	0.59	0.49	0.41	19.82	0.152	60	16
M23	P388 (leukemia)	S1	=−3.4460 × WPSA1z + 6.8634 × MiTICN + 8.1021 × HDCA1Q − 9.9755	0.81	0.73	0.31	31.74	0.0751	32	9
M24	Panc1 (pancreatic)	S3	=1.8296 × SIC2 + 3.3629 × LNMVF − 1.7681 × FPSA3Q − 3.0118	0.87	0.73	0.11	21.63	0.016	16	3
M25	SF-268 (CNS)	S14	=2.4189 × ABOC − 3.9606 × ERNCSQ + 2.4054 × EMaNAC − 2.3761	0.74	0.56	0.27	9.38	0.171	18	4
M26	SJ-G2 (brain)	S2	=5.1327 × CIC2 + 1.5429 × AVN + 2.0716 × MiRECN − 7.3156	0.77	0.60	0.13	18.67	0.034	26	9
M27	SKMEL-5 (melanoma)	S16	=−1.2423 × HOMO1 + 4.6277 × MaTICH − 2.3144 × EFHDSA − 6.8356	0.51	0.45	0.22	15.52	0.111	51	15
M28	SKOV3 (ovarian)	S13	=2.7606 × MiPCNz − 3.5240 × EE + eeRCC + 1.3856 × EFHDCAQ + 5.0366	0.76	0.66	0.28	20.69	0.141	29	7
M29	SW480 (colon)	S2	=7.0573 × MaASEN − 605367 × RNCl − 1.1217 × HDSAQ + 1.3202	0.69	0.	0.27	32.20	0.130	50	15
M30	U251 (CNS)	S13	=−1.1620 × IOKSE + 5.3498 × EFHBSAQ − 1.5086 × RNN + 7.2762	0.76	0.69	0.31	24.27	0.154	29	7

All QSAR models were cross-validated by the leave-one-out method; for a model to be considered validated, R_cv² should be greater than 0.5.⁵⁷ The regression summary for the cell line-based QSAR models M4, M6, M9, M11, M16, M17, M18, M19, M23 and M24 shows high statistical quality (avg. R² = 0.93, R_cv² = 0.89), and these models appear valuable for the existing class of compounds. A few other cell line-based models (M1, M3, M5, M7, M12, M14, M25, M26, M28, M29 and M30) show moderate statistical quality (avg. R² = 0.71 and R_cv² = 0.69), and these models can also be used for prediction. However, some models (M2, M8, M10, M13, M15, M20, M21, M22 and M27) cannot be used for further prediction because of their poor statistical quality (avg. R² = 0.58, R_cv² = 0.45). The poor results obtained from these models are probably due to the larger number of compounds and the 3 to 5 different scaffolds combined in them. Increasing the number of descriptors in models with a narrow activity range does little to improve their statistical quality. This suggests that the descriptors currently used are not suitable for developing the structure–activity relationship for these models, and that additional descriptors need to be developed. In contrast, models involving a single scaffold provide good statistical quality. All the details for the cell line-based models are given in Table 2(b).
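The R_cv² > 0.5 acceptance criterion can be applied mechanically, as in this short sketch. The (R², R_cv²) pairs are copied from Table 2(b) for six of the thirty models; the function name is hypothetical.

```python
# (R², R_cv²) for a few cell line-based models, copied from Table 2(b).
models = {
    "M2": (0.56, 0.49),
    "M4": (0.96, 0.95),
    "M10": (0.43, 0.32),
    "M15": (0.30, 0.22),
    "M20": (0.46, 0.44),
    "M24": (0.87, 0.73),
}

def passes_loo_criterion(r2cv, threshold=0.5):
    """True when the leave-one-out R_cv² exceeds the acceptance threshold."""
    return r2cv > threshold

accepted = [name for name, (_, r2cv) in models.items() if passes_loo_criterion(r2cv)]
print(accepted)  # ['M4', 'M24']
```

Of these six, only the two single-scaffold models M4 and M24 pass, in line with the observation that multi-scaffold models tend to perform poorly.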

The calculated and experimental biological activities, with residuals and descriptor values for all models, are given in additional file A (Tables S18–S63†). Fig. 3(a) and (b) show the plots of experimental against calculated activity values for 15 cell line-based and 8 scaffold-based QSAR models. The remaining plots are given in additional file A (Fig. S1(a) and (b)†). According to the plots and the average residuals, the test set compounds lie closer to the regression line than the training set compounds.

A total of 109 descriptors were used in different combinations for the development of all the QSAR models. Fig. 5 illustrates the percentage of each type of descriptor involved in the models. This figure shows the importance of quantum chemical descriptors (approx. 62%), followed by electrostatic (13.6%), topological (9.6%), geometrical (5.5%), and thermodynamic and constitutional (4.5% each) descriptors. The intercorrelation of the descriptors was investigated for all the developed models and shows that the descriptors are reasonably orthogonal. Among the quantum chemical descriptors, charge-based descriptors such as Max n–n repulsion for a C–N bond, Max e–n attraction for a C–C bond and ESP-RPCG relative positive charge are present in approximately 40 (approx. 37%) models. These are followed by valency-based and bond order-based descriptors, which are present in approx. 23% and 3% respectively. This reflects the importance of charge-based, valency-based and bond order-based descriptors.


	Fig. 5 The percentage of various descriptors involved in the QSAR models (see additional file A Table S17† for the details of all descriptors).

The cell lines of the different cancer types considered in the current study are presented in additional file A (Table S66†). Among them, 7 cancer types have experimental data for more than one cell line. The statistical values of the various types of cancer were therefore compared, and the comparison is also presented in additional file A (Table S66†). Pancreatic cancer (R² = 0.87, R_cv² = 0.73), leukemia (R² = 0.86, R_cv² = 0.80), renal (R² = 0.85, R_cv² = 0.76), cervical (R² = 0.77, R_cv² = 0.71), brain (R² = 0.77, R_cv² = 0.60), lung (R² = 0.76, R_cv² = 0.67) and CNS (R² = 0.75, R_cv² = 0.63) cancers have better statistical values than other types of cancer such as colon, breast, ovarian, skin, prostate, neuronal, melanoma and hepatocellular (avg. R² = 0.60, avg. R_cv² = 0.51).
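The per-cancer-type values can be understood as averages over the cell line-based models of each type. The sketch below averages the R² and R_cv² values listed in Table 2(b) for the leukemia and lung models; that Table S66 was compiled this way is an assumption, although the results agree with the values quoted above for these two types.

```python
# (R², R_cv²) per cell line-based model, copied from Table 2(b).
by_cancer_type = {
    "leukemia": {"M16": (0.86, 0.82), "M17": (0.90, 0.84), "M23": (0.81, 0.73)},
    "lung": {"M2": (0.56, 0.49), "M9": (0.93, 0.80), "M11": (0.93, 0.81),
             "M12": (0.73, 0.63), "M18": (0.83, 0.78), "M22": (0.59, 0.49)},
}

def average_statistics(models):
    """Mean R² and mean R_cv² over a group of models."""
    n = len(models)
    mean_r2 = sum(r2 for r2, _ in models.values()) / n
    mean_r2cv = sum(r2cv for _, r2cv in models.values()) / n
    return mean_r2, mean_r2cv

averages = {ctype: average_statistics(m) for ctype, m in by_cancer_type.items()}
for ctype, (r2, r2cv) in averages.items():
    print(f"{ctype}: avg R2 = {r2:.2f}, avg R2cv = {r2cv:.2f}")
```

For leukemia this gives averages of about 0.86 and 0.80, and for lung about 0.76 and 0.67, matching the figures in the paragraph above.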

4. Conclusion

Our aim in this investigation was to evaluate a series of anticancer agents in which the molecule is methodically modified, in order to explore the structure–activity relationships of these scaffolds. A total of 46 QSAR models, for 16 different scaffolds and 30 different cell lines respectively, were built to assess the predictive power of QSAR models as the number of descriptors was varied from 1 to 10. This study reveals that three descriptor-based models are satisfactory for further prediction, and also shows that quantum chemical descriptors are the most important type of descriptor, followed by electrostatic, topological, geometrical, thermodynamic and constitutional descriptors. An analogue-based design approach is important for modeling anticancer compounds. The models developed for the experimentally tested compounds show high correlation coefficients (R²), high cross-validation coefficients (R_cv²) and low average residuals (AE). Cell lines in pancreatic cancer, with average R² = 0.87, followed by cell lines in leukaemia, with average R² = 0.86, provided the best statistical values. Although the derived equations are of restricted validity owing to the limited size of the training sets, these results may prove fruitful in predicting new anticancer agents with desired activities.

Conflict of interest

The authors have declared no conflict of interest.

Acknowledgements

SS and OPC thank UGC, New Delhi for financial assistance. The support from the Department of Chemistry, Dr H. S. Gour Central University, Sagar, M.P., India is also acknowledged.

References

1. A. Frace, C. Loge, S. Gallet, N. Lebegue, P. Carato, P. Chavatte, P. Berthelot and D. Lesieur, J. Enzyme Inhib. Med. Chem., 2004, 19, 541.
2. J. B. Gibbs, Cancer research, Science, 2000, 287, 1969.
3. A. Kamal, Y. V. V. Srikanth, M. N. A. Khan, T. B. Shaik and M. Ashraf, Bioorg. Med. Chem. Lett., 2010, 20, 5229.
4. R. J. Abdel-Jalil, E. Q. E. Momani, M. Hamad, W. Voelter, M. S. Mubarak, B. H. Smith and D. G. Peters, Monatsh. Chem., 2010, 141, 251.
5. P. Hanumantharao and S. V. Sambasivarao, Bioorg. Med. Chem. Lett., 2005, 15, 3167.
6. T. K. Olszewski and B. Boduszek, Tetrahedron, 2010, 66, 8661.
7. D. J. Newman, G. M. Cragg and K. M. Snader, J. Nat. Prod., 2003, 66, 1022.
8. R. Breinbauer, I. R. Vetter and H. Waldmann, Angew. Chem., Int. Ed., 2002, 41, 2879.
9. G. M. Cragg, P. G. Grothaus and D. J. Newman, Chem. Rev., 2009, 109, 3012.
10. A. L. Harvey, Drug Discovery Today, 2008, 13, 894.
11. D. J. Snodin, Toxicol. Lett., 2002, 127, 161.
12. S. Kraljevic, P. J. Stambrook and K. Pavelic, EMBO Rep., 2004, 5, 837.
13. C. P. Adams and V. V. Brantner, Health Aff., 2006, 25, 420.
14. Critical Path Opportunities Reports Challenges and Opportunities Report – March, 2004.
15. C. W. Yap, Y. Xue, Z. R. Li and Y. Z. Chen, Curr. Top. Med. Chem., 2006, 6, 1593.
16. R. V. Guido, G. Oliva and A. D. Andricopulo, Curr. Med. Chem., 2008, 15, 37.
17. A. Schwaighofer, T. Schroeter, S. Mika and G. Blanchard, Comb. Chem. High Throughput Screening, 2009, 12, 453.
18. L. G. Valerio, Toxicol. Appl. Pharmacol., 2009, 241, 356.
19. A. Kamal, E. V. Bharathi, M. J. Ramaiah, D. Dastagiri, J. S. Reddy, A. Viswanath, S. N. C. V. L. Farheen Sultana, M. Pal-Bhadra, H. K. Srivastava, G. N. Sastry, A. Juvekar, S. Sen and S. Zingde, Bioorg. Med. Chem., 2010, 18, 526.
20. F. Xie, H. Zhao, L. Zhao, L. Lou and Y. Hu, Bioorg. Med. Chem. Lett., 2009, 19, 275.
21. Y. C. Martin, Perspect. Drug Discov. Des., 1998, 12, 3.
22. U. Norinder, Perspect. Drug Discov. Des., 1998, 12, 25.
23. D. J. Maddalena, Expert Opin. Ther. Pat., 1998, 8, 249.
24. H. Kubinyi, Drug Discovery Today, 1997, 2, 538.
25. C. Hansch and T. Fujita, J. Am. Chem. Soc., 1995, 606, 1.
26. C. Hansch and A. Leo, J. Am. Chem. Soc., 1995, 1, 1.
27. R. Benigni and A. Giuliani, Bioinformatics, 2003, 19, 1194.
28. C. Hansch, A. Leo, S. B. Mekapati and A. Kurup, Bioorg. Med. Chem., 2004, 12, 3391.
29. H. K. Srivastava, M. Chourasia, D. Kumar and G. N. Sastry, J. Chem. Inf. Model., 2011, 51, 558.
30. M. T. Cronin and J. C. Dearden, Quant. Struct.–Act. Relat., 1995, 14, 117.
31. M. Mwense, X. Z. Wang, F. V. Buontempo, N. Horan, A. Young and D. Osborn, SAR QSAR Environ. Res., 2006, 17, 53.
32. M. Zhao, Z. Li, Y. Wu, Y. R. Tang, C. Wang, Z. Zhang and S. Peng, Eur. J. Med. Chem., 2007, 42, 955.
33. A. S. Reddy, S. P. Pati, P. P. Kumar, H. N. Pradeep and G. N. Sastry, Curr. Protein Pept. Sci., 2007, 8, 329.
34. F. A. Pasha, M. Muddassar and S. J. Cho, Chem. Biol. Drug Des., 2009, 73, 292.
35. H. K. Srivastava, F. A. Pasha and P. P. Singh, Int. J. Quantum Chem., 2005, 103, 237.
36. P. Srivani and G. Narahar Sastry, J. Mol. Graphics Modell., 2009, 27, 676.
37. S. Janardhan, P. Srivani and G. Narahari Sastry, QSAR Comb. Sci., 2006, 25, 860.
38. P. Sivaprakasam, A. Xie and R. J. Doerksen, Bioorg. Med. Chem., 2006, 14, 8210.
39. J. C. Chen, Y. Shen, S. Y. Liao, L. M. Chen and K. C. Zheng, Int. J. Quantum Chem., 2007, 107, 1468.
40. S. Zhang, L. Wei, K. Bastow, W. Zheng, A. Brossi, K. H. Lee and A. Tropsha, J. Comput.-Aided Mol. Des., 2007, 21, 97.
41. R. Garg, W. A. Denny and C. Hansch, Bioorg. Med. Chem., 2000, 8, 1835.
42. T. A. Hill, S. G. Stewart, S. P. Ackland, J. Gilbert, B. Sauer, J. A. Sakoff and A. McCluskey, Bioorg. Med. Chem., 2007, 15, 6126.
43. B. P. Bandgar, S. S. Gawande, R. G. Bodade, J. V. Totre and C. N. Khobragade, Bioorg. Med. Chem., 2010, 18, 1364.
44. R. K. Yadlapalli, O. P. Chourasia, K. Vemuri, M. Sritharan and R. S. Perali, Bioorg. Med. Chem. Lett., 2012, 22, 2708.
45. M. Kumar, K. Ramasamy, V. Mani, R. K. Mishra, A. B. Abdul Majeed, E. D. Clercq and B. Narasimhan, Arabian J. Chem., 2014, 7, 396.
46. J. Hyun, S. Y. Shin, K. M. So, Y. H. Lee and Y. Lim, Bioorg. Med. Chem. Lett., 2012, 22, 2664.
47. R. Mohan, N. Rastogi, N. Irishi, N. Namboothiri, S. M. Mobin and D. Panda, Bioorg. Med. Chem., 2006, 14, 8073.
48. J. C. Chen, L. Qian, W. J. Wu, L. M. Chen and K. C. Zheng, J. Mol. Struct.: THEOCHEM, 2005, 756, 167.
49. I. V. Magedov, L. Frolova, M. Manpadi, U. D. Bhoga, H. Tang, N. M. Evdokimo, O. George, K. H. Georgiou, S. Renner, M. Getlic, T. L. Kinnibrugh, M. A. Fernandes, S. V. slambrouck, W. F. A. Steelant, C. B. Shuster, S. Rogelj, W. A. L. van Otterlo and A. Kornienko, J. Med. Chem., 2011, 23, 4234.
50. H. H. Wang, K. M. Qiu, H. E. Cui, Y. S. Yang, Y. Luo, M. Xing, X. Y. Qui, L. F. Bai and H. L. Zhu, Bioorg. Med. Chem., 2013, 21, 448.
51. V. M. Sharma, P. Prasanna, K. V. Adi Seshu, B. Renuka, C. V. Laxman Rao, G. Sunil Kumar, C. Prasad Narasimhulu, P. Aravind Babu, R. C. Puranik, D. Subramanyam, A. Venkateswarlu, S. Rajagopal, K. B. Sunil Kumar, C. Seshagiri Rao, N. V. S. Rao Mamidi, D. S. Deevi, R. Ajaykumar and R. Rajagopalan, Bioorg. Med. Chem. Lett., 2002, 12, 2303.
52. B. Zarranz, A. Jaso, I. Aldana and A. Monge, Bioorg. Med. Chem., 2004, 12, 3711.
53. S. Ren, R. Wang, K. Komatsu, P. B. Krause, Y. Zyrianov, C. E. McKenna, C. Csipke, Z. A. Tokes and E. J. Lien, J. Med. Chem., 2002, 45, 410.
54. M. Cushman, D. Nagarathnam, D. Gopal, A. K. Chakraborti, C. M. Lin and E. Hamel, J. Med. Chem., 1991, 34, 2579.
55. (a) CODESSA version 2.0, Semichem, 7204 Mullen, Shawnee, KS 66216 USA; (b) M. Karelson, V. S. Lobanov and A. R. Katritzky, Chem. Rev., 1996, 96, 1027.
56. M. H. Bohari, H. K. Srivastava and G. N. Sastry, Org. Med. Chem. Lett., 2011, 1, 3.
57. A. Golbraikh and A. Tropsha, J. Mol. Graphics Modell., 2002, 20, 269.

Footnote

† Electronic supplementary information (ESI) available. See DOI: 10.1039/c5ra18295f
