An improved large-scale prediction model of CYP1A2 inhibitors by using combined fragment descriptors

Xianchao Pan; Li Chao; Sujun Qu; Shuheng Huang; Li Yang; Hu Mei

doi:10.1039/C5RA17196B


DOI: 10.1039/C5RA17196B (Paper) RSC Adv., 2015, 5, 84232-84237

An improved large-scale prediction model of CYP1A2 inhibitors by using combined fragment descriptors†

Xianchao Pan^ab, Li Chao^b, Sujun Qu^b, Shuheng Huang^b, Li Yang^ab and Hu Mei*^ab
^aKey Laboratory of Biorheological Science and Technology, Ministry of Education, Chongqing University, Chongqing 400044, China. E-mail: meihu@cqu.edu.cn; Fax: +86-23-65112677; Tel: +86-23-65102507
^bCollege of Bioengineering, Chongqing University, Chongqing 400044, China

Received 25th August 2015, Accepted 24th September 2015

First published on 28th September 2015

Abstract

CYP1A2, an important member of the cytochrome P450 (CYP) superfamily, is involved in the metabolism or bioactivation of many clinical drugs and precarcinogens. Thus, accurate prediction of CYP1A2 inhibitors is of great importance in early drug discovery and cancer prevention. In this study, a dataset of more than 12 000 structurally diverse compounds was used to develop prediction models by a support vector machine (SVM). By combining two types of fragment descriptors, i.e. Molecular Hologram and MACCS descriptors, an improved radial basis function (RBF)-based SVM model was obtained, of which the accuracies (ACCs), sensitivities (SENs), specificities (SPEs), and Matthews correlation coefficients (MCCs) were 90.95%, 92.40%, 89.70%, and 0.8191 for 6396 training samples, and 83.14%, 85.17%, 81.41%, and 0.6638 for 6395 test samples, respectively. The prediction capability of the SVM model was further validated on an independent dataset of 2581 samples, giving a geometric mean (G-mean)-based accuracy of 70.67%. The results indicate that the combination of the two types of fragment descriptors is an extremely efficient method for eliciting the key structural features of CYP inhibitors, and thus can be applied to large-scale virtual screening of inhibitors of CYP isoforms.

Introduction

Cytochromes P450 (CYPs) are a superfamily of heme enzymes with about 60 isoforms in humans.¹ CYPs are involved not only in the oxidative metabolism of a wide range of endogenous and xenobiotic compounds, but also in the occurrence of clinically adverse drug–drug interactions (DDIs).² Among the CYP isoforms, CYP1A2, CYP2C9, CYP2C19, CYP2D6, CYP2E1, and CYP3A4 are of particular importance for drug metabolism: together they catalyze the oxidative metabolism of approximately 90% of the currently marketed drugs.³ The broad substrate diversity makes CYPs particularly prone to inhibition by a large number of drugs, resulting in decreased clearance and increased toxicity of co-administered drugs.⁴ Therefore, accurate prediction of CYP inhibitory activities and potential DDIs of drug candidates is extremely important for early drug discovery.

In human liver, CYP1A2 accounts for about 13% of total CYP content⁵ and metabolizes a variety of clinical drugs (e.g. clozapine, ropivacaine, olanzapine, theophylline, and terbinafine).^2,6 In the past decade, in silico methods, in particular quantitative structure–activity relationship (QSAR) modeling, have become increasingly attractive for predicting potential CYP1A2 inhibitors and associated DDIs.^7–14 However, the extrapolation capabilities of the available prediction models are restricted by small datasets and limited structural diversity.

In 2009, Veith et al. determined the AC50 values (half-maximal activity concentration) of 17 143 compounds against 5 CYP isoforms (1A2, 2C9, 2C19, 2D6, and 3A4) by a quantitative high-throughput screening (qHTS) technique.¹⁵ Based on this large dataset, various prediction models of CYP inhibitors have been developed using support vector machine (SVM), decision tree (DT), k-nearest neighbor (k-NN), naïve Bayes (NB), and random forest (RF) algorithms.^16–19 One of the most interesting works comes from Cheng et al.,¹⁶ who established combined classifiers for predicting CYP inhibitors by using SVM, C4.5 DT, NB, and k-NN algorithms. For the CYP1A2 isoform, the 5-fold cross-validation (CV) accuracies of the combined classifiers are approximately 81% for 12 099 training samples (5663 inhibitors/6436 noninhibitors), and the prediction accuracies for 2804 test samples (1752 inhibitors/1052 noninhibitors) range from 70% to 73%.¹⁶ This is the first time, to the best of our knowledge, that highly predictive models have been constructed based on this large dataset. However, the method of combined classifiers is somewhat complex and time-consuming for large-scale virtual screening.

Herein, a strategy combining Molecular Hologram and MACCS descriptors has been successfully applied to construct prediction models for CYP1A2 inhibitors. The results showed that a predictive RBF (radial basis function)-SVM model was achieved with accuracies of 90.95% for 6396 training samples and 83.14% for 6395 test samples. The prediction capability of the RBF-SVM model was further validated on an independent dataset of 2581 samples, giving a geometric mean based accuracy of 70.67%. Taken together, the strategy of combined fragment descriptors provides an extremely simple, accurate, and efficient approach for predicting CYP1A2 inhibitors and can be further applied to predicting inhibitors of other CYP isoforms.

Materials and methods

Datasets

The CYP1A2 dataset for model development was derived from the PubChem BioAssay database (AID: 1851).¹⁵ The inhibitory activities (AC50) were measured by a standard protocol (http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=1851). Then, the AC50 of each compound was converted to an activity score between 0 and 100. Compounds are considered inhibitors if their activity scores are larger than 40 and noninhibitors if their scores are equal to 0. Compounds with intermediate activity scores (1–39) are inconclusive and were thus removed from the dataset.
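The labeling rule above can be written as a small function. This is only a sketch of the criterion as stated in the text, not the authors' processing script; the function name and the return strings are hypothetical. The text does not say how a score of exactly 40 is handled, so the sketch treats it as unlabeled.

```python
from typing import Optional

INHIBITOR_THRESHOLD = 40  # "larger than 40" in the text


def label_compound(activity_score: int) -> Optional[str]:
    """Map a PubChem activity score (0-100) to a class label.

    Returns None for inconclusive compounds, which are dropped.
    Assumption: a score of exactly 40 is not covered by the text
    and is treated here as unlabeled.
    """
    if not 0 <= activity_score <= 100:
        raise ValueError("activity score must lie between 0 and 100")
    if activity_score > INHIBITOR_THRESHOLD:
        return "inhibitor"
    if activity_score == 0:
        return "noninhibitor"
    return None


# Keep only the compounds that receive a label (toy scores, not real data).
scores = {"cmpd_A": 85, "cmpd_B": 0, "cmpd_C": 20}
labeled = {name: label_compound(s) for name, s in scores.items()
           if label_compound(s) is not None}
```

With the toy scores above, only `cmpd_A` and `cmpd_B` survive the filter; `cmpd_C` falls in the inconclusive range.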

According to this criterion, 13 256 compounds with inhibitor/non-inhibitor labels of CYP1A2 were extracted from the database, of which 6000 compounds were determined as inhibitors and 7256 as non-inhibitors. Prior to further analysis, the molecular structures were pretreated with the Verify 2D module of Sybyl 8.1.²⁰ After removing inorganic compounds, counter ions, duplicates, salts, and mixtures, the resulting 12 791 compounds (designated as Dataset I) were randomly split into a training set containing 6396 compounds and a test set containing 6395 compounds.
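A random half split of this kind can be sketched as follows. The text only says the split was random; splitting class by class (stratification) is an assumption made here because it reproduces the class counts reported in Table 1 (5895 inhibitors and 6896 noninhibitors in Dataset I). The function name and seed are hypothetical.

```python
import random


def split_half(labels, seed=0):
    """Randomly split sample indices into two halves, class by class.

    Assumption: the split is stratified; the source only states that
    it was random.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = (len(idx) + 1) // 2  # the training set takes the odd sample
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return train, test


# 1 = inhibitor, 0 = noninhibitor; counts are the Dataset I totals from Table 1.
labels = [1] * 5895 + [0] * 6896
train_idx, test_idx = split_half(labels)
```

With these totals the sketch yields 6396 training and 6395 test indices, matching the set sizes given in the text.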

An independent validation dataset containing 8465 compounds (4446 inhibitors/4019 noninhibitors) was collected from another PubChem BioAssay database (AID: 410). After filtering structures as described above and removing structures duplicated in Dataset I, 2581 qualified compounds (designated as Dataset II) were obtained. Duplicate compounds were identified from the SLN strings of the molecules and removed by using the ligand preparation tools in Sybyl 8.1.²⁰ The statistical descriptions of the training, test and validation sets are shown in Table 1. The PubChem ID, SMILES, and inhibitor/non-inhibitor labels of all 15 372 samples are listed in Table S1.†

Table 1 Statistical description of the 15 372 unique compounds

Datasets	Type	No. of inhibitors	No. of noninhibitors	Sum	Ratio (inhibitors/noninhibitors)
Dataset I (AID: 1851)	Training set	2948	3448	6396	0.8550:1
Dataset I (AID: 1851)	Test set	2947	3448	6395	0.8547:1
Dataset II (AID: 410)	Validation set	1774	807	2581	2.1983:1
Total		7669	7703	15372

Method of structural description

In recent decades, fragment descriptors have shown prominent computational efficiency and easy interpretation in large-scale virtual screening studies.^21,22 In this study, two types of fragment descriptors, i.e. Molecular Hologram and MACCS, were used for developing prediction models of CYP1A2 inhibitors.

Molecular Hologram description was carried out with the Hologram QSAR (HQSAR) module of the Sybyl 8.1 package.^20,23 A Molecular Hologram is an array containing counts of molecular fragments. As depicted in Fig. 1, molecules are first broken into pre-defined structural fragments (including branched, cyclic, and overlapping fragments). Then, each unique fragment is assigned a specific large integer by means of a cyclic redundancy check (CRC) algorithm. Each integer corresponds to a bin in an integer array of fixed length L, and bin occupancies are incremented according to the fragments generated. Thus, all generated fragments are hashed into array bins in the range 1 to L. This array is the so-called Molecular Hologram, and the bin occupancies are the hologram descriptors, which contain topological and compositional molecular information. The generation of a Molecular Hologram is mainly determined by 3 parameters: fragment size, fragment distinction, and hologram length.
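The hashing step can be illustrated with a toy sketch. It is not the Sybyl HQSAR implementation: the fragment strings are hypothetical placeholders, `zlib.crc32` stands in for the CRC algorithm, and reducing the integer to a bin with a modulo is an assumption. Bins are indexed from 0 here, whereas the text numbers them 1 to L.

```python
import zlib


def molecular_hologram(fragments, length=401):
    """Hash fragment strings into a fixed-length array of counts.

    Assumptions: fragments arrive as text strings, CRC-32 supplies
    the large integer, and the bin is that integer modulo `length`.
    """
    bins = [0] * length
    for frag in fragments:
        fragment_id = zlib.crc32(frag.encode("utf-8"))  # one integer per unique fragment
        bins[fragment_id % length] += 1  # several fragments may share a bin
    return bins


# Hypothetical fragment list for one molecule; "C-C-N" occurs twice.
fragments = ["C-C-N", "C-C-O", "C-C-N", "c1ccccc1"]
hologram = molecular_hologram(fragments)
```

Because the array is much shorter than the number of possible fragments, distinct fragments can collide in one bin, which is why the hologram length is one of the parameters that has to be tuned.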


	Fig. 1 Generation of Molecular Hologram.¹⁹

MACCS descriptors, also called 166-bit MDL keys, use a dictionary that consists of 166 pre-defined substructure fragments.²⁴ For a molecule, each bit represents the presence or absence of a certain atom type, bond type, atom environment, group, or property. If a specified substructure is present in a given molecule, the corresponding bit is set to ‘1’; otherwise, it is set to ‘0’. Thus, each molecule is described as a binary string that represents its structural features. The substructure dictionary of MACCS keys is freely available in OpenBabel (http://openbabel.org/).

SVM modeling

In the past two decades, SVM, an excellent machine learning algorithm, has been successfully applied to establish prediction models of CYP inhibitors with satisfactory accuracies.^16–19,25,26 In SVM classification, the original data points are first projected into a high-dimensional feature space by linear or non-linear kernel functions and then classified by constructing a hyperplane in the feature space. In this study, both linear- and RBF-kernel SVM modeling were performed using the LIBSVM v2.9 package.²⁷ All variables were scaled linearly to the range [0, 1] before SVM modeling. The kernel parameter γ and the error penalty parameter C were tuned by a grid search with 10-fold cross-validation.
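Two of the steps just described, the linear scaling to [0, 1] and the RBF kernel, can be sketched in plain Python. This is an illustration only and does not call LIBSVM; the function names and the toy descriptor rows are hypothetical. The kernel form exp(−γ‖u − v‖²) is the standard RBF definition used by LIBSVM.

```python
import math


def fit_minmax(rows):
    """Return per-column minima and maxima of the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(row, lo, hi):
    """Linearly scale one row to [0, 1]; constant columns map to 0."""
    return [(x - l) / (h - l) if h > l else 0.0
            for x, l, h in zip(row, lo, hi)]


def rbf_kernel(u, v, gamma):
    """RBF kernel value exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy descriptor rows (hypothetical values, three descriptors each).
train = [[0, 2, 5], [4, 2, 1], [2, 2, 3]]
lo, hi = fit_minmax(train)
scaled = [scale(r, lo, hi) for r in train]
similarity = rbf_kernel(scaled[0], scaled[1], gamma=1.0)
```

The scaling ranges are taken from the training rows only, so that test samples can later be scaled with the same `lo` and `hi`. A grid search would repeat the cross-validated fit for each candidate pair (C, γ) and keep the best one.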

Assessment of model quality

The prediction performance of SVM models was assessed by sensitivity (SEN), specificity (SPE), overall accuracy (ACC), and Matthews correlation coefficient (MCC), the definitions of which are shown in eqn (1)–(4).


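$$\mathrm{SEN} = \frac{TP}{TP + FN} \tag{1}$$

$$\mathrm{SPE} = \frac{TN}{TN + FP} \tag{2}$$

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{4}$$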

Here, TP (true positives) is the number of inhibitors predicted as inhibitors, TN (true negatives) the number of non-inhibitors predicted as non-inhibitors, FP (false positives) the number of noninhibitors predicted as inhibitors, and FN (false negatives) the number of inhibitors predicted as noninhibitors. The value of MCC ranges from −1 to 1. A value of 1 indicates perfect agreement between predicted and observed classes, whereas −1 indicates the worst possible prediction.
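As a worked check, the four measures can be computed from a confusion matrix with a few lines of Python. The function name is hypothetical. The counts in the example are not given in the paper: they are back-calculated here from the reported test-set sensitivity and specificity (85.17% of 2947 inhibitors, 81.41% of 3448 noninhibitors) and should be read as an assumption.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion counts."""
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SEN": sen, "SPE": spe, "ACC": acc, "MCC": mcc}


# Assumed counts, back-calculated from the reported test-set rates.
metrics = classification_metrics(tp=2510, tn=2807, fp=641, fn=437)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts the values agree with the test-set figures reported for the combined-descriptor RBF-SVM model (SEN 85.17%, SPE 81.41%, ACC 83.14%, MCC 0.6638) to within rounding.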

Results and discussion

Chemical space and structure diversity

The chemical space of the samples in Dataset I was explored by principal component analysis (PCA) based on 25 pharmacophore and physicochemical descriptors (Table S2†), which characterize molecular volume, shape, electronic, hydrophobic, and H-bond acceptor/donor properties. All descriptors were auto-scaled prior to PCA. A total of 4 significant components were obtained. The first two components explained 25.2% and 18% of the variance of the dataset, respectively.

The distribution of the samples in the first two principal components is shown in Fig. 2. It can be seen that both the training and test samples cover most of the chemical space within the 95% confidence interval and that only a minority of the samples lie outside it. This indicates that the training and test samples have similar chemical distributions. Besides, no significant difference is detected between the distributions of the inhibitors and the noninhibitors.


	Fig. 2 The first 2 principal component scores of the samples in Dataset I (AID: 1851). Blue cross: inhibitors in the training set; blue diamond: noninhibitors in the training set; red cross: inhibitors in the test set; red diamond: noninhibitors in the test set. The oval-shaped curve: the 95% confidence interval of Hotelling's T².

Parameters optimization for Molecular Holograms

Before generation of Molecular Holograms, the three important parameters, namely fragment size, fragment distinction, and hologram length, were optimized by partial least squares discriminant analysis (PLS-DA) built into the HQSAR module of the Sybyl 8.1 package.²⁰ In our experience, a fragment size of 4–7 is optimal in most cases: it covers the most important chemical groups while limiting the number of fragments. Then, the optimal combination of fragment distinction and hologram length was systematically examined according to the performance of PLS-DA models. In PLS-DA modeling, all variables were auto-scaled and the number of principal components was determined by 10-fold cross-validation.

The best 14 PLS-DA models established by different parameter combinations are shown in Table 2. It can be seen that no significant difference in the overall performance is observed among the 14 models. Herein, model 8 with the highest ACCs and MCCs for both the training and test sets is selected as the best PLS-DA model, of which the fragment size is 4–7, the fragment distinction is A/B/Ch/DA, and the hologram length is 401 bins. Thus, the Molecular Hologram descriptors of model 8 were used for the following SVM modeling.

Table 2 Performance of the best 14 PLS-DA models

^a A: atom types; B: bond types; C: connectivity; Ch: chirality; H: hydrogens; DA: H-bond donor and acceptor.
Model	Fragment distinction^a	Hologram length	Training set SEN (%)	Training set SPE (%)	Training set ACC (%)	Training set MCC	Test set SEN (%)	Test set SPE (%)	Test set ACC (%)	Test set MCC
1	A/B/C	401	76.29	81.12	78.89	0.5748	73.97	78.86	76.61	0.5288
2	A/B/C/H	401	79.65	75.41	77.36	0.5488	78.52	72.45	75.25	0.5082
3	A/B/C/Ch	401	75.20	82.05	78.89	0.5745	72.38	79.32	76.12	0.5186
4	A/B/C/H/Ch	353	80.33	75.00	77.45	0.5516	79.30	70.74	74.68	0.4994
5	A/C/Ch/DA	401	74.49	82.08	78.58	0.5681	72.11	80.63	76.70	0.5300
6	A/B/C/DA	257	72.76	81.96	77.72	0.5506	69.63	79.15	74.76	0.4907
7	A/B/C/Ch/DA	401	74.86	81.24	78.30	0.5625	71.19	77.44	74.56	0.4872
8	A/B/Ch/DA	401	76.97	81.67	79.50	0.5871	75.26	78.60	77.06	0.5385
9	A/B/H/Ch	401	79.82	76.65	78.11	0.5630	78.15	73.06	75.40	0.5105
10	A/B/H/DA	401	77.78	77.87	77.83	0.5554	77.06	74.91	75.90	0.5182
11	A/B/H/Ch/DA	401	78.22	78.02	78.11	0.5612	77.33	75.09	76.12	0.5227
12	A/B/C/H/Ch/DA	401	79.27	75.93	77.47	0.5504	78.08	72.80	75.23	0.5072
13	A/C/H/Ch/DA	307	77.68	78.92	78.35	0.5651	74.55	75.90	75.28	0.5037
14	A/C/H/DA	401	77.88	78.07	77.99	0.5584	75.98	74.88	75.39	0.5072

SVM classification models established by each of the fragment descriptors

First, the 401-bin Molecular Hologram descriptors and 166-bit MACCS descriptors were used for SVM modeling separately. The performance of the optimal RBF- and linear-kernel SVM models is shown in Table 3. Overall, all SVM models show high ACCs for both the training and test sets, which range from 77% to 83%. By comparison, the MACCS models outperform the Molecular Hologram models. Meanwhile, the ACCs of the RBF-SVM models are slightly higher than those of the linear-SVM models for both the training and test sets. Thus, the best SVM model is the MACCS model with RBF kernel, of which the ACCs are larger than 80% for both the training and test sets. Also, all 4 SVM models outperform the best PLS-DA model, which demonstrates the superiority of the SVM modeling method.

Table 3 Performance of SVM models established by each of the fragment descriptors

^a RBF kernel, C = 30.8022, γ = 1.0000.
^b Linear kernel, C = 39.5508.
^c RBF kernel, C = 16.4872, γ = 1.3591.
^d Linear kernel, C = 37.1545.
Model	Description method	Training set SEN (%)	Training set SPE (%)	Training set ACC (%)	Training set MCC	Test set SEN (%)	Test set SPE (%)	Test set ACC (%)	Test set MCC
RBF-SVM^a	Molecular Holograms	85.31	79.09	81.96	0.6421	82.59	73.72	77.81	0.5620
Linear-SVM^b	Molecular Holograms	83.21	78.74	80.80	0.6176	80.52	74.07	77.04	0.5444
RBF-SVM^c	MACCS	84.70	79.90	82.11	0.6441	84.19	76.54	80.06	0.6056
Linear-SVM^d	MACCS	83.31	80.48	81.79	0.6361	82.80	77.18	79.77	0.5979

SVM classification models established by the combined fragment descriptors

Molecular Hologram and MACCS are two types of fragment descriptors. A Molecular Hologram is an array containing counts of molecular fragments, and it reflects a many-to-one relationship between fragments and bins. The MACCS fingerprint uses a pre-defined dictionary of structural features and denotes their presence or absence by ‘1’ or ‘0’. Therefore, MACCS keys reflect a one-to-one relationship between features and bits. In a sense, Molecular Hologram puts emphasis on fragment types, while MACCS on atom features and environments. Thus, the two types of fragment descriptors may be, to some degree, complementary to each other, and can be combined to enhance model's prediction power.
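One simple way to combine the two descriptor sets is to concatenate them into a single vector per molecule. The text does not spell out how the combination was implemented, so plain concatenation is an assumption here, and the function name and placeholder values are hypothetical. The lengths (401 hologram bins, 166 MACCS bits) are those used in this study.

```python
HOLOGRAM_LENGTH = 401  # bins of the selected Molecular Hologram
MACCS_LENGTH = 166     # bits of the MACCS keys


def combine_descriptors(hologram_counts, maccs_bits):
    """Concatenate hologram counts and MACCS bits into one vector.

    Assumption: the combined descriptor is a plain concatenation.
    """
    if len(hologram_counts) != HOLOGRAM_LENGTH:
        raise ValueError("expected 401 hologram bins")
    if len(maccs_bits) != MACCS_LENGTH:
        raise ValueError("expected 166 MACCS bits")
    return list(hologram_counts) + list(maccs_bits)


# Placeholder descriptors for one molecule (all zeros except two entries).
hologram_counts = [0] * HOLOGRAM_LENGTH
hologram_counts[17] = 3
maccs_bits = [0] * MACCS_LENGTH
maccs_bits[165] = 1
combined = combine_descriptors(hologram_counts, maccs_bits)
```

Under this assumption each molecule is described by 567 variables. Scaling every variable to [0, 1] before modeling keeps the count-valued hologram bins and the binary MACCS bits on a comparable range.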

As expected, the overall prediction performance of the SVM models increases significantly after introducing the combined descriptors (Table 4). Especially for the RBF-SVM model, the ACCs and MCCs are strikingly high for both the training (90.95%, 0.8191) and test sets (83.14%, 0.6638). The results indicate that combining the two different types of fragment descriptors can effectively enhance the prediction performance. In comparison with earlier studies on CYP1A2 inhibitors, our RBF-SVM model is clearly more predictive and simpler (Table 4).

Table 4 Performance of the SVM models established by the combined descriptors

^a RBF kernel, C = 42.1016, γ = 2.2408.
^b Linear kernel, C = 50.7842.
^c The optimal combined model designed by Cheng et al.¹⁶ was based on SVM and k-NN algorithms, and the performance was evaluated by 5-fold cross-validation.
^d The presented performance of the SVM model¹⁸ was obtained based on the training and test sets assembled by the random sampling strategy in that study.
^e The accuracy for the training set was evaluated by 7-fold cross-validation, and the accuracy for the test set was measured by the area under the curve (AUC) of the receiver operating characteristic (ROC).¹⁷
^f ASNN: associative neural networks.²⁶
Modeling methods	No. of training samples	Training set SEN (%)	Training set SPE (%)	Training set ACC (%)	Training set MCC	No. of test samples	Test set SEN (%)	Test set SPE (%)	Test set ACC (%)	Test set MCC
RBF-SVM in this study^a	6396	92.40	89.70	90.95	0.8191	6395	85.17	81.41	83.14	0.6638
Linear-SVM in this study^b	6396	87.25	83.56	85.26	0.7060	6395	84.32	78.71	81.30	0.6284
Cheng et al. combined model II^c	12099	80.00	82.50	81.30	0.6260	2804	—	—	72.00	—
Su et al. SVM^d	10238	—	—	—	—	2559	86.80	74.00	79.80	—
Sun et al. RBF-SVM^e	7208	—	—	87.50	—	7128	—	—	0.93 (AUC)	—
Novotarskyi et al. ASNN^f	3745	—	—	—	—	3741	0.827	0.827	0.827	0.6530

Recently, Lapins et al.²⁵ successfully developed a unified proteochemometric (PCM) model for predicting inhibitors of five major CYP isoforms, i.e. CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4. Based on the signature information of the inhibitors and the amino acid property composition and transition information in the CYP primary sequences, high cross-validated accuracies (78–85%) and prediction accuracies for an external dataset (79–88%) were obtained by using SVM, k-nearest neighbor, and random forest classifiers. For CYP1A2 inhibitors, however, it can be seen from the ROC curve that high sensitivity for the external validation dataset can be achieved only at the cost of low specificity. In comparison with our models, the PCM models, which were established on about 80 000 atomic signatures and amino acid property transition information, are somewhat complex and less interpretable.

In order to further validate the prediction and extrapolation capabilities of the SVM models, an independent validation set (Dataset II) containing 2581 diverse compounds (1774 inhibitors and 807 noninhibitors) was introduced. As observed in Table 5, the two SVM models with different kernel functions achieve modest ACCs (∼65%). Although the SENs of both models are somewhat limited, the SPEs and ACCs are still satisfactory. The low SENs may be explained by the different experimental protocols and labeling methods applied for the two databases. For this imbalanced dataset, the geometric mean (G-mean) based ACC was also introduced to measure the predictive power. The G-mean of ∼70% is relatively high for this imbalanced dataset.

Table 5 Performance of the SVM models on the independent validation set (2581 samples)

^a G-mean = (SEN × SPE)^1/2, the geometric mean of sensitivity and specificity.
Model	Description method	SEN (%)	SPE (%)	ACC (%)	MCC	G-mean^a (%)
RBF-SVM	Combined	57.33	87.11	66.64	0.4156	70.67
	MACCS	53.27	83.89	62.84	0.3494	66.85
	Molecular Holograms	54.90	84.76	64.24	0.3719	68.22
Linear-SVM	Combined	54.40	86.12	64.32	0.3809	68.45
	MACCS	52.03	85.01	62.34	0.3498	66.51
	Molecular Holograms	52.82	83.02	62.26	0.3371	66.22

Furthermore, according to the weight coefficients of variables in the RBF-SVM model, variable screening was also performed. The results showed that no significant improvement was observed with the decreased number of fragment descriptors (Fig. 3).


	Fig. 3 The performance of the combined descriptors based RBF-SVM model for the (a) training set (b) test set and (c) validation set.

Conclusions

In this study, Molecular Hologram and MACCS descriptors were combined to construct SVM classification models for CYP1A2 inhibitors on a large dataset with more than 12 000 unique compounds. The results show that the prediction performance of the RBF-SVM model based on combined fragment descriptors was remarkably improved, with overall accuracies of 90.95% and 83.14% for the training and test sets, respectively. The SVM models were further validated on an independent dataset of 2581 samples, with a G-mean accuracy of ∼70%. The results indicate that the Molecular Hologram and MACCS descriptors are, to some degree, complementary to each other, and can be combined to enhance predictive power effectively. In comparison with the earlier studies, the RBF-SVM model based on the combined descriptors is extremely simple, predictive, and especially suited to large-scale virtual screening of CYP1A2 inhibitors. Based on this study and our previous research on CYP2C19 inhibitors, we suggest that the combination of Molecular Hologram and MACCS descriptors can be considered a preferable method for the virtual screening of inhibitors of other CYP isoforms.

Acknowledgements

This research was supported by the National Natural Science Foundation of China (No. 61073135) and the ‘111’ project of Introducing Talents of Discipline to Universities.

References

1. D. R. Nelson, D. C. Zeldin, S. M. Hoffman, L. J. Maltais, H. M. Wain and D. W. Nebert, Pharmacogenetics, 2004, 14, 1–18.
2. S. F. Zhou, J. P. Liu and B. Chowbay, Drug Metab. Rev., 2009, 41, 89–295.
3. D. Singh, A. Kashyap, R. V. Pandey and K. S. Saini, Drug Discovery Today, 2011, 16, 793–799.
4. O. Pelkonen, M. Turpeinen, J. Hakkola, P. Honkakoski, J. Hukkanen and H. Raunio, Arch. Toxicol., 2008, 82, 667–715.
5. T. Shimada, H. Yamazaki, M. Mimura, Y. Inui and F. P. Guengerich, J. Pharmacol. Exp. Ther., 1994, 270, 414–423.
6. I. S. Lee and D. Kim, Arch. Pharmacal Res., 2011, 34, 1799–1816.
7. T. Fox and J. M. Kriegl, Curr. Top. Med. Chem., 2006, 6, 1579–1591.
8. J. Sridhar, J. Liu, M. Foroozesh and C. L. Stevens, Molecules, 2012, 17, 9283–9305.
9. K. K. Chohan, S. W. Paine, J. Mistry, P. Barton and A. M. Davis, J. Med. Chem., 2005, 48, 5154–5161.
10. F. Hammann, H. Gutmann, U. Baumann, C. Helma and J. Drewe, Mol. Pharm., 2009, 6, 1920–1926.
11. J. Burton, I. Ijjaali, O. Barberan, F. Petitet, D. P. Vercauteren and A. Michel, J. Med. Chem., 2006, 49, 6231–6240.
12. K. Roy and P. P. Roy, Expert Opin. Drug Metab. Toxicol., 2009, 5, 1245–1266.
13. H. Li, J. Sun, X. Fan, X. Sui, L. Zhang, Y. Wang and Z. He, J. Comput.-Aided Mol. Des., 2008, 22, 843–855.
14. M. P. Gleeson, A. M. Davis, K. K. Chohan, S. W. Paine, S. Boyer, C. L. Gavaghan, C. H. Arnby, C. Kankkonen and N. Albertson, J. Comput.-Aided Mol. Des., 2007, 21, 559–573.
15. H. Veith, N. Southall, R. Huang, T. James, D. Fayne, N. Artemenko, M. Shen, J. Inglese, C. P. Austin, D. G. Lloyd and D. S. Auld, Nat. Biotechnol., 2009, 27, 1050–1055.
16. F. Cheng, Y. Yu, J. Shen, L. Yang, W. Li, G. Liu, P. W. Lee and Y. Tang, J. Chem. Inf. Model., 2011, 51, 996–1011.
17. H. Sun, H. Veith, M. Xia, C. P. Austin and R. Huang, J. Chem. Inf. Model., 2011, 51, 2474–2481.
18. B. H. Su, Y. S. Tu, C. Lin, C. Y. Shao, O. A. Lin and Y. J. Tseng, J. Chem. Inf. Model., 2015, 55, 1426–1434.
19. L. Chao, H. Mei, X. C. Pan, W. Tan, T. F. Liu and L. Yang, Chemom. Intell. Lab. Syst., 2014, 130, 109–114.
20. Tripos Inc., St. Louis, MO, USA, 2008, available online: http://www.tripos.com.
21. A. Varnek, Methods Mol. Biol., 2011, 672, 213–243.
22. K. Z. Myint and X. Q. Xie, Int. J. Mol. Sci., 2010, 11, 3846–3866.
23. T. Hurst and T. Heritage, Tripos Technical Notes, 1997, 1, 1–15.
24. J. L. Durant, B. A. Leland, D. R. Henry and J. G. Nourse, J. Chem. Inf. Comput. Sci., 2002, 42, 1273–1280.
25. M. Lapins, A. Worachartcheewan, O. Spjuth, V. Georgiev, V. Prachayasittikul, C. Nantasenamat and J. E. Wikberg, PLoS One, 2013, 8, e66566.
26. S. Novotarskyi, I. Sushko, R. Korner, A. K. Pandey and I. V. Tetko, J. Chem. Inf. Model., 2011, 51, 1271–1280.
27. C. C. Chang and C.-J. Lin, ACM Transactions on Intelligent Systems and Technology, 2010, 2, 1–27.

Footnote

† Electronic supplementary information (ESI) available. See DOI: 10.1039/c5ra17196b
