Fidele Ntie-Kang *abc, Pascal Amoa Onguéné d, Michael Scharfe c, Luc C. Owono Owono e, Eugene Megnassan f, Luc Meva'a Mbaze d, Wolfgang Sippl c and Simon M. N. Efange b
aCEPAMOQ, Faculty of Science, University of Douala, P. O. Box 8580, Douala, Cameroon. E-mail: ntiekfidele@gmail.com; Tel: +237 77915473
bChemical and Bioactivity Information Centre, Department of Chemistry, University of Buea, P. O. Box 63, Buea, Cameroon
cDepartment of Pharmaceutical Sciences, Martin-Luther University of Halle-Wittenberg, Wolfgang-Langenbeck Str. 4, 06120, Halle(Saale), Germany
dDepartment of Chemistry, Faculty of Science, University of Douala, P. O. Box 24157, Douala, Cameroon
eLaboratory for Simulations and Biomolecular Physics, Advanced Teachers Training College, University of Yaoundé I, P.O. Box 47, Yaoundé, Cameroon
fLaboratory of Fundamental and Applied Physics, University of Abobo-Adjamé, Abidjan 02 BP 801, Cote d'Ivoire
First published on 28th October 2013
We assess the medicinal value and “drug-likeness” of ∼3200 compounds of natural origin, along with some of their derivatives which were obtained through hemisynthesis. In the present study, 376 distinct medicinal plant species belonging to 79 plant families from the Central African flora have been considered, based on data retrieved from literature sources. For each compound, the optimised 3D structure has been used to calculate physicochemical properties which determine oral availability on the basis of Lipinski's “Rule of Five”. A comparative analysis has been carried out with the “drug-like”, “lead-like”, and “fragment-like” subsets, containing respectively 1726, 738 and 155 compounds, as well as with our smaller previously published CamMedNP library and the Dictionary of Natural Products. A diversity analysis has been carried out in comparison with the DIVERSet™ Database (containing 48 651 compounds) from ChemBridge. Our results suggest that drug discovery, beginning with natural products from the Central African flora, could be promising. The 3D structures are available and could be useful for virtual screening and natural product lead generation programs.
Po/w or log P) ≤ 5, the number of hydrogen bond acceptors (HBA) ≤ 10 and the number of hydrogen bond donors (HBD) ≤ 5. An additional rule for the number of rotatable bonds (NRB) is often added to this ro5, such that NRB ≤ 5. An evaluation of “lead-likeness” is often carried out using more stringent criteria, following the “Rule of 3.5” (150 ≤ MW ≤ 350; log Po/w ≤ 4; HBD ≤ 3; HBA ≤ 6),40–43 and “fragment-likeness” using even more stringent criteria, following the “Rule of 2.5” (MW ≤ 250; −2 ≤ log Po/w ≤ 3; HBD < 3; HBA < 6; NRB < 3).44 Fig. 1 shows the distribution of the number of violations of the ro5 within the ConMedNP library. Plots of the distribution of these physical properties for the total ConMedNP library, along with the standard subsets, are shown in Fig. 2, while a summary of the average values of the physical indicators of the ro5 for the various subsets, as well as the number of generated tautomers, is given in Table 1.

The number of rotatable bonds (NRB) within the ConMedNP library was used as an additional criterion to test for favourable drug metabolism and pharmacokinetics (DMPK) outcomes, based on the observation that naturally occurring compounds often exhibit a wide range of flexibility, from rigid conformationally constrained molecules to very flexible compounds. It was noted that 50.02% of the compounds within ConMedNP were Lipinski compliant and 79.61% showed < 2 violations (Fig. 1), while the maximum in the distribution of the NRB was between 1 and 2 for the total ConMedNP library, as well as for the “drug-like” and “lead-like” subsets (Fig. 2A). It was also noted that the distribution of MW showed a maximum between 401 and 500 Da (Fig. 2B), with a curve similar to those previously reported for other “drug-like” NP libraries in the literature,6,33,45 and with about 21% of the compounds having MW > 500 Da. Meanwhile, the “drug-like”, “lead-like” and “fragment-like” subsets showed maxima respectively in the regions 301–400 Da, 201–300 Da and 101–200 Da.

The distribution of the calculated log Po/w values showed a roughly Gaussian shaped curve with a maximum centred at 3.5 log Po/w units, while the maxima of the “drug-like”, “lead-like” and “fragment-like” subsets were at 2.5, 1.5 and 0.5 units respectively (Fig. 2C). Some exceptionally large log Po/w values (up to > 26 units) were observed and could be attributed to the fact that the training database/algorithm used to calculate log Po/w in MOE may not suit the types and combinations of functional groups found in natural products.6 It is worth mentioning that the log Po/w calculator used in MOE is based on a linear atom type model which takes implicit hydrogen atoms into consideration. This model has been previously validated using 1827 molecules in the training set (r² = 0.93, RMSE = 0.39), but the reliability of the predictions could depend on the chemical space of the initial dataset used in validating the linear model.46,47 It should however be noted that, in spite of this limitation, 67.61% of the compounds currently in the ConMedNP library had log Po/w values < 5 units.

The peaks of the HBA and HBD distributions were respectively at 5 acceptors and 1 donor, and both curves fell off rapidly to maximum numbers of 70 and 37 respectively (Fig. 1E and F). It was also noted that only 10.13% of the compounds in ConMedNP had HBA > 10 and only about 9.79% had HBD > 5. Further calculated descriptors include the number of heavy atoms (HA, for which the atomic number Z > 1), the logarithm of water solubility (log Swat),48 the molar refractivity (MR),49 the total polar surface area (TPSA),50 and the total hydrophobic surface area (THSA). These distributions are also shown in Fig. 2 for the ConMedNP dataset, as well as for the standard subsets. These results indicate that compounds within the “drug-like” subset have polarities and hydrophobicities which fall within the acceptable limits for favourable bioavailability. Additionally, pairwise comparisons of MW against the calculated log Po/w, HBA, HBD and NRB are shown in Fig. 3. These plots show that the areas with the highest population densities fall within the “Lipinski region of interest” (MW < 500, −2 < log Po/w < 5, HBA < 10 and HBD < 5), and for which NRB < 5.
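The thresholds quoted above can be turned into a small filter. The following is a minimal sketch, not the procedure used to build the ConMedNP subsets: the `Descriptors` record and the function names are hypothetical, the MW ≤ 500 limit is taken from the standard statement of Lipinski's rule, and the descriptor values are assumed to have been computed elsewhere (for example in MOE).

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for precomputed descriptors of one compound."""
    mw: float    # molecular weight in Da
    logp: float  # calculated log Po/w
    hba: int     # hydrogen bond acceptors
    hbd: int     # hydrogen bond donors
    nrb: int     # rotatable bonds


def ro5_violations(d: Descriptors) -> int:
    """Count violations of Lipinski's Rule of Five (0 to 4)."""
    return sum([d.mw > 500, d.logp > 5, d.hba > 10, d.hbd > 5])


def is_lead_like(d: Descriptors) -> bool:
    """'Rule of 3.5' as quoted in the text."""
    return 150 <= d.mw <= 350 and d.logp <= 4 and d.hbd <= 3 and d.hba <= 6


def is_fragment_like(d: Descriptors) -> bool:
    """'Rule of 2.5' as quoted in the text."""
    return (d.mw <= 250 and -2 <= d.logp <= 3
            and d.hbd < 3 and d.hba < 6 and d.nrb < 3)


# Illustrative values only, not taken from the library.
small = Descriptors(mw=192.1, logp=1.7, hba=3, hbd=1, nrb=1)
large = Descriptors(mw=812.0, logp=6.3, hba=14, hbd=7, nrb=12)
print(ro5_violations(small), is_lead_like(small), is_fragment_like(small))
print(ro5_violations(large), is_lead_like(large), is_fragment_like(large))
```

Counting violations rather than returning a single pass/fail flag makes it straightforward to reproduce a distribution such as the one in Fig. 1.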
| Library name | Library size | Tautomers | MW (Da) | log P | HBA | HBD | NRB |
|---|---|---|---|---|---|---|---|
| ConMedNP | 3177 | 7838 | 426.70 | 4.18 | 5.85 | 2.39 | 5.31 |
| Drug-like | 1726 | 3900 | 326.16 | 2.87 | 4.97 | 1.79 | 2.96 |
| Lead-like | 738 | 1610 | 269.58 | 2.48 | 4.17 | 1.49 | 2.01 |
| Fragment-like | 155 | 355 | 192.12 | 1.74 | 3.31 | 1.08 | 1.14 |
| CamMedNP | 1859 | 5286 | 421.63 | 4.07 | 6.00 | 2.40 | 5.51 |

a MW, mean of molar weight; log P, mean of logarithm of the calculated octan-1-ol–water partition coefficient; HBA, mean number of hydrogen bond acceptors; HBD, mean number of hydrogen bond donors; NRB, mean number of rotatable bonds.
Fig. 3 Pairwise comparison of mutual relationships between molecular descriptors: A = the distribution of the calculated log Po/w versus MW, B = HBA versus MW, C = HBD versus MW and D = NRB versus MW.
Po/w), HBA, and HBD have been calculated and used to compare the “drug-likeness” of the ConMedNP library with 126 140 compounds from the Dictionary of Natural Products (DNP),51 which have been previously analysed and were retrieved from the literature.6 This comparison has been carried out side by side with the same properties for our previously published CamMedNP library33 and is shown in Fig. 4. In these histograms, we show only data that fall within the “Lipinski region of interest” (MW < 500, −2 < log Po/w < 5, HBA < 10, and HBD < 5), and the values are expressed as a percentage count of their respective datasets. In all cases the distributions of ConMedNP were enhanced for the Lipinski properties (peaks of the distributions moved to more “drug-like” properties) when compared to the DNP. This is particularly noticeable for 301 ≤ MW ≤ 500, 3 ≤ log Po/w ≤ 5, 4 ≤ HBA ≤ 7 and 1 ≤ HBD ≤ 3.

The MW distribution of ConMedNP peaks at 401–500 Da, while those of the other two datasets peak at 301–400 Da (Fig. 4A). Below the range 301 ≤ MW ≤ 500, the percentages were reduced for ConMedNP when compared to the DNP. The same observation is true for log Po/w < 3, HBA < 4 and HBD < 1. This improved profile for MW is what is desirable for a more “drug-like” library, according to Lipinski's criteria. The proportions of CamMedNP and ConMedNP which satisfy Lipinski's MW property (<500 Da) were respectively 77.94% and 78.91%, compared to 73.04% for the DNP. The distribution maxima for calculated log Po/w (Fig. 4B) were similar for our two datasets, appearing between log Po/w values of 3–4 (for both CamMedNP and ConMedNP), while the DNP gives a value between 2 and 3. A similar trend was observed for the MW distribution. This showed an enhancement of 13.89% for MW values between 301 and 500 Da of ConMedNP over the DNP, comparable with the enhancement of 11.19% of CamMedNP over the DNP. The enhancements of our two datasets over the DNP for log Po/w values in the range 2 ≤ log Po/w ≤ 5 were calculated to be 13.15% and 13.11% respectively for CamMedNP and ConMedNP.

For HBA (Fig. 4C), ConMedNP showed an improvement of 11.86% over the DNP within the range 4 ≤ HBA ≤ 7, which was considerably lower than the corresponding enhancement of 18.65% which the CamMedNP library had shown over the DNP.28 For the HBD, our two datasets (CamMedNP and ConMedNP) both showed enhancements over the DNP within the range 1 ≤ HBD ≤ 3, with respective percentages of 10.39% and 11.50%. The peak of the HBA distribution for both CamMedNP and ConMedNP is at 5 acceptors (respectively 18.45% and 16.30%), while that of the DNP is at 4 acceptors (14.15%). This gives a significant increase in compounds with 6 or 7 acceptors for the CamMedNP library, and with 4 or 5 acceptors for the ConMedNP library, when compared to the DNP (Fig. 4C). Similarly, the peak of the HBD distribution for ConMedNP is at 1 donor (27.07%), with a significant increase in compounds with 5 or 6 donors as compared to the DNP (Fig. 4D). The overall summary of the four Lipinski parameters for the three datasets thus reveals that both the CamMedNP and ConMedNP libraries are more “drug-like” than the DNP. This is an indication that the chances of finding “lead-like” molecules with improved DMPK properties within these libraries are considerable.

Data is currently available on the various plant sources from which the compounds in ConMedNP were derived, their ethnobotanical uses, geographical regions of collection of plant material, dates of collection, literature sources, known experimentally measured biological activities of compounds and sample availability. The ConMedNP library is constantly being updated; meanwhile a MySQL platform to facilitate the searching of the ConMedNP database and ordering of samples is being set up by our group. However, 3D structures of the compounds, as well as the physico-chemical properties that were used to evaluate “drug-likeness”, for the total library as well as the “drug-like”, “lead-like” and “fragment-like” subsets, can be freely downloaded as additional files accompanying this publication (respectively additional files 1–4).
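The “enhancement” figures quoted for the histogram comparison can be illustrated with a short calculation. This is a sketch under one assumption, namely that an enhancement is the difference between the percentage of a library and the percentage of the reference dataset falling within a given property range; the function names and the toy MW values below are hypothetical and are not taken from ConMedNP or the DNP.

```python
def percent_in_range(values, low, high):
    """Percentage of values v with low <= v <= high."""
    if not values:
        raise ValueError("empty dataset")
    hits = sum(1 for v in values if low <= v <= high)
    return 100.0 * hits / len(values)


def enhancement(library, reference, low, high):
    """Difference in percentage points between library and reference."""
    return (percent_in_range(library, low, high)
            - percent_in_range(reference, low, high))


# Toy molecular weights in Da, for illustration only.
library_mw = [320, 410, 450, 280, 510]
reference_mw = [150, 320, 250, 600, 410, 220, 180, 700, 90, 330]

print(percent_in_range(library_mw, 301, 500))           # share of the library in range
print(enhancement(library_mw, reference_mw, 301, 500))  # percentage-point difference
```

Expressing both datasets as percentages of their own size, as the text does, is what allows libraries of very different sizes to be compared in the same histogram.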
651 compounds) from the ChemBridge Corporation.52 Histograms showing the calculated descriptors (MW, HBA, HBD, log Po/w, NR, NRB, NN, NO and TPSA) are shown in Fig. 5 for ConMedNP (in light green) and the ChemBridge dataset (in red). The regions shown in dark green represent regions of intersection. The MW of the ConMedNP dataset stretches well beyond 1000 Da, while that of the ChemBridge dataset is restricted to the range 200 ≤ MW ≤ 500 Da. This observation could be explained by the complexity and large sizes of some of the compounds within the natural product library. The large proportion of very large and complex NPs in ConMedNP could also explain its mean molar weight (=427 Da), which is higher than those of the standard “drug-like” and “lead-like” libraries and of typical drugs (=310 Da).53 The same explanation holds for the trend observed in the distributions of log Po/w, HBD, NCC, NO, NRB, NR, TPSA and HBA for ConMedNP, when compared with the ChemBridge dataset. It was generally observed that the ConMedNP dataset covers a different region of physicochemical space from the ChemBridge Diversity dataset.

Principal component analysis (PCA) was also used as a means of comparing the extent of diversity of the two datasets. This consists of reducing the dimensionality of the calculated descriptors by linearly transforming the data into a new and smaller set of descriptors, which are uncorrelated and normalised (mean = 0, variance = 1). The PCA scatter plot of the previously calculated physicochemical properties of the ConMedNP (light green) and ChemBridge Diverset database (red), shown in Fig. 6, is a visual representation of the molecules in the respective datasets, as described by the 3 selected principal components (PC1, PC2 and PC3). Each point corresponds to a molecule, and the spread of the points represents the diversity of the respective dataset. The first three principal components (PCs) explain 83% (ConMedNP) and 64% (ChemBridge) of the variance of the individual datasets. The larger number of outliers in the case of the ConMedNP dataset (away from the centre and towards the sides of the cube) indicates a wider sampling of the chemical space compared to the ChemBridge Diverset collection.
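The PCA step described above can be sketched in a few lines. This is an illustration rather than the implementation used for Fig. 6: it standardises each descriptor, builds the correlation matrix and extracts the leading components by power iteration with deflation, a technique chosen here only to keep the example free of external dependencies. All function names are hypothetical, and the sketch assumes no descriptor column is constant.

```python
import math


def standardise(rows):
    """Scale every descriptor column to mean 0 and sample variance 1."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardised data."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_components(c, k, iterations=500):
    """Largest k (eigenvalue, eigenvector) pairs of a symmetric matrix."""
    p = len(c)
    c = [row[:] for row in c]  # work on a copy
    found = []
    for _ in range(k):
        v = [1.0 / math.sqrt(p)] * p
        for _step in range(iterations):
            w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * c[i][j] * v[j] for i in range(p) for j in range(p))
        found.append((lam, v))
        for i in range(p):  # deflation: remove the component just found
            for j in range(p):
                c[i][j] -= lam * v[i] * v[j]
    return found


def pca(rows, k=3):
    """Return (scores, explained variance fractions) for k components."""
    z = standardise(rows)
    comps = top_components(correlation_matrix(z), k)
    explained = [lam / len(rows[0]) for lam, _ in comps]
    scores = [[sum(a * b for a, b in zip(row, vec)) for _, vec in comps]
              for row in z]
    return scores, explained


# Toy data: column 2 is a linear function of column 1, column 3 is uncorrelated.
data = [[1, 3, 1], [2, 5, -1], [3, 7, -1], [4, 9, 1]]
scores, explained = pca(data, k=2)
print([round(e, 3) for e in explained])
```

The explained fractions are the eigenvalues divided by the number of descriptors, which is how a statement such as “the first three PCs explain 83% of the variance” is obtained. The scores returned here have zero mean and are uncorrelated; dividing each score column by the square root of its eigenvalue would additionally give unit variance.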
Fig. 5 A simple descriptor-based comparison of the ConMedNP database and the ChemBridge Diversity database. Comparison of typical physicochemical property distributions (MW, HBA, HBD, NCC, NO, NRB, log P, NR and TPSA) in the ConMedNP (green) and ChemBridge Diverset (red) database. All histograms and scatterplots were generated with the R software.72
| Library name | log B/Bᵃ | BIPcaco-2ᵇ (nm s⁻¹) | Smolᶜ (Å²) | Smol,hfobᵈ (Å²) | Vmolᵉ (Å³) | log Swatᶠ (S in mol L⁻¹) | log KHSAᵍ |
|---|---|---|---|---|---|---|---|
| Total library | 88.53 | 37.28 | 90.36 | 91.32 | 91.03 | 72.46 | 81.01 |
| Drug-like | 99.35 | 43.99 | 99.35 | 99.82 | 98.99 | 90.39 | 99.35 |
| Lead-like | 99.72 | 53.70 | 99.72 | 100.00 | 99.72 | 99.31 | 99.59 |
| Fragment-like | 100.00 | 33.11 | 95.45 | 100.00 | 92.21 | 97.40 | 98.05 |

| Library name | MDCKʰ | Indcohⁱ | Globʲ | ro3ᵏ | log HERGˡ | log Kpᵐ | # metabⁿ |
|---|---|---|---|---|---|---|---|
| Total library | 47.14 | 94.92 | 89.88 | 43.57 | 58.35 | 92.09 | 81.52 |
| Drug-like | 59.61 | 99.14 | 97.70 | 73.52 | 63.62 | 95.93 | 91.89 |
| Lead-like | 59.86 | 100.00 | 97.81 | 93.56 | 76.03 | 97.53 | 96.44 |
| Fragment-like | 63.33 | 100.00 | 92.21 | 100.00 | 100.00 | 98.05 | 92.21 |

ᵃ Logarithm of predicted blood/brain barrier partition coefficient (range for 95% of drugs: −3.0 to 1.0). ᵇ Predicted apparent Caco-2 cell membrane permeability in Boehringer–Ingelheim scale, in nm s⁻¹ (range for 95% of drugs: < 5 low, > 500 high). ᶜ Total solvent-accessible molecular surface, in Å² (probe radius 1.4 Å) (range for 95% of drugs: 300–1000 Å²). ᵈ Hydrophobic portion of the solvent-accessible molecular surface, in Å² (probe radius 1.4 Å) (range for 95% of drugs: 0–750 Å²). ᵉ Total volume of molecule enclosed by solvent-accessible molecular surface, in Å³ (probe radius 1.4 Å) (range for 95% of drugs: 500–2000 Å³). ᶠ Logarithm of aqueous solubility (range for 95% of drugs: −6.0 to 0.5). ᵍ Logarithm of predicted binding constant to human serum albumin (range for 95% of drugs: −1.5 to 1.5). ʰ Predicted apparent MDCK cell permeability in nm s⁻¹ (< 25 poor, > 500 great). ⁱ Index of cohesion interaction in solids (0.0 to 0.05 for 95% of drugs). ʲ Globularity descriptor (0.75 to 0.95 for 95% of drugs). ᵏ Percentage compliance to Jorgensen’s Rule of Three. ˡ Predicted IC50 value for blockage of HERG K⁺ channels (concern < −5). ᵐ Predicted skin permeability (−8.0 to −1.0 for 95% of drugs). ⁿ Number of likely metabolic reactions (range for 95% of drugs: 1–8).
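A compliance table of this kind can be generated from per-compound predictions and the ranges quoted in the footnotes. The sketch below assumes that each table entry is the percentage of compounds whose predicted value falls inside the range given for 95% of drugs; that reading, the dictionary keys and the example records are assumptions made for illustration and are not QikProp output.

```python
# Ranges for 95% of known drugs, copied from the table footnotes.
# The keys are hypothetical labels, not QikProp column names.
RANGES = {
    "logBB": (-3.0, 1.0),    # blood/brain barrier partition coefficient
    "logS": (-6.0, 0.5),     # aqueous solubility
    "logKhsa": (-1.5, 1.5),  # binding to human serum albumin
    "n_metab": (1, 8),       # number of likely metabolic reactions
}


def compliance_report(records, ranges):
    """Percentage of records inside each range, keyed by descriptor."""
    if not records:
        raise ValueError("no compounds supplied")
    report = {}
    for name, (low, high) in ranges.items():
        inside = sum(1 for rec in records if low <= rec[name] <= high)
        report[name] = 100.0 * inside / len(records)
    return report


# Made-up predictions for four compounds.
compounds = [
    {"logBB": -0.5, "logS": -4.2, "logKhsa": 0.3, "n_metab": 3},
    {"logBB": 1.4, "logS": -7.1, "logKhsa": 1.9, "n_metab": 12},
    {"logBB": -2.0, "logS": -3.0, "logKhsa": -0.2, "n_metab": 9},
    {"logBB": 0.2, "logS": -5.5, "logKhsa": 0.8, "n_metab": 2},
]
for name, pct in compliance_report(compounds, RANGES).items():
    print(f"{name}: {pct:.2f}% within range")
```

Keeping the ranges in a single dictionary makes it easy to rerun the same report on the total library and on each subset.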
Fig. 7 MCSS panel in ConMedNP, featuring the most common cyclic structures included in the database.
Po/w, log S, HBA, HBD, THSA, TPSA, NO, NCC, NR and number of Lipinski violations were calculated using the molecular descriptor calculator included in the QuSAR module of the MOE package,73 while molecular descriptors related to drug metabolism and pharmacokinetic profiles were computed using the QikProp software54 running in normal mode. The ChemBridge Diverset dataset (48 651 compounds) was downloaded from the official ChemBridge webpage.52 The LibMCS program of JKlustor75 was used for maximum common substructure clustering of the ConMedNP database. In the MCSS search, only structures with MW ≤ 600 were included, since MCSS clustering is only feasible on small molecules. This means that only 2785 of the compounds in ConMedNP were analysed for MCSS. The compounds were fragmented using the RECAP algorithm.76 It is noteworthy that the provided 3D structures are those published in the literature, based on NMR and other spectroscopic data. In order to facilitate the virtual screening procedure for inexperienced modellers, a preliminary treatment of the input ligand structures was also carried out, by assigning biologically relevant protonation states at physiological pH (5 ≤ pH ≤ 9.5) and generating tautomers. Group I metals in simple salts were disconnected, strong acids were deprotonated and strong bases protonated, while topological duplicates were removed and explicit hydrogens were added. For extremely large NPs, only the largest molecular fragments were retained, and a cutoff of up to 100 tautomers per molecular structure was applied.
P (octanol–water) model. Available as a source code in MOE, 1998, unpublished.

Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c3ra43754j

This journal is © The Royal Society of Chemistry 2014