Samantha Stone a, David J. Newman b, Steven L. Colletti c and Derek S. Tan *ad
aChemical Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York 10021, USA. E-mail: tand@mskcc.org
bNIH Special Volunteer, Wayne, PA 19087, USA
cZymergen, Inc., 430 E 29th St, New York 10016, USA
dTri-Institutional Research Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York 10021, USA
First published on 3rd August 2021
Covering: 1981 to 2019
Natural products continue to play a major role in drug discovery, with half of new chemical entities based structurally on a natural product. Herein, we report a cheminformatic analysis of the structural and physicochemical properties of natural product-based drugs in comparison to top-selling brand-name synthetic drugs, and a selection of chemical probes recently discovered from diversity-oriented synthesis libraries. In this analysis, natural product-based drugs covered a broad range of chemical space based on size, polarity, and three-dimensional structure. Natural product-based structures were also more prevalent in top-selling drugs of 2018 compared to 2006. Further, the drugs clustered well according to biosynthetic origins, but less so based on therapeutic classes. Macrocycles occupied distinctive and relatively underpopulated regions of chemical space, while chemical probes largely overlapped with synthetic drugs. This analysis highlights the continued opportunities to leverage natural products and their pharmacophores in modern drug discovery.
We have previously carried out cheminformatic analyses that compare the structural features of drugs that are based on natural product structures to other drugs of purely synthetic origins.2,19 These studies have shown that drugs based on natural products tend to exhibit lower hydrophobicity and greater stereochemical content compared to their purely synthetic counterparts. Moreover, they suggest that the structural features found in natural products may be incorporated into synthetic drugs to target pharmaceutically-relevant chemical space and to increase the structural diversity available for drug discovery. Indeed, while natural product-based drugs continue to represent approximately half of all new chemical entity drug approvals (Fig. 1a), an increasing proportion have been generated by de novo synthesis based on natural product pharmacophores, particularly in the last 10 years (Fig. 1b).
Fig. 1 (a) Fractions of small-molecule new chemical entity (NCE) drug approvals based structurally on natural products vs. purely synthetic in origin. Total number (n) of small-molecule NCEs listed below each time period. (b) Fractions of NCEs that were unmodified natural products, defined natural product botanicals, natural product derivatives, or synthetic but based structurally on a natural product; purely synthetic drugs not shown. Data binned in 5 year periods, except final 4 year period. Categories as defined by Newman and Cragg: N = unaltered natural product; NB = botanical drug (defined mixture); ND = natural product derivative; S = synthetic drug; S* = synthetic drug (natural product pharmacophore); /NM = mimic of natural product.1
Herein, we present an updated and expanded analysis of the structural and physicochemical properties of natural product-based drugs approved between 1981 and 2019,1 in comparison to the top 40 best-selling brand-name drugs in 2006 and 2018,20 and a collection of chemical probes discovered recently from diversity-oriented synthesis (DOS) libraries.21 Further, we have classified these drugs by biosynthetic origin, macrocyclic structure, and therapeutic class to explore further the relationships between chemical space, chemotype, and drug function. Notably, we found a striking increase in the proportion of top-selling drugs that are based on natural products in 2018 compared to 2006. Principal component analysis (PCA) further showed that these top-selling drugs also occupy a larger range of chemical space in 2018 compared to 2006, and that the complete set of natural product-based drugs accesses an even larger area. Interestingly, most of the recently discovered DOS probes overlap with purely synthetic drugs in the PCA plot. Moreover, the natural product-based drugs tend to cluster according to biosynthetic origins, and less so based on therapeutic classes. Taken together, these results indicate that there are rich opportunities for further exploitation of natural product-based structures in drug and probe discovery.
Category | Compounds (total) | Compounds (unique) | MW | HBD | HBA | ALOGPs | LogD | Rot | tPSA | Fsp3 | RngAr |
---|---|---|---|---|---|---|---|---|---|---|---|
Natural product drugs (N) | 88 | 77 | 611 | 5.9 | 10.1 | 1.96 | −1.40 | 11.0 | 196 | 0.71 | 0.7 |
Natural product-derived drugs (ND) | 379 | 344 | 757 | 7.0 | 11.5 | 1.82 | −3.00 | 16.2 | 250 | 0.59 | 1.4 |
Top 40 drugs in 2018: N, ND, S*, S*/NM (2018-N) | 37 | 34 (b) | 473 | 2.4 | 6.0 | 2.11 | 1.78 | 7.8 | 111 | 0.50 | 1.9 |
Top 40 drugs in 2018: S, S/NM (2018-S) | 17 | 15 | 444 | 1.9 | 5.1 | 2.83 | 2.49 | 6.5 | 95 | 0.33 | 2.7 |
Top 40 drugs in 2018: all | 54 | 49 | 464 | 2.2 | 5.8 | 2.33 | 2.00 | 7.4 | 106 | 0.45 | 2.2 |
Top 40 drugs in 2006: N, ND, S*, S*/NM (2006-N) | 15 | 14 (c) | 367 | 2.4 | 5.0 | 2.08 | 0.40 | 7.6 | 90 | 0.54 | 1.6 |
Top 40 drugs in 2006: S, S/NM (2006-S) | 30 | 27 | 355 | 1.1 | 3.9 | 3.15 | 2.37 | 5.4 | 61 | 0.33 | 2.3 |
Top 40 drugs in 2006: all | 45 | 41 | 359 | 1.5 | 4.3 | 2.78 | 1.70 | 6.1 | 70 | 0.40 | 2.0 |
Diversity-oriented synthesis probes (DOS) | 10 | 10 | 552 | 1.1 | 4.7 | 4.08 | 3.90 | 4.9 | 85 | 0.38 | 2.8 |
All compounds (d) | 576 | 521 | 673 | 5.8 | 10.1 | 2.01 | −1.79 | 13.6 | 211 | 0.58 | 1.5 |

(a) ALOGPs = calculated 1-octanol/water partition coefficient, Fsp3 = fraction sp3-hybridized carbons, HBA = hydrogen-bond acceptors, HBD = hydrogen-bond donors, LogD = calculated 1-octanol/water distribution coefficient (pH 7.4), MW = molecular weight, Rot = rotatable bonds, RngAr = aromatic rings, tPSA = topological polar surface area.
(b) In 2018, 28 out of the top 40 small-molecule drug products contained at least one natural product-based component.
(c) In 2006, 14 out of the top 40 small-molecule drug products contained at least one natural product-based component.
(d) Parameter averages for all compounds excluding (525 total, 471 unique) vs. only (51 total, 50 unique) ribosomal peptides are as follows: MW 501 vs. 2,288, HBD 3.1 vs. 31.4, HBA 7.4 vs. 35.6, ALOGPs 2.21 vs. 0.05, LogD 0.00 vs. −18.65, Rot 8.2 vs. 64.6, tPSA 135 vs. 926, Fsp3 0.58 vs. 0.54, RngAr 1.2 vs. 3.8.
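As a quick consistency check on the table, the "All compounds" averages can be compared with the subset averages given in footnote d. The sketch below pools the two subsets weighted by their unique-structure counts (471 and 50). Weighting by unique structures is our reading of the table rather than something the text states, and the function name `pooled_mean` is purely illustrative.

```python
# Values copied from footnote d and the "All compounds" row of the table.
# Assumption: averages are taken over unique structures (471 + 50 = 521).
N_NON_RIBOSOMAL = 471
N_RIBOSOMAL = 50

# parameter: (mean excluding ribosomal peptides, mean of ribosomal peptides only, reported overall mean)
table_values = {
    "MW": (501, 2288, 673),
    "HBD": (3.1, 31.4, 5.8),
    "HBA": (7.4, 35.6, 10.1),
    "LogD": (0.00, -18.65, -1.79),
    "Rot": (8.2, 64.6, 13.6),
    "tPSA": (135, 926, 211),
}


def pooled_mean(mean_a, n_a, mean_b, n_b):
    """Mean of the union of two disjoint groups, from the group means and sizes."""
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)


for name, (excluding, only, reported) in table_values.items():
    pooled = pooled_mean(excluding, N_NON_RIBOSOMAL, only, N_RIBOSOMAL)
    print(f"{name}: pooled = {pooled:.2f}, reported = {reported}")
```

For the parameters listed, the pooled values fall within rounding of the reported row, which illustrates how strongly 50 ribosomal peptides shift the overall averages.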
For comparison, we also analyzed the top 40 best-selling, brand-name, small-molecule drugs as illustrated previously by Njarðarson and coworkers.20 Biologics such as monoclonal antibodies and other proteins were excluded from this analysis. Our previous cheminformatic analyses have used the top 40 small-molecule drugs from 2006.19,22–26 To assess changes in this collection over time, we also evaluated the top 40 small-molecule drugs from 2018. Both of these collections included drugs based on natural products (2006-N, 2018-N: N and ND classes also included in the complete natural product-based drug datasets above were dereplicated appropriately; S* and S*/NM classes were unique to these datasets) as well as others from purely synthetic origins (2006-S, 2018-S: S and S/NM classes). The presence of several multicomponent combination therapies resulted in greater than 40 unique chemical structures for each group. Strikingly, the fraction of natural product-based drugs increased dramatically in the 2018 collection (34 out of 49 unique structures = 69%, 28 out of 40 drug products = 70%) compared to the 2006 collection (14 out of 41 unique structures = 34%, 14 out of 40 drug products = 35%) (ESI Table S1†). Notably, the majority of these natural product-based compounds were produced by de novo synthesis (S*, S*/NM) in both the 2018 (23 out of 34 unique structures) and 2006 (10 out of 14 unique structures) collections, demonstrating the utility of natural product-based pharmacophores in modern drug discovery.
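The percentages quoted above follow directly from the stated counts. A minimal sketch of the arithmetic is given below; the counts are taken from the text, while the dictionary labels and the helper `percent` are illustrative names only.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# (natural product-based count, total count), as stated in the text
counts = {
    "2018, unique structures": (34, 49),
    "2018, drug products": (28, 40),
    "2006, unique structures": (14, 41),
    "2006, drug products": (14, 40),
    # de novo synthesized (S*, S*/NM) among natural product-based unique structures
    "2018, de novo share of NP-based": (23, 34),
    "2006, de novo share of NP-based": (10, 14),
}

for label, (part, whole) in counts.items():
    print(f"{label}: {part}/{whole} = {percent(part, whole):.0f}%")
```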
Finally, to expand the analysis further, we included a collection of 10 chemical probes discovered from DOS libraries and reviewed recently by Gerry and Schreiber.21 This collection represents a wide range of biological activities and is composed mainly of polycyclic structures that might intuitively be viewed as resembling natural products more so than purely synthetic drugs (ESI Fig. S1†).
These included the Lipinski ‘rule of 5’ parameters for oral bioavailability,27 namely molecular weight (MW), hydrogen-bond donors (HBD), hydrogen-bond acceptors (HBA), and 1-octanol/water partition coefficient (ALOGPs), the last calculated using the method reported by Tetko (the capital P designates a logP calculation).28,29 The related parameter 1-octanol/water distribution coefficient at pH 7.4 (LogD) was also calculated using Instant JChem (ChemAxon). As solubility is often a limiting factor for synthetic drugs, we also calculated aqueous solubility (ALOGpS), using the Tetko algorithm (the capital S designates a logS calculation).29,30 Of note, the Tetko algorithm was unable to process 14 very large ribosomal peptides; to enable inclusion of these structures in the analysis, we used the average ALOGPs and ALOGpS values from the next three largest peptides as placeholders. We also evaluated the corresponding Instant JChem calculations, but observed extreme values (ChemAxon LogP < −15; LogS > 60) and deemed these unreliable. It should be noted that Instant JChem also returned extremely negative LogD values (< −28) for these 14 peptides, which were included in the analysis but should be viewed with caution.
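For orientation, the four Lipinski parameters can be compared against the commonly cited rule-of-5 thresholds (MW ≤ 500, HBD ≤ 5, HBA ≤ 10, logP ≤ 5). The sketch below applies these thresholds to category averages from Table 1. This is illustrative only, since the rule is normally applied to individual compounds, and ALOGPs is used here as the logP estimate. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mw: float    # molecular weight
    hbd: float   # hydrogen-bond donors
    hba: float   # hydrogen-bond acceptors
    logp: float  # 1-octanol/water partition coefficient (ALOGPs used here)


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-5 thresholds are exceeded."""
    checks = [d.mw > 500, d.hbd > 5, d.hba > 10, d.logp > 5]
    return sum(checks)


# Category averages (MW, HBD, HBA, ALOGPs) copied from Table 1.
table_means = {
    "N": Descriptors(611, 5.9, 10.1, 1.96),
    "ND": Descriptors(757, 7.0, 11.5, 1.82),
    "2006-S": Descriptors(355, 1.1, 3.9, 3.15),
    "2018-S": Descriptors(444, 1.9, 5.1, 2.83),
    "DOS": Descriptors(552, 1.1, 4.7, 4.08),
}

for category, d in table_means.items():
    print(category, rule_of_five_violations(d))
```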
We further included Veber's ‘rule of 2’ parameters for oral bioavailability,31 namely rotatable bonds (Rot) and topological polar surface area (tPSA). To normalize for molecular size, we also calculated the van der Waals surface area (VWSA) and relative polar surface area (rPSA).
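A minimal sketch of these two ideas follows. It assumes the commonly cited Veber thresholds (at most 10 rotatable bonds and a polar surface area of at most 140 Å²) and assumes that rPSA is defined as tPSA divided by VWSA; the text does not give the formula, so this definition is our assumption. The VWSA value in the last line is hypothetical and serves only to show the call.

```python
def passes_veber(rot: float, tpsa: float) -> bool:
    """Veber criteria as commonly cited: Rot <= 10 and tPSA <= 140 square angstroms."""
    return rot <= 10 and tpsa <= 140


def relative_psa(tpsa: float, vwsa: float) -> float:
    """Assumed definition: polar surface area as a fraction of total van der Waals surface area."""
    if vwsa <= 0:
        raise ValueError("VWSA must be positive")
    return tpsa / vwsa


# Rot and tPSA averages copied from Table 1 (illustrative use of averages).
print("N:", passes_veber(11.0, 196))
print("2018-S:", passes_veber(6.5, 95))

# Hypothetical compound: tPSA = 150, VWSA = 600 (not taken from the dataset).
print("rPSA:", relative_psa(150.0, 600.0))
```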
Natural product drugs tend to have fewer nitrogen atoms (N) and more oxygen atoms (O) than synthetic drugs,32 and, thus, counts of these atoms were included as well.
The fraction of sp3-hybridized carbons (Fsp3) and number of stereocenters (Stereo) were included as indirect indicators of molecular complexity and three-dimensional structure. Both parameters have been correlated with increased binding selectivity.9 Stereocenters have also been identified as a distinguishing feature of natural products compared to synthetic drugs32 and correlated with improved progression through clinical trials.12 To normalize the latter for molecular size, we also calculated stereochemical density as the number of stereocenters divided by the number of heavy atoms (Stereo/HA). Previous analyses from our group have described stereochemical density as Stereo/MW,2,19,22 but the number of heavy atoms was readily calculated herein using Instant JChem and provides a more relevant divisor with respect to molecular structure.
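The size normalization described above amounts to a simple ratio. The sketch below computes Stereo/HA alongside the earlier Stereo/MW form for comparison; the input values describe a hypothetical compound and are not taken from the dataset.

```python
def stereo_density(n_stereo: int, n_heavy_atoms: int) -> float:
    """Stereochemical density: stereocenters per heavy (non-hydrogen) atom."""
    if n_heavy_atoms <= 0:
        raise ValueError("heavy atom count must be positive")
    return n_stereo / n_heavy_atoms


def stereo_per_mw(n_stereo: int, mw: float) -> float:
    """Earlier normalization: stereocenters per unit molecular weight."""
    if mw <= 0:
        raise ValueError("molecular weight must be positive")
    return n_stereo / mw


# Hypothetical compound: 13 stereocenters, 52 heavy atoms, MW = 730.
print(stereo_density(13, 52))   # fraction of heavy atoms that are stereocenters
print(stereo_per_mw(13, 730.0))
```

Dividing by heavy atoms gives a dimensionless per-atom fraction, whereas dividing by MW also depends on which elements are present, since heavier atoms inflate the divisor.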
Parameters describing ring count (Rings), largest ring size (RngLg), ring systems (RngSys), and rings per ring system (RRSys) were included because some of these features distinguish natural products from synthetic drugs.32 Aromatic rings (RngAr) were also included as these have been correlated with increased preclinical toxicity and attrition rates in drug candidate progression.11
Parameters were calculated using Instant JChem and Tetko's Virtual Computational Chemistry Laboratory (http://www.vcclab.org).28–30 Average values and standard deviations of each structural and physicochemical descriptor, grouped by drug category, were calculated (Table 1 and ESI Table S2,† and Fig. 2 and ESI Fig. S2†). These calculations were performed both including and excluding ribosomal peptides, as these compounds proved to be outliers due to their high molecular weight and polarity (ESI Table S3†).
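The grouping step can be sketched as follows. The records are hypothetical, the field names are illustrative, and the use of the sample standard deviation is an assumption, since the text does not specify which form was used.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records; real values would come from the descriptor calculations.
records = [
    {"category": "N", "ribosomal": False, "MW": 400.0},
    {"category": "N", "ribosomal": False, "MW": 600.0},
    {"category": "N", "ribosomal": True, "MW": 5000.0},
    {"category": "S", "ribosomal": False, "MW": 300.0},
    {"category": "S", "ribosomal": False, "MW": 350.0},
]


def summarize(rows, key, include_ribosomal=True):
    """Return {category: (mean, sample SD)} for one descriptor."""
    groups = defaultdict(list)
    for row in rows:
        if not include_ribosomal and row["ribosomal"]:
            continue
        groups[row["category"]].append(row[key])
    return {
        category: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for category, values in groups.items()
    }


print(summarize(records, "MW"))
print(summarize(records, "MW", include_ribosomal=False))
```

Running the summary both ways makes the outlier effect explicit: a single very large peptide dominates both the mean and the standard deviation of its category.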
Fig. 2 Bar graphs of selected structural and physicochemical properties of natural product drugs (N), natural product-derived drugs (ND), top 40 brand-name drugs from 2006 and 2018 from natural product (-N) and purely synthetic (-S) origins, and recently discovered chemical probes from diversity-oriented synthesis libraries (DOS). See ESI Fig. S2† for complete data.
The natural product-based drugs (N, ND) had higher average values compared to purely synthetic drugs (2006-S, 2018-S) for multiple parameters, including molecular weight (MW), hydrogen-bonding groups (HBD, HBA), rotatable bonds (Rot), polar surface area (tPSA), total surface area (VWSA), oxygen counts (O), and stereocenters (Stereo). These trends were also observed when ribosomal peptides were excluded from the analyses, although the differences were somewhat smaller. In parameters normalized for molecular size, the natural products also had higher sp3 content (Fsp3) and stereochemical density (Stereo/HA) than the synthetic drugs, but more comparable relative polar surface area (rPSA). Conversely, the natural product-based drugs tended to have lower hydrophobicity (ALOGPs, LogD) and fewer aromatic rings (RngAr) compared to the synthetic drugs. In contrast, the natural product-based and synthetic drugs in this analysis had similar aqueous solubility (ALOGpS) and nitrogen counts (N). With respect to ring parameters, the natural product-based and synthetic drugs had similar ring counts (Rings), but the former tended to have larger rings (RngLg), and fewer but more complex ring systems (RngSys, RRSys).
Notably, the diversity-oriented synthesis probes (DOS) had the highest hydrophobicity (ALOGPs, LogD) and ring parameters (Rings, RngLg, RngSys, RngAr) and lowest solubility (ALOGpS) and relative polar surface area (rPSA) of any of the compound datasets. The DOS probes were similar to the natural product-based drugs for parameters associated with molecular size (MW, VWSA) and stereochemical complexity (Stereo, Stereo/HA), but more like the synthetic drugs for parameters relevant to polarity (HBD, HBA, tPSA), flexibility (Rot), and sp3 content (Fsp3).
Fig. 3 PCA and loading plots based on 20 structural and physicochemical parameters for natural product drugs (N), natural product-derived drugs (ND), top 40 brand name drugs in 2006 and 2018 derived from natural products (-N) or purely synthetic (-S), and recently discovered chemical probes from diversity-oriented synthesis libraries (DOS). See ESI† for complete data, including PC2 vs. PC3 plots (ESI Fig. S3†).
Loading plots indicate that parameters associated with molecular size (HBA, tPSA, MW, HBD, VWSA; in order of magnitude) were highly correlated and had the strongest influence along the PC1 axis, reflected by the positioning of large drugs in the negative range (left) (Fig. 3, ESI Fig. S3, and ESI Table S5†). Polarity had a strong influence on PC2, with ALOGpS and rPSA correlating highly and providing the greatest contributions to variance in the positive direction (top), while ALOGPs conversely contributed to variance in the negative direction (bottom). In notable contrast, LogD contributed strongly to positive variance along PC1 but less so to negative variance along PC2, and this influence was seen in the positioning of several large but lipophilic compounds near the origin of PC1 (see below). This may be due to the high incidence of charged functionalities in large ribosomal peptide drugs, which impacts LogD but not ALOGPs (avg MW = 2,288, LogD = −18.65, ALOGPs = 0.05), resulting in an anticorrelation between size and LogD along PC1. However, this may also be an artifact of aberrantly large LogD values calculated by Instant JChem for large ribosomal peptides. Three-dimensional structure had a pronounced impact on PC3 positioning, with RngAr contributing to negative variance, while Fsp3 and Stereo/HA were highly correlated and contributed to positive variance.
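To make the notions of loadings and scores concrete, the following is a minimal, dependency-free sketch of a PCA on standardized descriptors, extracting only the first component by power iteration. It is not the procedure used to generate Fig. 3: the data are five hypothetical compounds with three descriptors, z-score scaling is our assumption, and all function names are illustrative.

```python
import math


def standardize(rows):
    """Z-score each column (mean 0, sample SD 1) so descriptors are comparable."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - mu) ** 2 for x in c) / (n - 1))
           for c, mu in zip(cols, means)]
    return [[(x - mu) / sd for x, mu, sd in zip(row, means, sds)] for row in rows]


def covariance(z):
    """Covariance matrix of standardized data (equal to the correlation matrix)."""
    n, m = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(m)]
            for i in range(m)]


def first_component(cov, iters=500):
    """Power iteration: leading eigenvector (PC1 loadings) and its eigenvalue."""
    m = len(cov)
    v = [1.0 / (i + 1) for i in range(m)]  # non-uniform start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m)) for i in range(m))
    return v, eigval


def project(z, loadings):
    """PC scores: projection of each standardized compound onto the loadings."""
    return [sum(x * w for x, w in zip(row, loadings)) for row in z]


# Hypothetical compounds; columns are MW, tPSA, ALOGPs.
rows = [
    [300.0, 60.0, 4.0],
    [450.0, 100.0, 3.5],
    [600.0, 150.0, 2.0],
    [900.0, 250.0, 1.0],
    [1200.0, 330.0, -0.5],
]

z = standardize(rows)
loadings, eigval = first_component(covariance(z))
scores = project(z, loadings)
explained = eigval / len(loadings)  # trace of a correlation matrix equals the number of variables

print("PC1 loadings:", [round(x, 3) for x in loadings])
print("PC1 explained variance fraction:", round(explained, 3))
print("PC1 scores:", [round(s, 2) for s in scores])
```

In this toy set, the size-related descriptors (MW, tPSA) load with the same sign and the hydrophobicity descriptor loads with the opposite sign, mirroring the kind of correlated contributions described above. The overall sign of a principal component is arbitrary, so whether large compounds appear on the left or right of an axis depends on the software convention.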
Results of our PCA indicate that natural product and natural product-derived drugs (blue) occupy a much greater range of chemical space compared to the purely synthetic drugs (red) in this analysis, consistent with previous studies.2,19,32 The natural product-based drugs extend deep into the negative range (left) of PC1 (Fig. 3), consistent with the larger size of these compounds observed in the parameter averages above (Table 1 and Fig. 2). The extreme outliers (−5 to −25) were all ribosomal peptides, such as the anticoagulant desirudin (Revasc; MW = 6963) and the diabetes drug lixisenatide (Lyxumia; MW = 4859), with carbohydrates such as heparan sulfate (Orgaran; representative tetramer MW = 1233) also extending deep into the negative range (to −8) (ESI Fig. S4†). Further analysis was facilitated by zooming in on a smaller region of the plot (Fig. 3 expansions).
The natural product-based drugs also extend further into the positive range (top) of PC2 compared to the synthetic drugs (Fig. 3), consistent with the presence of highly polar compounds such as the anticancer agent aminolevulinic acid (Levulan; ALOGPs = −2.85) and the hypolipidemic drug meglutol (Lipoglutaran; ALOGPs = −0.88) (ESI Fig. S4†). Finally, along PC3, the synthetic drugs clustered in the negative range, consistent with the high aromatic ring content of compounds such as the anticancer drug nilotinib (Tasigna; RngAr = 5), while the natural product-based drugs extended far into the positive range, based on their high three-dimensional character exemplified by compounds such as the cyclodextrin anesthesia antidote sugammadex (Bridion; Fsp3 = 0.89) and the halichondrin analogue anticancer drug eribulin (Halaven; Fsp3 = 0.88).
Among the top 40 brand-name drug sets (Fig. 3 and ESI Fig. S5†), segregation of natural product-based (-N) and purely synthetic (-S) molecules revealed PC3 as the primary axis of differentiation (blue vs. red), with the synthetic drugs all clustering in the negative range, again consistent with higher aromatic ring content (Table 1 and Fig. 2). The natural product-based drugs also spanned a wider range of positions along PC2 compared to the synthetic drugs, consistent with a wider range of polarities exemplified by the highly polar antidiabetic metformin (component of Janumet) (positive; ALOGPs = −1.83) and the highly hydrophobic antiviral pibrentasvir (component of Mavyret) (negative; ALOGPs = +5.97). However, the averages for the natural product and synthetic subsets remained comparable, particularly for the 2018 collection (squares). Interestingly, we noted a negative shift in the 2018 vs. 2006 averages along PC1 (squares vs. circles), indicative of a trend toward larger molecules within both the natural product-based and purely synthetic subsets, exemplified by molecules such as the natural product-derived anticancer drug everolimus (Afinitor/Certican; MW = 958) and the synthetic antiviral velpatasvir (component of Epclusa; MW = 883).
Interestingly, the DOS probes (green) clustered similarly to the synthetic drugs along PC1 and PC3, but were differentiated along PC2, where they extended deeper into the negative range (Fig. 3), consistent with the highly hydrophobic character observed in the parameter averages above (Table 1 and Fig. 2). Examples include the ENT1 (equilibrative nucleoside transporter 1) inhibitor rapadocin (ALOGPs = 5.87) and the Max transcription factor homodimer stabilizer KI-MS2-008 (ALOGPs = 4.71) (ESI Fig. S4†).
Thus, the PCA plots indicated that, as a class, natural product-based drugs exhibit higher structural diversity, larger molecular size, increased range of polarities, and more three-dimensional character compared to top-selling synthetic drugs. Recently identified probes from diversity-oriented synthesis collections shared many features with synthetic drugs, with the exception of increased hydrophobicity. These results contrast somewhat with the divergent characteristics of this small collection in the one-dimensional analyses of average parameter values above.
We observed that many of the biosynthetic classes segregated into distinct regions of the plots, consistent with a recent related analysis.36 Alkaloids (blue circles) were positioned in the positive range (right) along PC1, consistent with relatively small molecular sizes (avg MW = 402), spanned both the extreme positive and negative ranges of PC2, consistent with the range of polarities between metformin (component of Janumet; ALOGPs = −1.83) and dinalbuphine (Naldebain; ALOGPs = 5.34), and trended toward the negative range along PC3, consistent with the presence of at least one aromatic ring in almost all of these compounds (avg RngAr = 1.8) (ESI Fig. S7 and Table S6†).
In contrast, carbohydrates (yellow circles and triangles) extended across a wide span of PC1, ranging in size from oseltamivir (Tamiflu; MW = 312) to heparan sulfate (Orgaran; MW = 1233) (ESI Fig. S7†). These compounds clustered in the positive region (top) of PC2, and in the positive region of PC3, consistent with their high polarity (avg ALOGPs = 0.83) and rich stereochemical and three-dimensional features (avg Fsp3 = 0.73; avg Stereo/HA = 0.26) (ESI Table S6†).
Fatty acid derivatives (orange circles and triangles) clustered in the lower right of the PC1 vs. PC2 plot. As these compounds have moderate molecular weights (avg MW = 485), the positioning on PC1 may be impacted by the competing influence of their high LogD (3.05) (ESI Table S6†). Negative positioning along PC2 is consistent with their high hydrophobic character (avg ALOGPs = 4.59), with the antiobesity drug orlistat (Xenical; ALOGPs = 7.61) near the negative end of the range (ESI Fig. S7†).
Nucleosides (purple circles and triangles) clustered in the positive range (top) along PC2, indicating high polarity (avg ALOGPs = 0.19) but fell near the origin of PC1 and PC3, reflecting their relatively small molecular size (avg MW = 399) and a combination of aromatic character of the bases and three-dimensional character of the sugars (avg RngAr = 1.9; Fsp3 = 0.54) (ESI Table S6†). Exemplars include adenosine (Adenocard; MW = 267, ALOGPs = −1.21, RngAr = 2, Fsp3 = 0.5) and emtricitabine (component of Genvoya and Truvada; MW = 247, ALOGPs = −1.41, RngAr = 1, Fsp3 = 0.5) (ESI Fig. S7†).
The β-lactam subset of peptides (pink circles) also clustered similarly to the nucleosides, consistent with analogous structural influences (avg MW = 487, ALOGPs = 0.07, RngAr = 1.4, Fsp3 = 0.44), as in the cephalosporin antibiotic cefminox (Meicelin; MW = 518, ALOGPs = −1.26, RngAr = 1, Fsp3 = 0.56) (ESI Fig. S7 and Table S6†).
Peptides derived from both ribosomal (pink open triangles) and NRPS (pink closed triangles) pathways represent the largest structures in this analysis (avg MW = 2288, 1059, respectively), and this shifts them far into the negative range (left) of the PC1 axis, with ribosomal peptides such as desirudin (Revasc, MW = 6963) and lepirudin (Refludan, MW = 6979) occupying extreme outlier positions (ESI Fig. S7 and Table S6†). Both subsets occupied a wide range of positive and negative positions along PC2 and PC3, indicative of the competing characteristics of the peptide backbone and various side-chain functionalities (avg ALOGPs = 0.05, 2.68; RngAr = 3.8, 2.1; Fsp3 = 0.54, 0.62, respectively). Examples of this range of properties are seen in the polar, sp3-rich immunostimulant glycopin (Likopid/Licopid, ALOGPs = −2.23, Fsp3 = 0.74) and the hydrophobic, aromatic-rich antibiotic dalbavancin (Dalvance, ALOGPs = 3.58, RngAr = 7).
Polyketides (cyan circles and triangles) clustered near the origin along the PC1 axis, but in the negative range along PC2 and the positive range along PC3, indicating a combination of hydrophobic (avg ALOGPs = 3.12) and three-dimensional character (Fsp3 = 0.64) (ESI Table S6†). These characteristics are exemplified by sirolimus/rapamycin (Rapamune; ALOGPs = 4.85, Fsp3 = 0.75), whose high molecular weight (914) may be counteracted by its high LogD (7.45) in placing it near the origin of PC1 (ESI Fig. S7†). Interestingly, the subset of 14 aromatic polyketides (cyan open triangles) clustered closer to the origin along PC2 and PC3, indicative of relatively lower hydrophobicity (avg ALOGPs = 2.06) and higher aromaticity (RngAr = 2.1, Fsp3 = 0.46). This is seen in the positioning of the anticancer drug epirubicin (Farmorubicin; ALOGPs = 1.41, RngAr = 2, Fsp3 = 0.44).
Terpenoids (green circles and triangles) clustered tightly in the lower right quadrant of the PC1 vs. PC2 plot, and in the positive region along PC3, consistent with relatively low molecular weight, high hydrophobicity, and high three-dimensional character. Examples include the antinausea drug dronabinol/THC (Marinol; MW = 314, ALOGPs = 7.29, Fsp3 = 0.62) and the analgesic alloaromadendrene (component of Acheflan; MW = 204, ALOGPs = 3.70, Fsp3 = 0.87) (ESI Fig. S7†). Notably, this class overlapped substantially with the synthetic drugs in the PC1 vs. PC2 plot, but was well-differentiated along PC3, indicative of the differences in the sp3 and stereochemical vs. aromatic ring content between these groups.
The other category (peach markers) comprises drugs that do not fall into any of the other seven designations, such as amino acids, folates, porphyrins, and shikimates. Generally, these tended to be positioned in the upper right quadrant of the PC1 vs. PC2 plot, as well as in the negative range along PC3, indicating small sizes, high polarities, and low sp3 content. Examples include the multiple sclerosis drug dimethyl fumarate (Tecfidera; MW = 144, ALOGPs = 0.45, Fsp3 = 0.33) and the analgesic acetaminophen (component of Apadaz; MW = 151, ALOGPs = 0.51, Fsp3 = 0.12) (ESI Fig. S7†).
In contrast, the polyketide-derived macrolides clustered more tightly near the origin along PC1 and in the negative range along PC2 and the positive range along PC3, indicating trends toward comparatively smaller size (and concurrently higher LogD) and more sp3 content. Examples include the anticancer agent eribulin (Halaven; MW = 730, LogD = 0.20, ALOGPs = 1.26, RngAr = 0, Fsp3 = 0.88) and the antibiotic fidaxomicin (Dificid; MW = 1058, LogD = 7.10, ALOGPs = 5.59, RngAr = 1, Fsp3 = 0.63) (ESI Fig. S7†).
Interestingly, of the two DOS probes (green diamonds) with macrocyclic structures, rapadocin (36-membered ring) was positioned near the macrolides in the PC1 vs. PC2 plot (slightly below and to the left), consistent with its large size (MW = 1238) and high hydrophobicity (ALOGPs = 5.87), but in a distinct negative region along PC3, owing to its high aromatic ring content (RngAr = 4) (ESI Fig. S7†). Meanwhile, H3B-8800 (12-membered ring) fell on the other side of the macrolide cluster in the PC1 vs. PC2 plot (slightly above and toward the right end of the group), and near the origin along PC3 (at the top end of the group). This may be attributed to its relatively smaller size (MW = 556) and intermediate hydrophobicity (ALOGPs = 3.32) and sp3 content (Fsp3 = 0.58).
The anticancer (red open circles) and antimicrobial (cyan open squares) classes had the greatest numbers of compounds in the analysis (63 and 115 unique structures, respectively) and spanned the widest ranges of chemical space, consistent with the diversity of biological targets addressed by these drugs. For example, the antimicrobials included β-lactam inhibitors of penicillin binding protein and β-lactamase, glycopeptides that bind D-Ala–D-Ala motifs in cell wall biosynthesis intermediates, aminoglycosides that disrupt translational proofreading in the 30S ribosome subunit, and macrolides that block tRNA translocation in the 50S ribosome (ESI Fig. S8 and Table S9†). All of these classes occupied distinct regions of the PCA plots corresponding to their structures.
There were also cases of biological targets for which multiple structural solutions are seen in drugs. For example, topoisomerase II is targeted both by the anthracyclines, which are aromatic polyketides bearing aminoglycoside substituents, and etoposide phosphate (Etopophos), which is derived from a distinct lignan biosynthetic pathway (ESI Fig. S9 and Table S10†). Further, microtubule-stabilizing anticancer drugs include both the terpenoid-derived paclitaxel (Taxol) and docetaxel (Taxotere), and the polyketide-derived macrocycle ixabepilone (Ixempra), all of which bind the same site on tubulin. In these cases, distinct structures that shared the same target fell in overlapping or nearby regions of the PCA plots, and were not as dramatically separated as compounds having different targets. This suggests that there may be some correlation between individual biological targets and the structural properties of drugs that bind them, even if the specific structures are formally distinct.
The antiinflammatory category (orange triangles) exhibited the tightest clustering of all drug classes, falling primarily in the lower right quadrant of both the PC1 vs. PC2 plot and the PC1 vs. PC3 plot (Fig. 5). The majority of these compounds are structurally similar corticosteroids, with relatively low molecular size, high hydrophobicity, and high sp3 vs. aromatic content (avg MW = 485, ALOGPs = 3.20, Fsp3 = 0.67, RngAr = 0.3) compared to the complete collection (avg MW = 673, ALOGPs = 2.01, Fsp3 = 0.58, RngAr = 1.5) (ESI Table S8†).
Antivirals (pink open diamonds) clustered near the origin along PC1, consistent with intermediate size (avg MW = 515 excluding enfuvirtide (Fuzeon) ribosomal peptide outlier), but spanned a wide range along PC2, indicating diverse polarities (ALOGPs −2.29 to +5.97). This class includes nucleosides, aminoglycosides, and peptidomimetics, accounting for this structural diversity (ESI Fig. S10 and Table S11†). Notably, the aminoglycosides clustered with the nucleosides in the PC1 vs. PC2 plot, but differentiated along PC3 based on sp3 vs. aromatic character, with the former in the positive range (avg Fsp3 = 0.68, RngAr = 0) and the latter in the negative range (avg Fsp3 = 0.53, RngAr = 1.88).
Antiulcer drugs (blue inverted triangles) also fell at the positive (right) end of PC1, straddling the origin along PC2, and spanning a wide range along PC3 (Fig. 5 and ESI Table S12†). Most of these structures are polyisoprenoids and eicosanoids, having relatively low molecular weight, intermediate polarity, and moderate to high sp3 content (avg MW = 389, ALOGPs = 3.42, Fsp3 = 0.47, RngAr = 1.7) (ESI Table S8†). However, this class also includes several synthetic proton pump inhibitors that have high aromatic content (e.g., esomeprazole [Nexium], lansoprazole [Prevacid], pantoprazole [Protonix], rabeprazole [Aciphex]; RngAr = 3), explaining the diversity of positions along PC3.
Cardiovascular drugs (brown circles) were highly divergent, consistent with their wide range of targets, and included compounds falling in all four quadrants of both the PC1 vs. PC2 and PC1 vs. PC3 plots. Notably, this category included the heparin and hirudin families of anticoagulants, as well as atrial natriuretic peptide analogues, extending to the extreme negative (left) end of the PC1 axis owing to their large sizes (ESI Fig. S11 and Table S13†). In contrast, the statin family of polyketide-based drugs clustered in the positive range along PC1 based on their smaller structures. Notably, their positions along PC3 differentiated the natural products (lovastatin [Mevacor], simvastatin [Zocor], pravastatin [Mevalotin]) and their natural product-derived analogues (atorvastatin [Lipitor], rosuvastatin [Crestor]) based on sp3 vs. aromatic content (avg Fsp3 = 0.75 vs. 0.34; RngAr = 0 vs. 3), demonstrating that drugs with the same molecular target may fall in somewhat distinct regions of chemical space.
CNS & PNS drugs (purple open triangles) were found exclusively in the positive range (right) of PC1, reflective of the typically small chemotypes associated with neurological targets, but spanned broad ranges along both PC2 and PC3 (Fig. 5). This broad category includes both CNS and PNS drugs with a variety of targets and mechanisms of action, consistent with this structural diversity.
Hormone therapies (green diamonds) segregated into three distinct clusters along PC1, corresponding to large peptides (e.g., growth hormone-releasing hormone and parathyroid hormone analogues), oligopeptides of 8–12 residues (e.g., gonadotropin-releasing hormone and somatostatin analogues), and smaller molecules (e.g., steroid contraceptives, eicosanoid abortifacients, and vitamin D analogues) (ESI Fig. S12†). The latter two clusters generally fell in the negative range along PC2, consistent with their hydrophobic character, but diverged along PC3, indicating contrasting levels of sp3 vs. aromatic content.
Immunomodulatory drugs (inverted pink triangles) did not cluster tightly along any axis, indicative of their wide range of structures and targets. Even the subcategory of immunosuppressants included the large macrolide FKBP ligands tacrolimus (FK-506, Prograf), pimecrolimus (Elidel), sirolimus (rapamycin, Rapamune), and everolimus (Afinitor); the macrocyclic peptide cyclosporine (Sandimmune); the nucleoside mizoribine (Bredinin); the spermidine derivative gusperimus (Spanidin); and the terpenoids mycophenolate (Myfortic) and mycophenolate mofetil (CellCept) (ESI Fig. S13 and Table S14†).
Drugs used to treat metabolic disorders (lime circles) segregated into three main clusters, one in the extreme negative region (left) of PC1, another in the upper right quadrant of the PC1 vs. PC2 plot, and a third in the lower right quadrant. The first cluster included large antidiabetic peptides (e.g., glucagon and glucagon-like peptide mimetics), the second included a wide variety of small metabolite analogues (e.g., metformin, mlglastat, betaine), and the third consisted mainly of synthetic antidiabetics (e.g., DPP4 inhibitors and PPAR-γ agonists) (ESI Fig. S14 and Table S15†). Notably, the metabolite analogues fell in the extreme positive range along PC2, consistent with their highly polar structures, and generally in the positive range along PC3, indicative of their high sp3 content. In contrast, the synthetic compounds fell in the negative region of PC3, based on their high aromatic content.
Finally, respiratory drugs (red squares) segregated into two clusters in the upper and lower right quadrants of the PC1 vs. PC2 plot, indicative of relatively smaller, hydrophobic molecules (avg MW = 450, ALOGPs = 2.80) (ESI Table S8†). This class spanned a wide range along the PC3 axis, consistent with a variety of balances between sp3 and aromatic content (Fsp3 = 0.19 to 0.95; RngAr = 0 to 4). Examples used to treat asthma include the steroids budesonide (Pulmicort, Symbicort) and ciclesonide (Alvesco), the aromatic polyketide amlexanox (Solfa), the alkaloid salmeterol xinafoate (Advair), and the synthetic montelukast (Singulair) (ESI Fig. S15 and Table S16†).
Moreover, we noted a significant increase in the number of natural product-based drugs and drug components in 2018 compared to 2006. Structurally, this was manifested in the property bar graphs and PCA plots by a slight shift on average toward larger and beyond-rule-of-5 molecules. This may reflect the increasing interest in investigating a wider range of structures, and natural products in particular, in drug discovery. The increased prevalence of these molecules in top-selling brand name drugs suggests that such efforts have been successful and may continue in the future. However, it should be noted that these molecules still generally cluster with purely synthetic drugs and do not extend out into the extreme regions of chemical space accessed by other natural product-based drugs.
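For readers less familiar with the term, the sketch below counts violations of the commonly cited rule-of-5 thresholds (MW > 500, log P > 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The class name, function name, and descriptor values are hypothetical, and the exact criterion used to label a compound as beyond-rule-of-5 in this analysis is not restated here, so the code only reports a violation count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical container for four rule-of-5 descriptors."""
    mol_weight: float      # molecular weight (Da)
    log_p: float           # calculated octanol/water partition coefficient
    h_bond_donors: int     # number of hydrogen-bond donors
    h_bond_acceptors: int  # number of hydrogen-bond acceptors


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-5 thresholds are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


# Illustrative values only; not taken from the datasets discussed above.
small_molecule = Descriptors(mol_weight=450.0, log_p=2.8,
                             h_bond_donors=2, h_bond_acceptors=5)
large_peptide = Descriptors(mol_weight=1200.0, log_p=-1.0,
                            h_bond_donors=12, h_bond_acceptors=20)

print(ro5_violations(small_molecule), ro5_violations(large_peptide))
```

A shift toward larger molecules, as described above, would appear as a higher average violation count across a drug set.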
Visualization of the PCA plots encoded by biosynthetic origin revealed that drugs tend to cluster accordingly, with large peptide and carbohydrate molecules accessing remote regions of chemical space that are inaccessible to the other biosynthetic classes and to purely synthetic drugs. Cyclic peptides were distributed across this region of chemical space, while macrolides clustered more tightly with other polyketide drugs having intermediate molecular size. Notably, the DOS probe rapadocin43 is also macrocyclic and fell in this general region of chemical space, indicating its accessibility via synthesis. Other DOS probes with polycyclic structures overlapped with regions populated by alkaloid and terpenoid natural product-based drugs. We note that rapadocin is at the size limit of natural polyketides, but well outside the normal range of the top-selling drug sets. This supports the idea that novel synthetic pathways can provide access to broader regions of chemical space that may be of clinical utility.
Sorting the dataset according to therapeutic class revealed that drugs generally did not cluster strictly according to therapeutic class, but rather by structural class, as would be expected given that the PCA is based on structural parameters. Some subsets of drugs did cluster according to molecular targets, although examples of divergent chemical solutions to individual targets were also evident. Perhaps the only exceptions to these trends were the CNS & PNS and respiratory drugs, representing 26 and 8 subcategories, respectively, as originally reported by Newman and Cragg.1 In these cases, the pharmacological requirements for distribution in these tissue compartments may explain this apparent clustering.
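Because the PCA operates only on structural parameters, compounds with similar descriptors land near each other regardless of therapeutic class. The following is a minimal, standard-library Python sketch of that idea, assuming a toy table of three descriptors (MW, log P, Fsp3) with made-up values. It standardizes each column and extracts the first principal component by power iteration; it is not the software or parameter set used in this analysis.

```python
import math


def standardize(rows):
    """Center each column to mean 0 and scale it to unit (population) variance.

    Assumes no column is constant (a zero standard deviation would divide by zero).
    """
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n)
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def covariance(z):
    """Covariance matrix of already-centered data (rows = compounds)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=500):
    """Dominant eigenvector of the covariance matrix via power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(z, v):
    """Score of each compound along the direction v."""
    return [sum(a * b for a, b in zip(row, v)) for row in z]


# Hypothetical descriptor rows: [MW, log P, Fsp3]; values are illustrative only.
toy = [
    [300.0, 3.0, 0.30],
    [450.0, 2.5, 0.45],
    [800.0, 1.0, 0.60],
    [1200.0, -1.0, 0.70],
    [3500.0, -5.0, 0.65],
]

z = standardize(toy)
pc1 = first_component(covariance(z))
scores = project(z, pc1)
print(scores)
```

The sign of a principal component is arbitrary, so whether large compounds appear at the negative or positive end of an axis depends on the convention of the software used; only the relative positions of the compounds are meaningful.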
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d1np00039j
This journal is © The Royal Society of Chemistry 2022