Guided by evolution: from biology-oriented synthesis to pseudo natural products.

Covering: up to 2020

Natural products (NPs) provide inspiration for the design of biologically active compounds and libraries. In this review, we cover several experimental and in silico approaches, which have been used to simplify NPs and guide NP-based library design. Earlier approaches, like the structural classification of natural products (SCONP) and biology-oriented synthesis (BIOS), focus on the identification of activity determining scaffolds and the synthesis of corresponding compound collections. More recently, NP fragments identified by means of cheminformatic analysis of the Dictionary of Natural Products (DNP) have been combined in unprecedented fashions to yield pseudo natural products (pseudo NPs), which show biological activities unrelated to the guiding NPs. Each approach was also the source of chemical innovation, in which synthetic methods were established for the rapid assembly of NP-inspired compounds and libraries.


1 Introduction
2 Structural classification of natural products (SCONP)
3 Natural product fragments
4 Biology-oriented synthesis
5 Pseudo natural products
5.1 Bridged recombination
5.2 Fused recombination
6 Conclusion and outlook
7 Conflicts of interest

Introduction
Harnessing nature's pool of biologically active molecules has been a central topic of chemical biology and drug discovery programs. 1,2 Natural products (NPs) offer highly three-dimensional scaffolds that have been pre-validated by nature. 1 NPs have therefore served as versatile platforms for the discovery of novel biologically active compounds. Advances in synthetic methodology and total synthesis have enabled access to complex scaffolds and their chemoselective derivatisation for structure-activity relationship (SAR) studies. [3][4][5][6][7] In addition, unprecedented structural motifs can be accessed through elegant distortions of NP core scaffolds. This approach, termed "Complexity-to-Diversity" (CtD), enabled the discovery of novel biologically active molecules from terpenes, 8,9 steroids 8 and alkaloids. 8,[10][11][12][13][14] NPs have also been used to generate small-molecule libraries for drug discovery. For example, researchers at Novartis assembled a 150-membered fragment library from four NPs (tacrolimus, sanglifehrin A, cytochalasin E and massarigenin C) using degradation and diversification reactions previously employed in structure elucidation. 15 Given the synthetic challenges of NP total synthesis, methods and approaches have also focused on reducing the structural complexity of NPs while retaining their biological relevance. Such design approaches may arrive at NP-inspired compounds that still display the kind of bioactivity encoded by the guiding NPs, or they may yield classes of NP-inspired compounds with novel bioactivities.
Here we review two concepts for the design and synthesis of novel NP-inspired compound classes. In biology-oriented synthesis (BIOS), NP scaffolds are simplified in an iterative process, yielding series of structurally simplified, NP-inspired compound classes. Extending the logic underlying BIOS and combining it with fragment-based compound development led to the concept of pseudo natural products (pseudo NPs), in which NP fragments are combined in arrangements not accessible via biosynthesis. 16,17

Structural classification of natural products (SCONP)

In general, the bioactivity of NPs can be related to their structural features. 18 In an attempt to identify structurally simplified compound classes that still resemble the underlying NP structures, and thereby retain their biological relevance and activity, a cheminformatic method for the structural classification of natural products (SCONP) was developed.
To this end, the CRC Press Dictionary of Natural Products (DNP; 190 939 compounds at the time) was employed as the initial dataset. 18 In silico fragmentation of the NPs and subsequent construction of a scaffold tree based on substructure identification enabled a visual representation of scaffold relationships (Fig. 1). In the scaffold tree, more complex scaffolds are arranged on the periphery and become simpler towards the centre. 18 The SCONP concept was further validated, with protein structure similarity clustering (PSSC) as an independent guiding principle (Scheme 1A), in the discovery of a selective 11β-hydroxysteroid dehydrogenase type 1 (11βHSD1) inhibitor. 18 PSSC groups proteins with similar ligand-sensing domains into structure similarity clusters whose members are expected to bind similar ligands. 11βHSDs were found in the same structure similarity cluster as the Cdc25A phosphatase and acetylcholinesterase. Through fragmentation according to SCONP, dysidiolide, a Cdc25A inhibitor, could be related to glycyrrhetinic acid (GA), a naturally occurring non-selective ligand of 11βHSD1 and 11βHSD2. Based on their common octahydronaphthalene scaffold 1, a library of 162 compounds was prepared, of which 28 were confirmed as selective inhibitors of 11βHSD1 over 11βHSD2 (compounds 1a-1c, Scheme 1B).
To harness the power of the SCONP approach and facilitate the construction of scaffold trees, the open-source software Scaffold Hunter was developed. 19,20 The software takes complex data, e.g. from biological screening sets, as input and extracts chemically meaningful scaffolds that share biological activity. From these it builds a scaffold tree linking related scaffolds, and it also generates virtual scaffolds wherever links between tested molecules are missing. The virtual scaffolds represent unexplored chemical space that is expected to share the bioactivity of the input and can serve as starting points for new synthesis efforts. 19 Prospective exploration with Scaffold Hunter was demonstrated using data from a pyruvate kinase screen available on PubChem (Scheme 2A). 19 The program built a scaffold tree containing 35 868 scaffolds (for example fragments 2-4) from the 51 415 unique molecules used in the assay. Novel virtual scaffolds (fragments 5-7) were then predicted to share the biological activity of the tested molecules. A library of 107 compounds embodying these virtual scaffolds was obtained and their half-maximal inhibitory concentrations (IC50) were determined. Nine novel pyruvate kinase inhibitors or activators with IC50 values between 1 and 10 μM were identified (selected examples 9 and 10, Scheme 2B). This corresponds to a hit rate of nearly 10%, substantially higher than the average hit rate of PubChem high-throughput screening campaigns (0.34%).
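The hit-rate comparison above is simple arithmetic. The short Python sketch below reproduces it from the figures quoted in the text. The function and variable names are illustrative and are not part of Scaffold Hunter. Note that 9 actives out of 107 compounds is about 8.4%, which the text rounds to "nearly 10%".

```python
def hit_rate(hits: int, tested: int) -> float:
    """Return the fraction of tested compounds that were confirmed as active."""
    if tested <= 0:
        raise ValueError("tested must be a positive number of compounds")
    if not 0 <= hits <= tested:
        raise ValueError("hits must lie between 0 and tested")
    return hits / tested


# Numbers quoted in the text for the pyruvate kinase example.
scaffold_hunter_rate = hit_rate(hits=9, tested=107)
pubchem_average_rate = 0.0034  # average PubChem HTS hit rate (0.34%)

# How many times more often a Scaffold Hunter-guided compound was a hit.
enrichment = scaffold_hunter_rate / pubchem_average_rate

print(f"Scaffold Hunter hit rate: {scaffold_hunter_rate:.1%}")
print(f"Enrichment over average PubChem HTS: {enrichment:.0f}-fold")
```

On these numbers, the enrichment over the PubChem average is roughly 25-fold. The comparison assumes that both rates count actives in the same way, which the sketch does not check.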
The SCONP tree provides a systematisation that aids compound library design. In combination with a second, independent criterion, SCONP can point to readily accessible NP scaffolds that retain affinity for proteins and biological activity. Even though the generated libraries were comparatively small, high hit rates were obtained. 19,21,22 It should be noted, however, that biological similarity declines with increasing deviation from the parent scaffold, e.g. towards simpler scaffolds.

Natural product fragments
While the scaffold tree is built, parts of the scaffold are systematically removed to ultimately arrive at fragment-sized scaffolds. Such fragments might offer valuable attachment points for efficient fragment growth. This was addressed in a subsequent analysis of the DNP (183 704 molecules, Scheme 3A). 23 The fragmentation followed a set of rules that builds on the SCONP analysis yet retains side chains, functional groups, hybridisation states and stereochemistry (Scheme 3B). All resulting fragments were kept (751 577 fragments) and further filtered with an HTS filter to remove potentially unstable, toxic and otherwise undesirable fragments. The dataset was further tailored by removing macrocycles and by adapting the original 'rule of three' (i.e. molecular weight < 300 Da, at most three hydrogen bond donors and at most three acceptors, ClogP ≤ 3) 24,25 to NP structures (i.e. molecular weight < 350 Da, at most 3 hydrogen bond donors, at most 6 hydrogen bond acceptors, at most 6 rotatable bonds and AlogP < 3.5), affording a set of 110 485 fragments. Subsequent pharmacophore clustering generated 2000 clusters representing groups of related fragments. A principal component analysis (PCA) comparing the molecular properties (e.g. O/N count, Fsp3, number of rings) of this set with those of fragments obtained from the ZINC and SynLib libraries revealed distinct differences in the chemical space covered. In general, the NP fragments showed a higher average Fsp3 and O count and contained fewer aromatic rings. The biological relevance of the clusters was tested in a search for novel p38α MAP kinase inhibitors. Nine of 193 tested cluster centres were found to bind in various modes, as observed by X-ray crystallography. The screen revealed that sp3-rich fragments derived from cytisine and sparteine are novel allosteric inhibitors that stabilise the DFG motif in its inactive form (e.g. 11 and 12 in Scheme 3C). The optimisation of binding was guided by employing structurally similar clusters to increase the diversity of the library. This example highlights the potential of the generated clusters for development into bioactive molecules and for guiding optimisation. The fragment clusters retain many attributes of NPs and cover chemical space outside commercial libraries. Although the fragments and cluster centres are structurally less complex than their guiding NPs, their synthesis will often require considerable effort and the development of novel methods.
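The NP-adapted cut-offs listed above translate directly into a filter. The Python sketch below assumes that descriptors have already been computed by a cheminformatics toolkit. These are molecular weight, hydrogen bond donors and acceptors, rotatable bonds, AlogP and a macrocycle flag. The `FragmentDescriptors` class, the function name and the example values are all hypothetical. The HTS filter and the pharmacophore clustering steps are not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentDescriptors:
    """Precomputed descriptors for one fragment (hypothetical container)."""
    name: str
    mol_weight: float  # Da
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int
    alogp: float
    is_macrocycle: bool = False


def passes_np_fragment_filter(f: FragmentDescriptors) -> bool:
    """Apply the macrocycle exclusion and the NP-adapted 'rule of three' cut-offs."""
    return (
        not f.is_macrocycle
        and f.mol_weight < 350
        and f.h_bond_donors <= 3
        and f.h_bond_acceptors <= 6
        and f.rotatable_bonds <= 6
        and f.alogp < 3.5
    )


# Invented descriptor values, for illustration only.
fragments = [
    FragmentDescriptors("frag_A", 246.3, 1, 3, 1, 1.2),
    FragmentDescriptors("frag_B", 412.5, 2, 5, 4, 2.8),  # too heavy
    FragmentDescriptors("frag_C", 298.4, 4, 4, 2, 0.9),  # too many H-bond donors
    FragmentDescriptors("frag_D", 330.0, 2, 6, 6, 3.4, is_macrocycle=True),
]

kept = [f.name for f in fragments if passes_np_fragment_filter(f)]
print(kept)  # ['frag_A']
```

Keeping the descriptor calculation separate from the threshold logic makes the cut-offs easy to audit or adjust. For example, the NP-adapted limits could be compared side by side with the original rule of three.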
In silico fragment generation from NPs with annotated biological activity, or from high-throughput screening data, is a powerful way to explore novel chemical space. The obtained fragments are often readily accessible, allowing the creation of large libraries. It is important to note that fragments usually inherit the biological activity of the parent compound, although they may differ in selectivity towards similar targets. 20 Alternatively, fragments and structural clusters can be used as guiding structures for the rapid assessment of biological activity and the subsequent development of more potent compounds.

Biology-oriented synthesis
The aim of biology-oriented synthesis (BIOS) is the structural simplification of NP scaffolds to obtain new compound classes that cover biologically relevant space. 1,26 A synthesis route is then developed to give an NP-inspired compound collection for biological studies (Scheme 4). Over the past two decades, several NP scaffolds have been simplified through BIOS, leading to the discovery of novel biologically active compounds. The examples presented here were chosen because their synthetic strategies rely on related cycloaddition reactions. The 1,3-dipolar cycloaddition is a powerful method for introducing pyrrolidine moieties into scaffolds. [27][28][29] The development of regio-, diastereo- and enantioselective 1,3-dipolar cycloadditions makes it a versatile synthetic tool. 27 Ningalin B and lamellarin D represent two classes of marine natural products with interesting biological activities, including antitumor activity, HIV-1 integrase inhibition and reversal of multidrug resistance. 30 The common pyrrolocoumarin core scaffold was synthesised through intramolecular 1,3-dipolar cycloadditions starting from alkynes 13 (Scheme 5). 31 Cell-based screening of a collection of 43 compounds revealed Pyrcoumin as an inhibitor of canonical Wnt signaling. 32 Subsequent target identification and validation showed that it targets the nucleotide pyrophosphatase dCTPP1. 33 Pyrrolizidines are characteristic of a large group of alkaloids that exhibit insecticidal activity. 34 A dual cyclization strategy was developed to access a pyrrolizidine-like compound collection (Scheme 6). Enantioselective Cu-catalysed 1,3-dipolar cycloadditions between azomethine ylides (14) and N-methylmaleimides (15) delivered bicyclic pyrrolidines 16 with excellent diastereo- and enantioselectivity. 35 Subsequent intramolecular lactamisation provided the tricyclic pyrrolizidine alkaloid analogues 17 in high yields.
Of the 119 compounds in this collection, four inhibited parasite growth. 1,3-Dipolar cycloadditions have also been used to access bridged structures with high stereochemical complexity. In a recent example, cyclic azomethine ylides were generated from α-diazo ketones 18 by rhodium catalysis. Cycloaddition of these transient cyclic azomethine ylides with the oxindole-derived alkenes 19 delivered spirotropanyl oxindole scaffolds 20 with good to excellent diastereo- and enantioselectivity (Scheme 7). 36 Using this bimetallic relay catalysis, 24 compounds were synthesised, enabling a BIOS of spirotropanyl oxindoles resembling the alkaloids alstonisine and chitosenine.
Apart from the synthesis of NP scaffolds by novel methodologies, established multistep syntheses of NP fragments have been used to access simplified NP scaffolds, resembling a function-oriented synthesis (FOS) 37,38 approach but with library synthesis in mind. Withanolides are a family of at least 300 steroids with diverse bioactivities, including potent anti-inflammatory effects and modulation of the mTOR and Wnt pathways. 39 A comparison of the structural features of withanolides showed that many retain the trans-hydrindane unit (C/D rings) linked to an α,β-unsaturated lactone (Scheme 8).
The A/B rings of the steroid scaffold, on the other hand, are variable. This observation led to the hypothesis that the conserved C/D ring system, together with modification of the dehydro-δ-lactone moiety, may be responsible for the common biological activity. 40 A compound collection was assembled on this basis using a multistep sequence and late-stage derivatization. 40 Prins reaction of olefin 21 with paraformaldehyde afforded alcohol 22 as a single stereoisomer. Diastereoselective hydrogenation of the double bond introduced another stereocenter. Swern oxidation followed by diastereoselective Brown allylation gave homoallylic alcohol 23. After removal of the acetal protecting group, an appendage was introduced by esterification, and ring-closing metathesis furnished the dehydro-δ-lactone moiety. This core scaffold (24) …

BIOS provides a strategy to access biologically active compounds based on natural products through truncation and appendage modification. This has also been exploited to assemble libraries fit for drug discovery. 42,43 Natural products were truncated to their core scaffolds while keeping intact the functionalities that could be used for derivatisation and growing of the scaffold. 42 The obtained libraries were used in a rescaffolding approach for the discovery of novel inhibitors. Interactions of known inhibitors were analysed on the basis of crystal structures and provided the basis for scaffold selection. For the caspase-1 inhibitor PGE-3935199, keeping the biaryl and lactone side chains intact was found to be crucial (Scheme 9A). 42 Truncation between C10 and C11 as well as between C20 and N23 enabled a search for a replacement of the 5,8-fused bicyclic core of PGE-3935199. Screening of the assembled virtual library identified 16 NP scaffolds fitting the descriptors.
Of these, the bridged 2-azabicyclo[2.2.2]octane (2-ABO), a core scaffold of several marine NPs, offered a compelling solution, as docking studies predicted an inhibitor with nanomolar affinity. Synthesis and biological testing of CD10847 confirmed these predictions, and an IC50 of 17 nM was obtained in cell-based assays.
The same library was also employed in the discovery of cyclophilin D (CypD) inhibitors through high-throughput screening combined with X-ray crystallography. 43 Tetrahydroquinoline derivative 27 was found to bind in the S2 pocket (Scheme 9B). Rescaffolding of a known inhibitor led to compound 28 with a significant increase in potency and affinity for the protein. In the same campaign, succinimide 29 was found to bind in the S1′ pocket of the target protein. The two scaffolds were subsequently combined (30) via a suitable linker, but no further improvement in potency was observed.

Pseudo natural products
With its emphasis on the synthesis of specific NP scaffolds, BIOS only explores the chemical and biological space covered by the guiding NPs. This bias limits the exploration of wider chemical and biological space, i.e. new chemical entities and mechanisms of action (MoAs). Similarly, a retrospective analysis of NP discovery shows that NP space is limited and that the number of fundamentally novel scaffolds discovered from nature is decreasing. 44,45 Nature partly overcomes such limitations by combining two unrelated biosynthetic routes to generate hybrid NPs, for example hybrid PKS-NRPS products, 46 terpenoid alkaloids 47 and meroterpenoids. 48 However, this strategy is limited by the precursors accessible to microorganisms and plants and by the available biosynthetic machinery.
The chemical combination of biosynthetically unrelated NP fragments should therefore yield novel compound classes that are still endowed with biological relevance and that cover novel chemical and, potentially, biological space (Scheme 10). 16,49 Such hybrid compounds may be regarded as pseudo natural products (pseudo NPs). 16 As with diversity-oriented synthesis (DOS) 50 libraries, the fragment-based design principle can lead to compound collections with high chemical diversity. However, pseudo NPs are biologically pre-validated through the choice of fragments derived directly from natural products, whereas DOS libraries are mostly built with chemical and structural considerations as the predominant arguments. 16 Compounds synthesised according to DOS can be considered NP-like, as they share many attributes of NPs (e.g. high Fsp3 content), but their biological relevance is initially unclear. 16 Pseudo NP library synthesis also does not necessarily follow the build/couple/pair strategy 51 that has been applied very successfully in DOS. Suitable fragments for combination were drawn from the fragment clusters identified in the cheminformatic analysis of the DNP. 23 The 2000 cluster centres identified (see above), most of which are mono- or bicyclic scaffolds, offer a large variety of basic fragments for recombination. To increase complexity, NPs or their derivatives can serve as platforms for combination with simpler fragments. Different connection types and positions lead to topologically diverse structures that may also differ in their biological activities. According to the number of atoms shared between the fragments, recombined structures can be divided into four main groups: monopodal connections (0 atoms), spirocycles (1 atom), fused rings (2 atoms) and bridged scaffolds (≥3 atoms).
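The four topological classes above depend only on how many atoms the two fragments share. Below is a minimal Python sketch of that rule. It assumes each fragment has already been mapped onto atom indices of the combined molecule. The substructure matching itself, normally done with a cheminformatics toolkit, is not shown, and the function names are hypothetical.

```python
from __future__ import annotations


def connection_type(shared_atoms: int) -> str:
    """Classify a two-fragment combination by the number of atoms the fragments share."""
    if shared_atoms < 0:
        raise ValueError("shared_atoms cannot be negative")
    if shared_atoms == 0:
        return "monopodal"  # fragments joined by a bond, no atom in common
    if shared_atoms == 1:
        return "spiro"
    if shared_atoms == 2:
        return "fused"
    return "bridged"  # three or more shared atoms


def classify_combination(fragment_a: set[int], fragment_b: set[int]) -> str:
    """Classify from the atom indices each fragment occupies in the combined molecule."""
    return connection_type(len(fragment_a & fragment_b))


# Hypothetical atom indices: two six-membered rings sharing atoms 4 and 5.
ring_a = {0, 1, 2, 3, 4, 5}
ring_b = {4, 5, 6, 7, 8, 9}
print(classify_combination(ring_a, ring_b))  # fused
```

Counting shared atoms ignores which positions are connected. Regioisomers within one class therefore receive the same label.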
Determination of the NP-likeness score 52 of different fragment combinations showed that pseudo NPs are distinctly different from NPs and BIOS compounds but overlap with the space covered by drugs. 16,49,53,54 Unlike in BIOS, where the guiding NPs can suggest possible biological targets, pseudo NPs are expected to exhibit biological activities different from those of their individual fragments. To characterize these compound collections biologically, target-agnostic screening methods can be applied. [55][56][57][58] The "Cell Painting" assay has shown great potential for characterizing pseudo NPs. Different cellular compartments and structures are stained, and the phenotypic changes induced upon compound treatment are recorded by microscopy. The resulting morphological profiles (fingerprints) can be compared with those of reference compounds of known biological activity to arrive at a target hypothesis, which then has to be confirmed in more specific assays. This strategy has led to the discovery of several novel biologically active compound classes with different molecular topologies, which are discussed below. The focus is on the chemistry employed and on the biological characterisation, and the examples are categorised by fragment recombination pattern.
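The NP-likeness score referred to above compares how often structural fragments occur in NPs versus synthetic molecules. The Python sketch below follows the widely used formulation of Ertl and co-workers in simplified form. It assumes each fragment i contributes log10[(NP_i/SM_i)·(SM_t/NP_t)], where NP_i and SM_i are the numbers of NPs and synthetic molecules containing fragment i, and NP_t and SM_t are the sizes of the two sets. It also assumes the score is the average over a molecule's atom-centred fragments. The fragment labels and counts are invented, and the pseudocount is an added assumption. Production implementations rely on statistics precomputed from large NP and synthetic-molecule databases and apply further normalisation not shown here.

```python
import math


def fragment_contribution(np_count, sm_count, np_total, sm_total, pseudocount=1):
    """log10 over-representation of a fragment in NPs relative to synthetic molecules.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for fragments absent from one of the two sets.
    """
    ratio = (np_count + pseudocount) / (sm_count + pseudocount)
    return math.log10(ratio * sm_total / np_total)


def np_likeness(atom_fragments, stats, np_total, sm_total):
    """Average fragment contribution over a molecule's atom-centred fragments."""
    if not atom_fragments:
        raise ValueError("molecule has no fragments")
    total = 0.0
    for frag in atom_fragments:
        np_count, sm_count = stats.get(frag, (0, 0))
        total += fragment_contribution(np_count, sm_count, np_total, sm_total)
    return total / len(atom_fragments)


# Toy statistics: (molecules containing the fragment in the NP set, in the synthetic set).
stats = {"frag_np": (90, 10), "frag_sm": (5, 95), "frag_common": (50, 50)}
NP_TOTAL = SM_TOTAL = 1000

np_like = np_likeness(["frag_np", "frag_np", "frag_common"], stats, NP_TOTAL, SM_TOTAL)
sm_like = np_likeness(["frag_sm", "frag_common", "frag_sm"], stats, NP_TOTAL, SM_TOTAL)
print(f"{np_like:.2f} {sm_like:.2f}")
```

Positive scores come from fragments over-represented in NPs, and negative scores from fragments typical of synthetic molecules. Scores computed this way for NPs, drugs and a pseudo NP collection can then be compared as distributions.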

Bridged recombination
Chromopynones (31) combine the biosynthetically unrelated chromane and tetrahydropyrimidinone (THPM) fragments and were synthesized by means of a Biginelli reaction followed by an intramolecular cyclization (Scheme 11A). 49 The library was investigated in several cell-based assays, and inhibition of glucose uptake in HCT116 cells was observed (Scheme 11B). Further biological investigations revealed that chromopynones are selective inhibitors of the glucose transporters GLUT-1 and GLUT-3 and inhibit cancer cell growth. 59 Neither the chromane nor the THPM fragment alone shows any activity towards glucose transporters; the combination of the two fragments is therefore essential for the biological activity. The NP-likeness scores of the synthesised chromopynones are comparable to those of synthetic drugs but differ distinctly from the scores calculated for NPs and BIOS collections (Scheme 11C).
Tropane and indole are two fragments commonly found in NPs. Their recombination affords highly sp3-rich indotropane pseudo NPs (32, Scheme 12). 60 Their enantioselective synthesis was achieved via 1,3-dipolar cycloaddition of imine 33 and nitroalkenes 34. This scaffold can also be seen as a bridged recombination of tryptoline and pyrrolidine. Phenotypic screening of the collection led to the discovery of myokinasib, which induces striking changes in the morphology of mouse L fibroblasts, suggesting an involvement in cytokinesis. 53 Kinase screening and target validation revealed that myokinasib is a mixed-type inhibitor of myosin light chain kinase 1 (MLCK1). This chemotype was unprecedented among kinase inhibitors.

Fused recombination
The 1,3-dipolar cycloaddition also proved applicable to the synthesis of a variety of pseudo NPs with fused fragment combinations (Scheme 13). A combination of pyrrolidine and quinolinone fragments was obtained from amide 35 through an enantioselective intramolecular 1,3-dipolar cycloaddition to afford pyrroloquinolinones (36, Scheme 13A). 61 The high selectivity of the method could furthermore be used for the kinetic resolution of racemic tropanes 37, affording a rich collection of enantiopure pyrrotropanes (38, Scheme 13B). 62 Finally, pyrriridoids (39) were assembled from pyrone fragments 40 through 1,3-dipolar cycloadditions, affording stereochemically complex compounds with five contiguous stereocenters (Scheme 13C). 63 Several members of these libraries showed promising activities in Wnt and Hedgehog reporter gene assays.
While the aim of the pseudo NP strategy is to combine fragments in a manner not yet observed in nature, certain scaffolds may exist in nature but may not yet have been isolated from natural sources. In 2015, pyrroloquinolinones were synthesized as one of the first examples of pseudo NPs by combining pyrrolidine and quinolinone fragments (36, Scheme 14). 61 More recently, this scaffold has been isolated from Streptomyces albogriseolus MGR072. 64 Although pyrroloquinolinones 36 can no longer be considered pseudo NPs, they would still fit the BIOS principle. More examples like this will emerge with the discovery of novel NPs. Such discoveries highlight the strength of the pseudo NP principle in providing biologically active compounds.
Indole is frequently present in naturally occurring bioactive molecules. Owing to the many practical methods for indole synthesis, 65 indole fragments can readily be combined with other fragments to afford large indole-based pseudo NP libraries. For example, sp3-rich morphan fragments 41 were subjected to Fischer indole synthesis to efficiently access an indomorphan collection (42, Scheme 15). 66 Biological screening of this 63-membered library led to the discovery of glupin, another potent GLUT-1 and GLUT-3 inhibitor. In further studies, glupin was systematically truncated to validate the necessity of combining the two fragments.
The pseudo NP principle can be extended to more than two fragments. For instance, pyridone, dihydrofuran and pyran fragments were combined to form pyrano-furo-pyridones (PFPs, 43). Fusion of pyridines 44 and functionalized pyrans 45 through Tsuji-Trost reactions or Michael additions afforded the desired tricyclic cores as different regioisomers (43a-d, Scheme 16). 54 Biological analysis of the PFPs by means of the cell painting assay showed high biosimilarity to aumitin, a known inhibitor of mitochondrial respiration. 67 Further investigations validated this hypothesis and showed that PFPs are a novel chemotype that inhibits mitochondrial respiration by targeting mitochondrial complex I.
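Biosimilarity in a cell painting experiment means that two compounds induce similar morphological profiles. The exact metric and feature processing used in the cited studies are not given here. The Python sketch below assumes, for illustration only, that each compound is summarised by a vector of normalised morphological features. It further assumes that biosimilarity is the Pearson correlation between such vectors. The profile values and reference names are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length feature vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("constant profile has no defined correlation")
    return cov / (sd_x * sd_y)


def rank_by_biosimilarity(query, references):
    """Sort reference profiles from most to least similar to the query profile."""
    scores = [(name, pearson(query, profile)) for name, profile in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Invented normalised feature profiles (five features for brevity).
query_profile = [1.2, -0.5, 2.0, 0.1, -1.3]
references = {
    "reference_A": [1.0, -0.4, 1.8, 0.0, -1.1],
    "reference_B": [-1.0, 0.6, -1.9, 0.2, 1.2],
}

for name, similarity in rank_by_biosimilarity(query_profile, references):
    print(f"{name}: {similarity:.2f}")
```

A high correlation with a reference compound of known mode of action only yields a hypothesis. As with the PFPs, that hypothesis still has to be confirmed in target-specific assays.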

Conclusion and outlook
We have reviewed complementary principles for the design of biologically active NP-inspired compound classes and the evolution of the underlying logic. In silico fragmentation of NPs has been highlighted as a guiding tool for compound design. Combined with independent guiding principles, structural fragments with biological relevance can be extracted from complex NPs. 18 These fragments are typically readily accessible, and libraries with high biological relevance were obtained. Through hierarchical sorting of fragments, new and unexplored scaffolds were identified and used for library design. 19 Biology-oriented synthesis follows a similar approach but reduces NP scaffolds to readily accessible yet stereochemically complex scaffolds. 1 In BIOS, the biological space covered is restricted by NP structure and activity, i.e. the biological activities of the novel entities tend to resemble those of the guiding NPs. The pseudo NP concept addresses this limitation. In a cheminformatic analysis of the DNP, 2000 NP fragment clusters were identified. 23 Members of these clusters were combined in ways not observed in nature, creating scaffolds with unprecedented biological activities.
Given the vast number of possible fragment combinations, connection types and regioisomers, the concept of pseudo NPs is anticipated to inspire the design and synthesis of numerous compound classes with novel biological activities. A systematic analysis of these molecules is therefore highly desirable. The pseudo NP concept benefits from unbiased phenotypic screening techniques such as cell painting, which may provide target hypotheses and modes of action for novel scaffolds unprecedented in nature. The generated target hypotheses can then be confirmed in specific cell-based assays.
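A back-of-the-envelope count illustrates how large this design space is. The Python sketch below uses only the 2000 cluster centres and the four connection types named in this review. It deliberately ignores attachment positions, regioisomers, stereoisomers and combinations of a fragment with itself. The totals are therefore lower bounds on the number of conceivable designs. They are not estimates of synthetically feasible compounds.

```python
from math import comb

N_CLUSTER_CENTRES = 2000  # NP fragment clusters from the DNP analysis
CONNECTION_TYPES = ("monopodal", "spiro", "fused", "bridged")

# Unordered pairs of distinct cluster centres.
fragment_pairs = comb(N_CLUSTER_CENTRES, 2)

# Each pair combined in each of the four topological classes.
pair_designs = fragment_pairs * len(CONNECTION_TYPES)

# Unordered triples, for combinations of three fragments.
fragment_triples = comb(N_CLUSTER_CENTRES, 3)

print(f"{fragment_pairs:,} pairs, {pair_designs:,} pair designs, {fragment_triples:,} triples")
```

This gives about two million fragment pairs and roughly eight million pair designs across the four connection types. It also gives more than 1.3 billion three-fragment selections, all before any regiochemistry is considered.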

Conflicts of interest
There are no conicts to declare.