Jason G. Kettle*, Richard A. Ward and Ed Griffen
AstraZeneca, Cancer and Infection Discovery, Mereside, Alderley Park, Macclesfield, SK10 4TG, United Kingdom. E-mail: jason.kettle@astrazeneca.com; Fax: +44(0)1625519749; Tel: +44(0)1625517920
First published on 13th October 2010
The properties of any molecule are fixed at the point of design. Since reliance on commercially available reagent sets to probe SAR may result in incomplete interrogation of property space, efforts to pursue novel proprietary reagent collections are of clear benefit. One such approach, based on fragmentation and analysis of molecules described in the patent and medicinal chemistry literature, is described, highlighting an example of key secondary amines with potential for broad applicability across medicinal chemistry.
The properties, and therefore the quality, of any molecule are fixed at the point of design. We were motivated therefore to look earlier, at the design process itself, and the feedstocks into this process, namely the chemical reagents used to synthesise the molecules themselves. This was in part driven by frustration: for many of the commonly used reagent classes, utilised both in collection enhancement initiatives and traditional medicinal chemistry lead optimisation programmes, the diversity found from commercial sources can be limiting. In addition to limited structural diversity, the physicochemical property space afforded by commercial reagent sources can also be narrow. For certain reactive reagent classes, such as sulfonyl halides, commercial sets focus on relatively stable and readily synthesisable lipophilic aliphatic or aromatic moieties, with more polar or basic groups, desirable from a medicinal chemistry perspective, often poorly exemplified. We recognised that rapid access to non-commercial sets of medicinally relevant reagents may offer a significant competitive advantage. Commercial reagent sets are available to all competitors, are generally costly for truly innovative reagents and can be subject to limited availability and long lead times incompatible with project timelines. Consequently, a reliance on such sets may lead to incomplete structure–activity (SAR) and structure–property (SPR) assessment unless significant time and effort is invested by project teams to synthesise reagents that address this.
Initiatives that aim to address the quality of reagent collections used in medicinal chemistry design would be expected to impact directly the quality of screening and project compounds. Access to a broad, diverse and novel reagent collection could deliver a competitive advantage through structural novelty, allowing for more thorough exemplification in intellectual property applications, and could give access to substructures known (either through literature or in-house experience) to help address key medicinal chemistry issues such as hERG activity, physical properties and DMPK liabilities. Such reagents could provide ready-made isosteric replacements for activity-enhancing but often problematic groups such as phenols. Ultimately, a high-quality reagent collection could result in shorter make-test cycles, where time is not spent deriving novel reagents to probe project-specific SAR and SPR, allowing for interrogation of property space not readily available to competitors.
One of a number of approaches that AstraZeneca has taken to enhance its global reagent collection has been to mine the information held within the structures of molecules that are disclosed in small-molecule patent applications and publications detailing biological activity. The aim was to fragment all bioactive molecules of interest and search for embedded reagents that it may be useful to procure. After property and novelty assessment and analysis of frequency of occurrence, this would lead to reagents that competitors are using to tune the properties of biologically relevant small molecules, that have a synthetic route disclosed, but that cannot be purchased. This approach should yield reagents which, although exemplified as modulators of one particular target or targets, might have applicability and utility against others. Groups that have been found to impart beneficial non-efficacy properties, be it in terms of novelty, lower lipophilicity or metabolism burden for example, might reasonably be expected to do so independently of any given target. In particular, reagents that appear across multiple patent applications, and especially those exploited by several companies, could be anticipated to be of even greater interest, extending the concept of ‘privileged fragments’7 to reagent sets. Herein we disclose one such example utilising a cyclic secondary amine query, since analysis of internal data indicates this to be the most used class of reagents in AstraZeneca drug discovery and collection enhancement initiatives, and also the query likely to generate the largest output.
The database that was utilised for this search was a composite of externally licensed and internally curated patent and target information,8 and whilst not comprehensive, covers a major part of the bioactive compound output and target landscape from 30 years of global drug research and development, encompassing over 2 million unique compounds and over 42,000 patents and 66,000 publications. A major component of this is a focus on target class coverage: kinases, phosphatases, proteases, GPCRs, nuclear hormone receptors, transporters, phosphodiesterases, ion-channels, lipases, and transferases are all exemplified. The approach should however be applicable to any medicinally relevant database of compounds.
![]() | ||
Fig. 1 (a) SMIRKS definitions used to code the key transformations. (b) Schematic representation of the SMIRKS: (r2) designation indicates atom is contained within a ring attached to two other ring atoms. |
In the SMIRKS definitions, the embedded secondary amine is described by SMARTS and the transformation products are expressed by SMILES strings. We utilised three separate SMIRKS definitions to target molecules which were substituted on the nitrogen by carbon (non-aromatic), carbon (aromatic) and sulfur. This allows us to release the embedded secondary amine from N-alkyl, N-acyl, N-aryl and N-sulfur linked compounds. The SMIRKS definitions only process secondary amines where the nitrogen and two adjacent carbons (mapped as 1, 2 and 3 above) are located in a ring. No constraints were placed on ring size, although only two ring bonds were allowed for atoms 1, 2 and 3, and both carbons were fixed as sp3 centres.
A SMIRKS processor12 was compiled in-house to take the full database and process all compounds using the rules above. As shown in Scheme 1, this reads in a SMILES file from the database, removes the salts and then applies each of the SMIRKS in turn through an enumeration process. For compounds which contained more than one of the sub-structures in the SMIRKS, all possible permutations of transformation were generated from the initial input molecule. Where the SMIRKS were applicable, the resulting product contains the core and R-groups, with the ‘[*:4][Xe]’ on the right-hand side of the SMIRKS definition used as a tag. Any part of the product containing a Xe atom is then filtered out. This has the effect of removing the substituent R (mapped as atom 4) to leave only the reagent candidate released as a secondary amine, where the substituent is replaced by hydrogen in the procedure encoded by the SMIRKS. The SMIRKS processor was used only to identify the first generation products, meaning that only the molecules from the database were processed with these SMIRKS; none of the generated products were further processed. The set of released secondary amine reagent candidates was then prioritised in the subsequent analyses.
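The tag-and-filter step can be illustrated at the string level. The following is only a minimal sketch, not the in-house processor: it assumes the SMIRKS step has already produced a dot-separated product SMILES, that salt removal can be approximated by keeping the longest component (an assumption, since the source does not say how salts are removed), and the function names `strip_salts`, `release_amines` and `collect_candidates` are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def strip_salts(smiles: str) -> str:
    """Keep the longest dot-separated component.

    Assumption: string length is used as a crude proxy for fragment size;
    a real pipeline would use a cheminformatics toolkit.
    """
    return max(smiles.split("."), key=len)


def release_amines(product_smiles: str) -> list[str]:
    """Discard every fragment that carries the Xe tag, keep the rest."""
    return [frag for frag in product_smiles.split(".") if "Xe" not in frag]


def collect_candidates(products: list[str]) -> Counter:
    """Count released amines over first-generation products only.

    The products are never fed back in, mirroring the single-pass
    processing described in the text.
    """
    counts: Counter = Counter()
    for product in products:
        counts.update(release_amines(product))
    return counts


# Hypothetical product string for an N-benzoyl morpholine after the
# transformation: released amine plus Xe-tagged substituent.
example = "C1COCCN1.[Xe]C(=O)c1ccccc1"
released = release_amines(example)
```

Here `released` holds only the morpholine fragment, because the fragment carrying the Xe tag is dropped.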
![]() | ||
Scheme 1 Schematic representing the fragmenting/enumerating process whereby the side-chains are removed from the amine cores. |
Scheme 2 illustrates the filtering procedures that were used to focus our search for secondary amines of interest. Application of the stripping/fragmentation steps on the complete list of 2,124,189 entries in the database led to 259,120 unique cyclic secondary amines which were grouped by frequency. In order to consider those that had significant exploitation in medicinal chemistry literature we kept only those with a frequency ≥ 5 (step 2). This reduced the number of amines dramatically to 16,232, in the process removing 199,200 amines with single occurrence. An exact match to the ACD database13 of commercially available reagents using in-house program FLUSH14 was then used to find and remove any amine that could be sourced commercially (step 3). Occurrence within ACD is a crude filter of commercial availability since not everything present can necessarily be sourced, and reagents without a presence may be obtained via non-ACD suppliers. Nevertheless, it is a quick and relatively simple filter to apply to large datasets such as this, and applying this matching process highlighted 2,734 reagents that were present in ACD.
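Steps 2 and 3 amount to a frequency threshold followed by an exact-match exclusion. A minimal sketch is given below, assuming each released amine is represented by a canonical identifier string so that exact matching is a set lookup; the function name `frequent_noncommercial` and the toy identifiers are hypothetical and do not reproduce FLUSH or the ACD database.

```python
from collections import Counter


def frequent_noncommercial(amines, commercial, min_count=5):
    """Return (amine, frequency) pairs that pass steps 2 and 3.

    amines     -- one identifier per occurrence in the fragmented database
    commercial -- identifiers found in the commercial catalogue (assumed canonical)
    min_count  -- frequency threshold; 5 is the value quoted in the text
    """
    counts = Counter(amines)
    kept = [
        (amine, n)
        for amine, n in counts.items()
        if n >= min_count and amine not in commercial
    ]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Toy data with placeholder identifiers (not real database entries).
occurrences = ["amine_A"] * 6 + ["amine_B"] * 5 + ["amine_C"] * 5 + ["amine_D"]
catalogue = {"amine_A", "amine_B"}
shortlist = frequent_noncommercial(occurrences, catalogue)
```

In this toy set only `amine_C` survives: `amine_D` occurs once, and `amine_A` and `amine_B` are in the catalogue.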
![]() | ||
Scheme 2 Schematic representing the filtering process to derive novel cyclic secondary amines disclosed in the medicinal and patent literature. |
Table 1 shows the top 10 commercially available amines present in the search database along with frequency and as such this list presents few surprises. Morpholine (entry 1) is the most utilised amine in this data set with over 46,000 entries, presenting an often ideal balance of lipophilicity and basicity. As expected, the simplest of the cyclic secondary amine homologues such as pyrrolidine, piperidine and piperazine occur with high frequency. Indeed, piperazine as a substructure is represented in 5 of the top 10 most used commercially available amines (entries 4–7 and 9), ranging from unsubstituted to simple aryl substituted examples common to many GPCR ligands (entries 6 and 7). Sub-structural searching of the source data set indicates that piperazine as a motif occurs singly in 115,744 bioactive molecules, and twice in 2,212 molecules, making this, by some margin, the most commonly exploited amine. For this search no substituents were allowed on the piperazine ring carbons and as with the SMIRKS definitions only N-alkyl, N-acyl, N-aryl and N-sulfur substituents were included. Such frequency analysis may serve to focus efforts aimed at deriving novel proprietary amines for synthesis.
Entry | Amine | Frequency | Entry | Amine | Frequency |
---|---|---|---|---|---|
1 | ![]() | 46,127 | 6 | ![]() | 3,582 |
2 | ![]() | 29,268 | 7 | ![]() | 3,231 |
3 | ![]() | 25,881 | 8 | ![]() | 2,957 |
4 | ![]() | 20,293 | 9 | ![]() | 2,780 |
5 | ![]() | 10,080 | 10 | ![]() | 2,423 |
After removing the commercially available amines, and in order to narrow the search further, we then applied a variety of calculated property filters. Firstly we limited our search to only those amines with a molecular weight ≤ 200 Da (Scheme 2, step 4). This figure is clearly subjective, but was based on an analysis of in-house data that suggested the frequency of use of any reagent in library synthesis was negligible above this cut-off. In the event this filter removed the bulk of the remaining amines, reducing the list by 12,142 to 1,356. If the focus of this work had been analysis of fragments for fragment-based lead generation activities then a higher molecular weight cut-off may have been adopted. A ClogP ≤ 2 filter was also applied (step 5). The application of ClogP to this set of reagents is at first sight not especially relevant since the overall contribution to lipophilicity from a given reagent is entirely dependent on both the scaffold it is attached to, and, for amines, the nature of that attachment. Nevertheless this was a generous guide, removing only 133 of the most lipophilic amines. In medicinal chemistry design, control of the number of hydrogen-bond donors (HBD) is often of critical importance, particularly with regard to cell permeability, efflux and absorption processes.15 Retaining only those amines that had HBD ≤ 3 (step 6) removed a further 91 examples to give 1,132 secondary amines (since one donor would be lost on coupling to a given scaffold, allowing for the incorporation of a further 2 donors is also considered generous). In a final step 7, we removed certain substructures deemed to be less desirable when present in amines for organic synthesis, and these included acyl hydroxamates (51, derived from metalloproteinase inhibitors in the source data), esters/acids (117) and ketones (35). Three amines found to contain boron atoms were also removed to give a final count of 930 novel secondary amines. Table 2 shows the top 10 amines from this set, by frequency, together with information on the number of WO patents that disclose their use, the number of unique applicants of these and the number of associated distinct drug targets.16
Entry | Amine | Frequency | Patents (a) | Applicants (b) | Targets (c) |
---|---|---|---|---|---|
11 | ![]() | 203 | 22 | 13 | 9 |
12 | ![]() | 195 | 3 | 2 | 2 |
13 | ![]() | 161 | 18 | 11 | 10 |
14 | ![]() | 146 | 4 | 3 | 1 |
15 | ![]() | 146 | 2 | 1 | 1 |
16 | ![]() | 127 | 11 | 6 | 6 |
17 | ![]() | 121 | 48 | 27 | 14 |
18 | ![]() | 121 | 9 | 8 | 5 |
19 | ![]() | 121 | 23 | 8 | 7 |
20 | ![]() | 104 | 7 | 5 | 4 |

(a) PCT International Applications only. (b) Unique PCT applicants. (c) Unique primary targets where disclosed. In this analysis, distinct kinases are considered as a single primary target for example, but kinase activators a different primary target.
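The calculated-property filters (steps 4 to 6) form a simple cascade in which each step reports how many amines it removes. The sketch below assumes the properties have already been computed elsewhere; the `Amine` record, the `apply_cascade` function and the property values in the toy data are hypothetical, while the three thresholds are those quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amine:
    name: str
    mol_weight: float  # Da, precomputed
    clogp: float       # precomputed ClogP
    hbd: int           # hydrogen-bond donor count, precomputed


# (label, keep-predicate) pairs applied in order: steps 4, 5 and 6.
FILTERS = [
    ("MW <= 200", lambda a: a.mol_weight <= 200.0),
    ("ClogP <= 2", lambda a: a.clogp <= 2.0),
    ("HBD <= 3", lambda a: a.hbd <= 3),
]


def apply_cascade(amines):
    """Apply each filter in turn; log (label, removed, remaining) per step."""
    remaining = list(amines)
    log = []
    for label, keep in FILTERS:
        before = len(remaining)
        remaining = [a for a in remaining if keep(a)]
        log.append((label, before - len(remaining), len(remaining)))
    return remaining, log


# Toy input with invented property values.
toy = [
    Amine("ok", 150.0, 0.5, 1),
    Amine("heavy", 250.0, 1.0, 1),
    Amine("greasy", 180.0, 3.1, 1),
    Amine("donors", 190.0, -1.0, 4),
]
survivors, step_log = apply_cascade(toy)
```

With the counts reported in the text, the same bookkeeping runs from 13,498 amines after the commercial match (16,232 minus 2,734) to 1,356 after the molecular weight filter, 1,223 after ClogP and 1,132 after the donor count.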
Frequency of occurrence of novel amines in this dataset may not necessarily reflect broad medicinal chemistry utility, especially where an entry is from a single organisation. Such examples may just reflect the patenting strategy in place for a given company (e.g. more comprehensive exemplification). Perhaps of more value are those reagents that appear not just across multiple patent applications, but from different companies, and crucially against a diversity of drug targets. While reagents arising from this analysis were assessed for synthesis against each of these parameters, ultimately it was an assessment of structural attractiveness by a team of medicinal chemists, in particular with a view to dissimilarity to existing internally available reagent sets, that was the principal selection criterion. Since most reagents derived in this manner should be target agnostic, even a reagent used a handful of times, by a single company against one target, may ultimately prove of value in lead optimisation against unrelated targets, and could also potentially contribute to the resolution of unrelated medicinal chemistry issues.
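The per-amine columns of Table 2 (frequency, patents, applicants, targets) can be derived by collecting distinct values per amine. The following is a minimal sketch under assumptions: each record is a hypothetical `(amine, patent, applicant, target)` tuple, and the breadth-first ordering in `rank_by_breadth` (targets, then applicants, then frequency) is an illustrative choice only, since the final selection described above rested on medicinal chemists' judgement.

```python
from collections import defaultdict


def summarise(records):
    """Map amine -> (frequency, n_patents, n_applicants, n_targets)."""
    freq = defaultdict(int)
    patents = defaultdict(set)
    applicants = defaultdict(set)
    targets = defaultdict(set)
    for amine, patent, applicant, target in records:
        freq[amine] += 1
        patents[amine].add(patent)
        applicants[amine].add(applicant)
        targets[amine].add(target)
    return {
        amine: (freq[amine], len(patents[amine]),
                len(applicants[amine]), len(targets[amine]))
        for amine in freq
    }


def rank_by_breadth(summary):
    """Order amines by distinct targets, then applicants, then frequency."""
    return sorted(
        summary,
        key=lambda a: (-summary[a][3], -summary[a][2], -summary[a][0], a),
    )


# Toy records with placeholder identifiers.
records = [
    ("amine_X", "WO-1", "Company A", "kinase"),
    ("amine_X", "WO-1", "Company A", "kinase"),
    ("amine_X", "WO-2", "Company B", "GPCR"),
] + [("amine_Y", "WO-3", "Company C", "protease")] * 4
summary = summarise(records)
ranking = rank_by_breadth(summary)
```

In the toy data `amine_Y` is the more frequent entry but comes from one applicant against one target, so `amine_X` ranks first on breadth.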
This list of most used non-commercial amines is dominated by six-membered ring analogues based around piperidine and piperazine. Indeed the meso-2,6-dimethylpiperazine moiety occurs twice (entries 11 and 17), capped by acyl and methyl groups. The first of these, for example, is disclosed in 22 WO patent applications from 13 different organisations, covering a diverse array of 9 targets/target classes, figures that imply its ability to impart favourable properties within medicinal chemistry design sets. Entries in the table which feature a high number of examples or patents but few targets indicate that the amine in question may be more closely linked to the pharmacophore required for modulation, and as such may have lower utility as a reagent in general design. The piperidine entry 12 is one such example, where the benzylamine moiety is known to occupy and form key hydrogen bonds in the S1 pocket of βII tryptase.17 Similarly pyrrolidine-nitrile entry 14 is the key warhead in an established series of DPP IV inhibitors,18 while novel fluoropiperidine entry 15 has been heavily exploited by Novartis as a P2 pocket ligand against thrombin.19 For reagents such as 12 to be of value as additions to novel reagent collections, a suitable protecting group strategy would of course need to be employed. In the work described below, we considered a tert-butoxycarbonyl mono-protection strategy for all unsymmetrical bis-amines of interest, and indeed against other reagent classes where the reactive functionality was incompatible with the presence of a base.
Table 1 highlighted morpholine as the key commercially available cyclic secondary amine exploited in bioactive molecules. In the field of kinase inhibition for example, morpholine is utilised both as a basic solubilising side-chain, as seen in gefitinib,20 and also as a key pharmacophoric hinge-binding motif, as seen in an array of recent lipid kinase inhibitors.21 Through sub-structural searching of the dataset it is possible to gain insight into additional novel analogues that are being exploited in drug design that might be applied against alternative targets. Table 3 highlights the top 10 by frequency of this useful sub-structure in the filtered list. The most used analogue, (2S)-2-(methoxymethyl)morpholine, entry 21, also occurs in (R)- and racemic forms (entries 28 and 25 respectively), making this an ideal candidate in any collection of reagents. Simple chiral dimethyl substituted analogues also feature (entries 22, 23 and meso-form 24).
Entry | Amine | Frequency | Entry | Amine | Frequency |
---|---|---|---|---|---|
21 | ![]() | 63 | 26 | ![]() | 11 |
22 | ![]() | 62 | 27 | ![]() | 9 |
23 | ![]() | 24 | 28 | ![]() | 8 |
24 | ![]() | 19 | 29 | ![]() | 8 |
25 | ![]() | 15 | 30 | ![]() | 8 |
Analysis of this data by frequency of occurrence, and specific sub-structural queries on amines of interest as outlined above, is of value when defining strategies to optimise collections of reagents for use in design. Perhaps of greater value is an understanding of those reagents that show value across a diversity of drug targets, or those that are under exploitation by a range of organisations, even where overall frequency may be low. Table 4 illustrates a selection of novel, non-commercial amines found through this analysis which are either utilised by several companies, or against a range of targets, or both. Entry 31, a piperidine with an unusual pendant 1,1-difluoromethylene group, has been exploited by a range of companies against a wide diversity of drug targets. The novel fluorinated azetidine analogue entry 32, to date only disclosed by Vertex, has nevertheless been utilised by them in both kinase and GPCR programs. Entry 33 highlights a novel isostere of 4-phenylpiperidine/piperazine incorporating a polar phosphine oxide group utilised across several targets and research groups. Similarly, fluorinated pyrrolidine analogue entry 34 highlights a recurring theme in the extracted dataset, that of amines with pendant groups designed to modulate pKa, exploited here in ligands against kinases, integrases, proteases and ion channels.
Entry | Amine | Frequency (a) | Organisation (b) | Modulated Targets |
---|---|---|---|---|
31 | ![]() | 206 | Boehringer Ingelheim, Daiichi-Sankyo, Epix, Helicon, Paratek, Sanofi-Aventis | Factor Xa, Fibrin, MAO-B, NCCa-ATP channel, Oxytocin receptor, Tetracycline antibiotics, Thrombin |
32 | ![]() | 61 | Vertex | AUR kinase, CGRP receptor, PLK kinase |
33 | ![]() | 18 | Merck & Co, Panacea Biotec, Pharmacyclics | BTK kinase, HDAC, Oxazolidinone antibiotics |
34 | ![]() | 19 | Eisai, Merck & Co, Pfizer | 11β-HSD-1, Carbonic anhydrase, DPP IV, FAK kinase, HIV integrase, Kv1.5, Squalene synthase |
35 | ![]() | 7 | BMS, Merck & Co, Pfizer, Roche, Tibotec | BACE, HIV protease, NK1/3, p53 |

(a) Frequency of substructure as extracted from SciFinder. This figure is greater than the frequency found by direct extraction from the database due to more comprehensive indexing and mirrors the data compiled in the remaining columns. (b) Excludes academic patent applications and literature. Patents from legacy (pre-merger) companies combined under current organisational structure where known.
It is perhaps informative to look at the context in which novel reagents such as these are incorporated into drug-like molecules. Fig. 2 highlights selected examples of the (3R,4R)-3,4-difluoropyrrolidine 34 as it occurs in the medicinal literature. This novel amine, whose synthesis is in fact disclosed,22 has been heavily exploited by Pfizer23 against a range of diverse targets implicated in a variety of diseases such as cancer, infection and diabetes (entries 36 to 39 inclusive). Example 40, from Merck,24 illustrates utility against potassium channel antagonists for cardiovascular therapy, an area also targeted with the squalene synthase inhibitor 41 from Eisai.25 Clearly in each case the primary pharmacology is driven by the scaffold to which the amine is appended, but its prevalence suggests an ability to contribute to the overall beneficial non-efficacy parameters required in any medicinal chemistry program.
![]() | ||
Fig. 2 Selected exploitation of (3R,4R)-3,4-difluoropyrrolidine 34 against a diverse set of drug targets by a range of companies. |
In reagent exploitation, certain reagent classes lend themselves more readily to this approach than others, for example amines, acids and sulfonyl halides, where the bond retrosynthesis is mostly unambiguous. Fragmentation of datasets for aldehydes for example, which are generally incorporated into target molecules via reductive amination to give alkylamines, may lead to reagents which in reality were derived via alternative chemistries. Nevertheless, as a tool to generate ideas for reagents to synthesise, the approach remains valid. Over the past several years, in conjunction with efforts to enhance screening collections,28 AstraZeneca has engaged in a comprehensive effort to improve the quality and scope of reagent collections used in in-house project and library synthesis using a diversity of approaches including the technique outlined here. Knowledge of heavily exploited but non-commercial reagents has been used both to secure exact analogues and as inspiration for the design and synthesis of entirely novel reagents to aid internal medicinal chemistry optimisation. These efforts have, to date, contributed in excess of one thousand novel reagents for exploitation across all AstraZeneca small-molecule synthesis sites, and have begun to demonstrate broad impact across all early phases of drug discovery, including application in candidate drug molecules.
This journal is © The Royal Society of Chemistry 2010 |