Anne Mai Wassermann and Jürgen Bajorath*
Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113, Bonn, Germany. E-mail: bajorath@bit.uni-bonn.de; Fax: +49-228-2699-341; Tel: +49-228-2699-306
First published on 26th April 2011
Bioisosteres are generally defined as similar chemical groups whose replacement in active compounds retains their biological activity. As such, bioisosteric replacements are of high interest in medicinal chemistry and bioisosterism continues to be an intensely investigated topic. Herein we have investigated the previously unexplored question as to whether bioisosteric replacements can be identified for individual target families. A total of 40 protein families were analyzed. Through a systematic compound data mining effort, we have identified 67 replacements of chemical groups that qualified as bioisosteres for only one target family. These bioisosteric replacements included groups of rather different sizes and chemical compositions and were directed against a total of 12 different target families including, among others, very popular targets such as tyrosine kinases or nuclear hormone receptors. A compendium of target family directed bioisosteric replacements is provided to aid in compound optimization efforts.
In addition to chemical experience and intuition, computational methods have also been utilized to identify or predict bioisosteric replacements including both knowledge-based and ab initio approaches.7,8 In a recent study, we have applied the Matched Molecular Pair (MMP) concept9 to systematically search for potential bioisosteres in publicly available active compounds.10 MMPs are defined as pairs of compounds that only differ at a single site and are distinguished by a defined substituent or molecular fragment. Thus, we adapted the MMP concept as a generally applicable and consistent structural reference frame for the search for bioisosteric replacements in active compounds. In our initial study, we have identified a set of 96 general bioisosteric replacements for different compound activity classes.10 A subset of these replacements was previously not considered as potential bioisosteres.
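The MMP definition above can be illustrated with a minimal sketch. It assumes that every compound has already been cut at single sites into a (core, fragment) pair; the fragmentation step itself is not shown, and the compound identifiers, the SMILES-like strings, and the function name `find_mmps` are hypothetical. This is an illustration of the general idea, not the procedure used in the study.

```python
from collections import defaultdict
from itertools import combinations


def find_mmps(fragmentations):
    """Pair compounds that share a core but carry different fragments.

    fragmentations maps a compound id to a list of (core, fragment)
    tuples, one tuple per single-site cut of that compound.
    """
    by_core = defaultdict(list)
    for compound_id, cuts in fragmentations.items():
        for core, fragment in cuts:
            by_core[core].append((compound_id, fragment))

    mmps = set()
    for core, members in by_core.items():
        for (cpd_1, frag_1), (cpd_2, frag_2) in combinations(sorted(members), 2):
            # Same core, different compounds, different fragment: an MMP.
            if cpd_1 != cpd_2 and frag_1 != frag_2:
                mmps.add((cpd_1, cpd_2, core, frag_1, frag_2))
    return sorted(mmps)


# Hypothetical toy input; "[*]" marks the attachment point.
toy = {
    "cpd_A": [("[*]c1ccccc1", "[*]Cl")],
    "cpd_B": [("[*]c1ccccc1", "[*]F")],
    "cpd_C": [("[*]c1ccncc1", "[*]F")],
}

for pair in find_mmps(toy):
    print(pair)
```

In this toy input only cpd_A and cpd_B share a core, so they form the single MMP, and the exchanged fragments define the transformation.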
Thus far, bioisosteric replacements have essentially been generalized, i.e. they have been considered to be bioisosteric across different targets, on the assumption that they are conservative with respect to different biological activities. In compound optimization, one typically considers bioisosteres on the basis of this premise. A question that has not yet been investigated, but that is also of high relevance for medicinal chemistry applications, is whether or not bioisosteric replacements can be found that preferentially act against a given target family. Considering the individual structural constraints that must be met to yield specific target–ligand interactions, it is plausible that such replacements exist, although none are currently known.
In order to address this question we have carried out a large-scale data mining effort to search for replacements that act as bioisosteres in individual target families. For this purpose, we have applied the MMP formalism and implemented a search strategy specifically designed to identify target family directed bioisosteric replacements. The results of our analysis are reported herein.
Ligand sets were assembled for all targets for which at least five active compounds meeting our criteria were available, leading to the selection of a total of 25868 compounds organized in 342 individual target sets. Furthermore, the 342 targets were grouped into 40 different target families following the sequence-based family annotation of UniProt12,13 as well as the protein classification hierarchy of the ChEMBL database (for further dividing G-protein coupled receptors into families). All 40 target families investigated in this study are listed in Table 1. The number of targets per family ranged from 3 to 35. Hierarchical scaffolds were extracted from all compounds.14
Target family | Source |
---|---|
a The 40 target families for which bioisosteric replacements were investigated are reported. Sources of target assignments to a given family are given under “Source”. | |
AGC Ser/Thr kinase | UniProt |
Aldo/keto reductase | UniProt |
CAMK Ser/Thr kinase | UniProt |
α-Carbonic anhydrase | UniProt |
Chemokine receptor | ChEMBL |
CMGC Ser/Thr kinase | UniProt |
Cyclic nucleotide phosphodiesterase | UniProt |
Cytochrome P450 | UniProt |
Fatty-acid binding protein (FABP) | UniProt |
Glutamate-gated ion channel (TC 1.A.10.1) | UniProt |
Histone deacetylase | UniProt |
Lipid-like ligand receptor | ChEMBL |
Lipoxygenase | UniProt |
Metabotropic glutamate receptor | ChEMBL |
Monoamine receptor | ChEMBL |
MPI phosphatase | UniProt |
NOS | UniProt |
Nuclear hormone receptor | UniProt |
Nucleotide-like ligand receptor | ChEMBL |
Peptidase A1 | UniProt |
Peptidase C1 | UniProt |
Peptidase C14A | UniProt |
Peptidase M10A | UniProt |
Peptidase M12B | UniProt |
Peptidase S1 | UniProt |
Peptidase S9B | UniProt |
Phospholipase A2 | UniProt |
PI3/PI4 kinase | UniProt |
Potassium channel | UniProt |
Protein-tyrosine phosphatase | UniProt |
SDF symporter (TC 2.A.23) | UniProt |
Short-chain dehydrogenase/reductase (SDR) | UniProt |
Secretin-like receptor (B1) | UniProt |
Ser/Thr kinase | UniProt |
Short peptide receptor | ChEMBL |
Sirtuin | UniProt |
SNF symporter (TC 2.A.22) | UniProt |
TKL Ser/Thr kinase | UniProt |
Type-B carboxylesterase/lipase | UniProt |
Tyrosine-protein kinase | UniProt |
Fig. 1 Matched molecular pair concept. An exemplary MMP is shown. The fragment that is exchanged between compounds forming a pair defines a transformation that relates the two MMP compounds to each other. The same MMP can be characterized by multiple transformations of different size. |
(1) For each transformation found in a target family compound set, all MMPs were assembled. MMPs corresponding to a transformation were required to be active against at least two targets of a family. The absolute logarithmic potency difference between compounds forming each MMP (“potency record”) was determined for all targets. Transformations were only considered if at least 20 potency records were available.
(2) MMPs representing a transformation were also required to contain at least five different scaffold pairs (hence, transformations had to occur in different structural contexts). In addition, if the target or the scaffold pair for which the largest number of potency records were available was removed from the analysis, at least 10 potency records had to remain.
(3) Potency differences within one order of magnitude were permitted for a transformation to qualify as a bioisosteric replacement. At most 1/15 of all available potency records were allowed to fall outside this potency range; otherwise, the transformation was not considered further.
(4) If transformations of different size described the same MMPs, only the transformation describing the largest substructure was retained. However, if a smaller transformation was found in more MMPs than a larger one and if it also covered all compound pairs representing the larger transformation, then the smaller transformation was selected.
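Criteria (1) to (3) can be sketched as a single filter over the potency records of one transformation in one target family. The sketch below is an assumption-laden illustration rather than the original implementation: the `PotencyRecord` type and the `qualifies` function are hypothetical names, the "remove the largest target or scaffold pair" rule is read here as two separate checks, and criterion (4) on transformation size is not covered.

```python
from collections import Counter
from typing import NamedTuple


class PotencyRecord(NamedTuple):
    target: str
    scaffold_pair: str
    abs_log_diff: float  # absolute logarithmic potency difference of an MMP


def qualifies(records, min_records=20, min_targets=2, min_scaffold_pairs=5,
              min_remaining=10, max_log_diff=1.0, max_outlier_fraction=1 / 15):
    """Return True if a transformation passes criteria (1) to (3)."""
    n = len(records)
    if n < min_records:                      # criterion (1): enough records
        return False
    per_target = Counter(r.target for r in records)
    per_pair = Counter(r.scaffold_pair for r in records)
    if len(per_target) < min_targets:        # criterion (1): several targets
        return False
    if len(per_pair) < min_scaffold_pairs:   # criterion (2): structural contexts
        return False
    # Criterion (2): drop the best-populated target, then (separately) the
    # best-populated scaffold pair; enough records must remain each time.
    if n - max(per_target.values()) < min_remaining:
        return False
    if n - max(per_pair.values()) < min_remaining:
        return False
    # Criterion (3): limited share of records beyond one order of magnitude.
    outliers = sum(r.abs_log_diff > max_log_diff for r in records)
    return outliers / n <= max_outlier_fraction


def toy_records(n_outliers):
    """Hypothetical data: 24 records, 2 targets, 6 scaffold pairs."""
    return [
        PotencyRecord(f"target_{i % 2}", f"pair_{i % 6}",
                      1.4 if i < n_outliers else 0.3)
        for i in range(24)
    ]


print(qualifies(toy_records(1)))  # 1 of 24 records exceeds the range
print(qualifies(toy_records(3)))  # 3 of 24 records exceed the range
```

Exposing the thresholds as keyword arguments makes it straightforward to rerun the filter with a different potency range or outlier fraction.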
The assignment of transformations, MMPs, and potency records is illustrated in Fig. 2. All calculations reported herein were carried out with in-house generated Perl and Scientific Vector Language (SVL)17 programs and Pipeline Pilot18 tools.
Fig. 2 Assignment of MMPs and potency records to transformations. For each transformation occurring in the target family compound collection, all corresponding MMPs are assembled. The absolute logarithmic potency difference between compounds forming each MMP (“potency record”) is determined for all targets the MMP is active against. Here, three MMPs active against the nuclear hormone receptor family are defined by a “chlorine to fluorine” replacement and are grouped together. Their absolute logarithmic potency differences for the estrogen receptor alpha (ERα) and the estrogen receptor beta (ERβ) are also reported. In this example, five potency records for two different targets are obtained and the MMPs represent three different scaffold pairs. |
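The bookkeeping described in this caption can be sketched in a few lines. The potency values below are hypothetical (given in nM) and are not taken from the study; the function name `potency_records` and the compound identifiers are likewise assumptions. The example is set up so that three MMPs yield five potency records for two targets, mirroring the counts in the caption.

```python
import math


def potency_records(mmp, potencies_nm):
    """Return {target: |log10(potency_a) - log10(potency_b)|} for shared targets."""
    cpd_a, cpd_b = mmp
    shared = sorted(set(potencies_nm[cpd_a]) & set(potencies_nm[cpd_b]))
    return {
        target: abs(math.log10(potencies_nm[cpd_a][target])
                    - math.log10(potencies_nm[cpd_b][target]))
        for target in shared
    }


# Hypothetical potency values in nM, not data from the study.
potencies_nm = {
    "cpd_1": {"ERalpha": 10.0, "ERbeta": 100.0},
    "cpd_2": {"ERalpha": 50.0, "ERbeta": 100.0},
    "cpd_3": {"ERalpha": 20.0, "ERbeta": 40.0},
    "cpd_4": {"ERalpha": 200.0, "ERbeta": 40.0},
    "cpd_5": {"ERalpha": 5.0},
    "cpd_6": {"ERalpha": 5.0, "ERbeta": 8.0},
}
mmps = [("cpd_1", "cpd_2"), ("cpd_3", "cpd_4"), ("cpd_5", "cpd_6")]

# One record per MMP and per target against which both compounds were tested.
records = [
    (mmp, target, diff)
    for mmp in mmps
    for target, diff in potency_records(mmp, potencies_nm).items()
]
for record in records:
    print(record)
```

The third pair contributes only one record because cpd_5 has no ERβ value in this toy data set.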
Potency variations as a consequence of transformations were largely (but not exclusively) limited to an order of magnitude. It should be noted that the analysis is cut-off sensitive. By considering smaller or larger potency variations, different numbers of target-family directed bioisosteres would be identified. However, in our opinion, a range of one order of magnitude is consistent with the basic idea of bioisosterism because a replacement resulting in a 100- or 1000-fold reduction in compound potency would hardly be regarded as bioisosteric. On the other hand, a ten-fold increase in potency as a consequence of a replacement would certainly be considered a favorable bioisosteric effect. Focusing our search on target families, we also required that a replacement had to occur in multiple ligands active against different family members and, in addition, that the potency record distribution was not strongly biased towards an individual target or chemotype.
Importantly, due to the general sparseness of currently available activity annotations, one would not be able to conclude with certainty that replacements meeting these selection criteria would be true bioisosteres for all targets belonging to a family, or that they could not act on a target belonging to another family. Therefore, replacements that ultimately met our criteria for only one target family were classified as target family directed (rather than family specific) bioisosteres.
Fig. 3 Bioisosteres directed at multiple target families. The 16 bioisosteric replacements that were identified for multiple target families are shown and annotated with their family assignments. |
TF | Abbr. | #Bioisosteres | #Targets |
---|---|---|---|
a Target families for which family directed bioisosteres were identified are listed in the column “TF” and are abbreviated (“Abbr.”). For each family the number of directed bioisosteres (“#Bioisosteres”) and the number of targets in the family (#Targets) are reported. | |||
Nucleotide-like ligand receptor | NLR | 22 | 5 |
Short peptide receptor | SPR | 19 | 35 |
Peptidase M10A | M10A | 5 | 9 |
Tyrosine-protein kinase | TK | 5 | 30 |
Monoamine receptor | MAR | 5 | 34 |
α-Carbonic anhydrase | CA | 3 | 9 |
Peptidase S1 | S1 | 2 | 17 |
Nuclear hormone receptor | NHR | 2 | 18 |
SNF symporter (TC 2.A.22) | SNF | 1 | 5 |
AGC Ser/Thr kinase | AGC | 1 | 14 |
Peptidase C1 | C1 | 1 | 6 |
Lipid-like ligand receptor | LLR | 1 | 20 |
Fig. 4 Target family directed bioisosteric replacements. (a) All 67 single target family directed bioisosteric replacements are shown and annotated with their family assignment. Bioisosteric replacements that involve exchanges of well-defined functional groups are marked in red. (b) Target family directed replacements are shown that would have qualified as bioisosteres for more than one target family if potency changes larger than one order of magnitude had been accepted. The qualifying target family is abbreviated in red; other families that did not meet the potency criterion are shown and annotated with the relative frequency with which the transformation introduced potency changes larger than one order of magnitude. Abbreviations are used according to Table 2. SNF stands for the sodium neurotransmitter symporter family. |
Compound data mining efforts are generally affected by “data sparseness” (i.e., available compounds have not been tested on all families). Due to data sparseness, one cannot conclude with certainty that replacements identified for a single target family could not in principle act as bioisosteres on another target family. It is important to note that one can only extract the information that currently available compound data provide. Therefore, we deliberately use the term “target family directed” (rather than “target family specific”) bioisosteres. However, in many instances, replacements that might, at first glance, look rather generic are indeed target family directed because of the potency differences associated with them in different families. As an example, we consider the methyl to trifluoromethyl replacement that was found to be directed against the nucleotide-like ligand receptor family. This substitution was also frequently observed for monoamine receptors, tyrosine-protein kinases, short peptide receptors, lipid-like ligand receptors, and SNF symporters. However, for these families, the replacement often induced large potency differences: potency changes of more than one order of magnitude were observed with frequencies of 21.1, 19.2, 16.2, 13.3, and 9.7%, respectively. Hence, in these cases, the methyl to trifluoromethyl substitution did not qualify as a bioisostere.
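As a quick arithmetic check, the frequencies reported for the methyl to trifluoromethyl replacement can be compared with the 1/15 limit of criterion (3), which corresponds to roughly 6.7% of potency records. The sketch below uses only the percentages quoted above; the variable names are illustrative.

```python
# Percentage of potency records with changes larger than one order of
# magnitude, as reported in the text for the CH3 -> CF3 replacement.
observed_percent = {
    "monoamine receptor": 21.1,
    "tyrosine-protein kinase": 19.2,
    "short peptide receptor": 16.2,
    "lipid-like ligand receptor": 13.3,
    "SNF symporter": 9.7,
}

# Criterion (3): at most 1/15 of the records may fall outside the range.
threshold_percent = 100 / 15

failing = [
    family
    for family, percent in observed_percent.items()
    if percent > threshold_percent
]

print(f"threshold: {threshold_percent:.1f}%")
print("families exceeding the threshold:", failing)
```

All five families exceed the limit, which is consistent with the replacement qualifying only for the nucleotide-like ligand receptor family.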
This journal is © The Royal Society of Chemistry 2011 |