QSAR analysis of substituent effects on tambjamine anion transporters

A QSAR analysis of the transmembrane anion transport activity of 43 synthetic tambjamine analogs allowed rationalization of this activity according to their lipophilicity and structural parameters.


Introduction
The control of the transmembrane transport of ions is an essential function of living organisms. This control is mainly exerted by transmembrane proteins, although there are small lipophilic molecules (ionophores) capable of facilitating the transmembrane transport of ions. 1,2 The vast majority of identified natural ionophores are cation selective. Nevertheless, anion transport is no less important, and the characterization of facilitated transmembrane anion transport by both natural and synthetic systems is receiving increasing attention. 3-8 These molecules could have potential in the treatment of conditions derived from the defective regulation of chloride and bicarbonate transport, such as cystic fibrosis or Bartter's syndrome. 9,10 Moreover, naturally occurring cationophores find applications as antimicrobials and biomembrane research tools; anion-selective ionophores could therefore find similar applications.
Among the identied naturally occurring anionophores, the structurally related prodiginines and tambjamine alkaloids are the most studied examples. 11 These compounds show interesting pharmacological properties including antitumor activity. 12,13 The synthetic prodiginine analogue obatoclax has been shown to display promising anticancer activity in the clinic. 14 We have demonstrated that the ionophoric activity of these compounds is related to their cytotoxicity. 15 Active ionophores are able to disrupt intracellular pH gradients and to trigger apoptosis in cancer cells. [16][17][18][19] An increasing number of synthetic molecules capable of facilitating anion transport by forming lipophilic supramolecular complexes or membrane spanning channels have been reported in the literature. [20][21][22][23] Despite this progress, the knowledge of the requirements for designing effective anion transporters remains poor, and identication of active derivatives is mostly based on trial/error methods. Qualitative structuretransport activity studies underscored lipophilicity as one of the most important factors inuencing the ionophoric transport activity of these compounds. 24 Moreover, Gale, Davis and coworkers have also introduced the concept of lipophilic balance in the design of these compounds. 25 Quantitative structureactivity relationship (QSAR) approaches are widely employed in medicinal chemistry. QSAR constitutes a powerful tool to assist rational molecular design and to predict different physicochemical properties. 26 Recently, we have reported a quantitative structure-transport activity (QSAR) study of the anion binding and transport of a series of 1-hexyl-3-phenylthioureas bearing various substituents at the para-positions of the aromatic ring. 
27 This study allowed us to determine a statistically relevant model correlating anion transport activity with parameters such as lipophilicity, the Hammett coefficient of the varied substituent and SPAN, a descriptor for molecular size. Prompted by this success we decided to perform a more ambitious study introducing several structural changes on the studied molecules. We aimed to investigate a series of effective anion transporters having a range of lipophilicity values as well as transport activities. In this regard, the tambjamine alkaloids represent ideal candidates because of their synthetic accessibility and tolerance to different substituents while remaining as potent transmembrane anion transporters. In this work we present a QSAR study of the transmembrane anion transport activity of 43 tambjamine inspired transporters, aimed to shed light on the structural design requirements to successful anion carriers and the quantication of the relationships between lipophilicity and transmembrane anion transport activity of small molecules.

Results and discussion
A series of tambjamine derivatives 1-43 was selected for this study (Fig. 1). Tambjamines are marine alkaloids containing a 4-methoxy-2,2′-bipyrrole core. Some of the studied compounds are natural products, such as tambjamine B (20), tambjamine C (31), tambjamine K (32) or BE-18591 (30), whereas others are synthetic tambjamine analogues. With this selection we aimed to create a library of compounds including systematic variations of the enamine substituent and also to explore the possibility of replacing the -OMe group characteristic of the naturally occurring derivatives with a benzyloxy group. The synthesis of these compounds is straightforward from the appropriate bipyrrole aldehyde. 28 Compounds 5, 9, 20-32, 34, 35 and 37-40 have been previously reported, and all compounds were characterized by standard methods. 29

Anion transport assays
In order to measure the transmembrane transport activity of compounds 1-43, the chloride efflux from chloride-containing 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) vesicles was monitored over time using a chloride selective electrode, according to reported methods. 30 Briefly, 200 nm POPC liposomes containing chloride (489 mM NaCl, 5 mM phosphate buffer pH 7.2) were prepared. The vesicles were then suspended in an isotonic nitrate solution (489 mM NaNO3, 5 mM phosphate buffer pH 7.2) and the studied compound was added as a DMSO solution (typically 10 μL or less, to avoid any influence on the outcome of the experiment). Chloride release was then monitored over 300 s using a chloride selective electrode. A final reading, taken as 100% chloride release, was obtained after addition of detergent to lyse the vesicles. The transport assays were repeated at different carrier concentrations. The data were subjected to Hill analyses in order to obtain a quantitative measure of the transporter efficiency. 31 Thus, the effective concentrations required to induce 50% chloride efflux on the time scale of the experiments (300 s) were calculated (EC50, Table 1). Hill analyses also provided the Hill parameter n values. The Hill parameters were all consistent with a mobile carrier mechanism. 32 All the studied compounds were found to be highly active anion carriers, with EC50 values of 0.003-0.346 mol% carrier/lipid. The initial rate of chloride release (k_ini) was also calculated for carrier loadings of 0.05 mol% compound to POPC. An overview of all these data is provided in Table 1.
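The Hill analysis described above can be sketched as follows. This is a minimal illustration, assuming the common form y = 100·x^n/(EC50^n + x^n) for the percentage of chloride efflux at 300 s and using a linearised Hill plot in place of the nonlinear fit that would normally be used. The function names and the dose-response values are hypothetical and are not taken from Table 1.

```python
import math

def hill(conc, ec50, n, y_max=100.0):
    """Percent chloride efflux predicted by the Hill equation."""
    return y_max * conc ** n / (ec50 ** n + conc ** n)

def fit_hill(concs, effluxes, y_max=100.0):
    """Estimate (EC50, n) from the linearised Hill plot.

    log10(y / (y_max - y)) = n * log10(x) - n * log10(EC50)
    Only points with 0 < y < y_max can be used.
    """
    xs, ys = [], []
    for c, y in zip(concs, effluxes):
        if 0.0 < y < y_max:
            xs.append(math.log10(c))
            ys.append(math.log10(y / (y_max - y)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                      # Hill parameter n
    intercept = mean_y - slope * mean_x    # equals -n * log10(EC50)
    return 10 ** (-intercept / slope), slope

# Hypothetical carrier loadings (mol% carrier/lipid) and synthetic efflux values.
concs = [0.001, 0.003, 0.01, 0.03, 0.1]
effluxes = [hill(c, ec50=0.01, n=1.2) for c in concs]
ec50, n = fit_hill(concs, effluxes)
print(f"EC50 = {ec50:.4f} mol%, n = {n:.2f}")
```

Because the synthetic data are generated from the same equation, the fit simply recovers the input parameters; with real electrode traces the scatter near 0% and 100% efflux makes a direct nonlinear fit preferable.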

Quantitative analysis of transmembrane anion transport
Quantitative structure-transport activity (QSAR) studies represent a commonly employed approach to modelling physical and biological properties of compounds. 26,33 This approach is a powerful tool for structure optimization and targeted design of new compounds. The objective of a QSAR study is the construction of a statistically relevant model. A total of 506 descriptors were calculated using a combination of software sources: ALOGPS 2.1 and e-dragon 1.0 34,35 (which gave constitutional descriptors, topological descriptors, topological charge indices, geometrical descriptors, WHIM descriptors, charge descriptors and molecular properties), Chemicalize, 36 ACDiLabs 2.0, 37 TorchV10lite 38 and ChemBioDraw Ultra 12.0. 39
Based on our previous observations, we identified lipophilicity as an important parameter determining the transmembrane transport efficiency of a given transporter. 27 In order to obtain an experimental measure of this property, the retention times (RT) of all compounds were measured using reverse phase HPLC. In this assay, lipophilic compounds show higher retention times whereas hydrophilic compounds are eluted more quickly. 40 These experiments are used as an indirect measure of lipophilicity. On the other hand, log P, the octanol-water partition coefficient, is the most widely employed quantitative measure of lipophilicity. The importance of this parameter 41 in medicinal chemistry and drug discovery has led to the development of several software packages that predict log P values without the need for time-consuming experimental measurements. Moreover, these predictions allow the calculation of log P values for virtual compounds. Simple correlations of the measured RT and the different calculated log P values showed excellent agreement (see ESI,† log P_RT_correlations.pdf). 40 This correlation supported the validity of computationally obtained log P values for these compounds. The best correlation was found for the ALOGPs values calculated using the ALOGPS 2.1 software; therefore the ALOGPs descriptor was selected as the best log P descriptor. These values are shown in Table 1 along with the RT data.
Table 1 Overview of transmembrane anion transport data: EC50, n, initial rate of chloride release (k_ini), log P and retention times
A simple plot of the transport activity, expressed as log(1/EC50), vs. ALOGPs or retention time (RT) suggested a parabolic dependence between these variables (Fig. 2a and b). The rationale behind this observation is that there is an optimum compromise in the hydrophilicity/hydrophobicity balance, which maximizes the transmembrane transport activity of a given compound. 24 A transporter that is too hydrophilic would not partition into the phospholipid membrane, whereas a derivative that is too hydrophobic would not be able to move away from the membrane core and thus act as a carrier. At the beginning of the modelling part of this study, a set of 38 compounds had been synthesized. However, the majority of these compounds lay in the middle of the explored ALOGPs range (values from 2 to 6), with only a few compounds above or below this range. It was therefore evident that further compounds with low and high log P values were needed, both to confirm this parabolic dependence and to avoid excessive leverage of the data for compounds displaying low activity and extreme log P values. Compounds of a similar structure to the existing tambjamines were hypothesised and their ALOGPs values calculated. Those that fell in the ranges of 1-2.5 and 5-7.5 were considered suitable and suggested for synthesis. Thus, 5 additional tambjamine derivatives (numbers 19, 33, 36, 41 and 43) were synthesised and measured (the additional molecules are highlighted by * in Fig. 2c). Attempts to find simple correlations between the anion transport activity and the lipophilicity of the tambjamine derivatives were not satisfactory. Therefore, it was evident that a more sophisticated analysis was needed.

Data cleaning
Prior to running any QSAR analyses the descriptor dataset was cleaned. Descriptors were removed if they were incomplete, with values unavailable for some of the molecules, if the values were classed as non-numeric, or if the descriptors had little or no variation across the dataset. Following the cleaning of the dataset, a total of 330 descriptors remained (see ESI,† Tambjamines_dataset_cleaned.csv). The descriptor dataset still contained different calculated values of log P. Some descriptors are the square of another descriptor, e.g. ALOGPs-sq.
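The three cleaning rules can be sketched as a simple filter. This is an illustration only: the function name, the toy descriptor table (apart from the name ALOGPs) and the rule that a descriptor needs at least two distinct values are assumptions, since the text does not state the threshold used for "little or no variation".

```python
import math

def clean_descriptors(table, min_distinct=2):
    """Return the names of descriptors that pass the three filters.

    table maps a descriptor name to its list of raw values (one per molecule).
    A descriptor is dropped if any value is missing, if any value is not
    numeric, or if it has fewer than min_distinct distinct values.
    """
    kept = []
    for name, values in table.items():
        numeric = []
        for v in values:
            try:
                x = float(v)
            except (TypeError, ValueError):
                break                      # missing (None) or non-numeric entry
            if math.isnan(x):
                break                      # NaN also counts as missing
            numeric.append(x)
        else:
            # Reached only when every value converted cleanly.
            if len(set(numeric)) >= min_distinct:
                kept.append(name)
    return kept

# Hypothetical four-descriptor table for three molecules.
table = {
    "ALOGPs": [2.1, 3.4, 5.0],       # complete and varying: kept
    "const": [0, 0, 0],              # no variation: dropped
    "partial": [0.5, None, 0.2],     # missing value: dropped
    "label": ["a", "b", "c"],        # non-numeric: dropped
}
print(clean_descriptors(table))
```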

QSAR -stratied sampling and bootstrap
In the rst stages of the investigation, the initial dataset (38 compounds) was split into a training set and a test set using conventional QSAR methods, and attempts were made to validate a number of model ts using cross-validation techniques. The cross validation methods were not successful with this dataset. Although the dataset is of a reasonable size, splitting the dataset into a training and test set resulted in a test set only containing 6 compounds. Due to the parabolic relationship between log(1/EC 50 ) and log P and high leverage of the few molecules with high or low log P, the selection of the test set had an extremely large inuence on the validation statistics obtained. It is apparent that if the training set were to miss out even a few of the high and low log P molecules then the most reasonable t would simply be a line almost independent of log P.
To cope with the leverage of the high and low log P molecules, a stratied test set selection method was employed, ensuring that compounds were selected for the low, mid and high log P ranges. However, the size of the dataset and the relatively few molecules in the strata does not allow for much exibility in the selection. To minimise test set selection bias and maximize the information from all the molecules in the dataset, a bootstrap method was selected as a suitable method for validation of the model ts. Using the bootstrap package, boot, in R, 42,43 the data were sampled from the full dataset and the statistics calculated, using a resampling of the dataset 999 times. Comparing the condence intervals for the bootstrap t and the linear least squares prediction highlights the reasonable robustness of the ts.
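The idea of a stratified bootstrap can be sketched in a few lines. The study itself used the boot package in R; the Python version below is only an illustration of the resampling scheme, and the stratum boundaries (2.5 and 5.0), the function names and the synthetic data are assumptions. Resampling within each stratum guarantees that low, mid and high log P compounds appear in every replicate, so the quadratic fit is always defined.

```python
import random

def fit_quadratic(points):
    """Least-squares (a, b, c) for y = a + b*x + c*x**2 via the normal equations."""
    s = [sum(x ** k for x, _ in points) for k in range(5)]      # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in points) for k in range(3)]  # sums of y*x^0 .. y*x^2
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):                   # Gauss-Jordan elimination with pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def stratum(x, low=2.5, high=5.0):
    """Assumed strata: low, mid and high log P."""
    return 0 if x < low else (1 if x < high else 2)

def stratified_bootstrap(points, n_boot=999, seed=0):
    """Refit the parabola on n_boot resamples drawn within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for p in points:
        strata.setdefault(stratum(p[0]), []).append(p)
    fits = []
    for _ in range(n_boot):
        sample = []
        for members in strata.values():
            sample.extend(rng.choices(members, k=len(members)))
        fits.append(fit_quadratic(sample))
    return fits

def percentile_interval(values, alpha=0.05):
    """Percentile confidence interval from the ordered bootstrap statistics."""
    ordered = sorted(values)
    n = len(ordered)
    k_lo = max(round((n + 1) * alpha / 2) - 1, 0)
    k_hi = min(round((n + 1) * (1 - alpha / 2)) - 1, n - 1)
    return ordered[k_lo], ordered[k_hi]

# Synthetic (log P, log(1/EC50)) pairs: a hypothetical parabola plus noise.
noise = random.Random(1)
xs = [1.2, 2.0, 2.8, 3.3, 3.9, 4.4, 4.8, 5.5, 6.3, 7.1]
data = [(x, -0.5 + 1.2 * x - 0.13 * x * x + noise.gauss(0, 0.1)) for x in xs]
fits = stratified_bootstrap(data)
c_low, c_high = percentile_interval([c for _, _, c in fits])
print(f"95% interval for the quadratic coefficient: {c_low:.3f} to {c_high:.3f}")
```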

QSAR models
The rst avenue that was explored was tting the whole dataset to one model. The full descriptor set was examined in JMP, 44 and using the stepwise t a 't all models' was run, modelling the log(1/EC 50 ) against the set of descriptors with a maximum of three parameters for the model (four parameters generated too  many models for the available computing power, four parameter models were generated with a subset of descriptors). The modelling considered ALOGPs and ALOGP-sq as lipophilicity descriptors. As described earlier, the ALOGPs descriptor was identied as the best log P descriptor through correlation with retention times (RT) (for full correlations see ESI † (log P_RT_correlations.pdf)).
The simple parabolic two-parameter model (ALOGPs, ALOGPs-sq) generates the following eqn (1), with an R² value of 0.629:
$\log(1/\mathrm{EC}_{50}) = -0.579 + 1.203\,\text{ALOGPs} - 0.133\,\text{ALOGPs-sq}$ (1)
Increasing the number of parameters to three increased the R² value to approximately 0.79 for the top models. All the top 20 models have an R² value above 0.74. Summary information about the 10 best three-parameter models for the whole dataset is shown in Table 2, ranked by R² values (additional models can be seen in the ESI†).
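The optimum lipophilicity implied by eqn (1) follows from the vertex of the parabola, at ALOGPs = −b/(2c). The short sketch below evaluates it using the coefficients of eqn (1); the function names are illustrative, and with R² = 0.629 the result should be read as a rough estimate rather than a precise optimum.

```python
# Coefficients of eqn (1): log(1/EC50) = A + B*ALOGPs + C*ALOGPs^2
A, B, C = -0.579, 1.203, -0.133

def predicted_log_inv_ec50(alogps):
    """Transport activity predicted by the two-parameter parabolic model."""
    return A + B * alogps + C * alogps ** 2

def optimum_alogps():
    """ALOGPs value at the maximum of the parabola (vertex, since C < 0)."""
    return -B / (2 * C)

x_opt = optimum_alogps()                 # about 4.52
y_opt = predicted_log_inv_ec50(x_opt)    # about 2.14
print(f"optimum ALOGPs = {x_opt:.2f}, predicted log(1/EC50) = {y_opt:.2f}")
```

The predicted maximum lies near the middle of the explored ALOGPs range, consistent with the observation that most of the highly active compounds have intermediate lipophilicity.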
Following the 't all models' t, condence intervals were obtained for a selected number of models from the leastsquares analysis. These models were then also run through a bootstrap method in R to obtain condence intervals using a sampling method. Due to the distribution of the data still being heavily biased towards the middle of the ALOGPs range, we utilised a stratied selection within the bootstrap function to ensure that a selection of points from the lower and upper regions were always included.
Condence intervals obtained from the bootstrap function were well aligned with the condence intervals obtained from the linear t (Table 3) (see ESI † for additional details). This suggests that the ts are quite robust. The most variation comes in the coefficient for the intercept with a much narrower range in the   ALOGPs and ALOGPs-sq coefficients. However, plotting actual vs. predicted for the models gives a fairly similar appearance for all of the selection of ten models (see ESI † for details).
As shown by the models described in Table 2, there were a large number of calculated descriptors that seemed to offer potentially useful additional descriptive power to the ts, but without any clear advantage of one over the others (apart from the clear importance of log P). This suggested that principle component analysis and partial least squares analysis might be useful. However, this led to insignicant improvements in the models, and made the contributions of the terms in the models less clear. Therefore, we sought an alternative classication approach along the lines of partial decisions trees by modelling subsets of the compounds based on the structural features of the molecules.

Structural classication
The compounds in this series share a bipyrrole core structure, and the rest of the structure can be categorised by three variations on backbone structure (see Fig. 3). The R 4 position on the heterocycle (ring-substituent) is either occupied by an OMe group or by an OBn group, the R 5 position (enamine-substituent) is either an NH group or a NH-Ph moiety (with two exceptions: compound 19 is NH-CH 2 -Ph and compound 38 is NH-py), the R 6 substituent (R-group) is quite varied but can be grouped into the type of substituent e.g. alkyl, halogen, etc. The presence or absence of a structural feature is a key aspect which could have an effect on the activity of a molecule. Due to this we looked into separating the set of molecules into groups by the structural substituents.
Splitting by ring-substituent R 4 gives two groups: thirty-three compounds with a methoxy group and ten compounds with a -OBn substituent (Fig. 4a). Splitting considering R 5 group gives two main groups and two points that do not t into either the NH or NH-Ph classication. The NH group has nineteen compounds and the NH-Ph group has twenty-two compounds (Fig. 4b). Splitting by the R 6 group is fairly difficult as there are a variety of different substituents. The most populated group is that in which R 6 is an alkyl group, with twenty-eight compounds. The remaining een compounds t into six other groups (Fig. 4c).
The subset with the most interesting grouping involves the split by enamine-substituent R 5 (Fig. 4b). From plotting log(1/ EC 50 ) against ALOGPs (assuming a parabolic relationship) we have two sets of data where the peak log(1/EC 50 ) values appears to change between the two sets. However the optimum log P value appears to be similar for the two sets. The R-type plot shows a nice parabolic relationship for the R 6 alkyl R-type, however the other groups are not populated well enough to   7 Quadratic fits for all types of compound grouping, excludes groups with less than 3 points, showing behaviour consistent with a parabolic dependence on log P but with differing optimum values of log P suggesting that other aspects of the mechanism may be more significant in these cases. Groups are classified by the following substituents; R 4 .R 5 .R 6 (R-type). show a proper correlation. The reason for this is that in the NH group set the main substituent that is possible is an alkyl chain. On the other hand, with the phenyl ring in NH-Ph there is the opportunity to substitute a wider variety of R-types. Since there is only a substitution at the para position it limits the number of compounds that will have the same R-type substituent. Due to this we choose to take only those compounds with an alkyl substituent and carry out modelling of the subset using the lme4 package, 45,46 in R. This package allows us to use an entire dataset to t the curve of the parabola, whilst allowing the subset of data to adjust the positioning of the curve by changing the intercept. A linear mixed effect model (lmer) was run for the subset of the compounds containing an alkyl R-type, modelling the dataset to the form log(1/EC 50 ) ¼ a + b Â ALOGPs + c Â ALOGP 2 , and further splitting by the substituent R 4 . See Fig. 5.
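One common way to write a random-intercept model of this kind is given below. It is a sketch of the general form only: the symbols u_j, σ_u and σ_ε are introduced here for illustration, and the text does not state the exact lme4 formula or grouping factor that was used.

```latex
\log(1/\mathrm{EC}_{50})_{ij} = (a + u_j) + b\,\text{ALOGPs}_{ij} + c\,\text{ALOGPs}_{ij}^{2} + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^{2}), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^{2})
```

Here i indexes the compounds and j the subgroup (for example the R4 substituent). In this form the coefficients b and c, and hence the position of the maximum at −b/(2c), are shared by all subgroups, while u_j shifts the whole curve up or down for each subgroup.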
Taking only the OMe ring substituted compounds (20 of the 28 alkyl compounds) results in the following lmer model and plot (Fig. 6).
These models show that extending a hydrocarbon tail clearly produces the classic parabolic dependence on log P, with the optimum value of log P (and the curvature) being a property of the membrane (and so similar for many of the subsets). The effect of the other substituent (OMe or OBn) in changing the maximum value of log(1/EC50) is demonstrated, but it is less clear what drives this effect, and this will be a subject for further investigation. Fig. 7 shows that by defining several subgroups of substituents in terms of substituent location and chemical type we are able to demonstrate the parabolic dependence on log P and begin to distinguish the aspects that are a property of the membrane from those that depend on more specific interactions between the membrane and the tambjamine molecules. The parabolic dependence observed is a property of the membrane. However, each substituent series is shifted in optimal log P for transport. This evidence leads us to suggest that, whilst hydrophobic interactions dominate for relatively simple substituents in certain locations on the tambjamine core, for others more specific interactions are present that change the position of the membrane hydrophobicity parabolic envelope. The functions illustrated in Fig. 7 are presented in Table 4.

Conclusions
This study demonstrates the generality of lipophilicity as a crucial parameter governing the transmembrane transport activity of synthetic anionophores. Series of structurally similar compounds containing a common hydrogen bonding motif and a variety of substitution patterns can be grouped into subsets according to structural parameters. In general there is a parabolic dependence between log(1/EC50) and log P, which is a property of the membrane. By defining subgroups of substituents and splitting the data, optimum log P values for each subgroup were obtained. This suggests that for different subgroups of compounds specific interactions are taking place that change the optimum log P value. We have thus gained significant insight into how substitution affects the anion transport properties of this important class of receptors.