Multidimensional (3D/4D-QSAR) probability-guided pharmacophore mapping: investigation of activity profile for a series of drug absorption promoters

A. Bak*a, V. Kozikb, A. Smolinskic and J. Jampilekd
aDepartment of Organic Chemistry, Institute of Chemistry, University of Silesia, Katowice, Poland. E-mail: Andrzej.Bak@us.edu.pl
bDepartment of Synthesis Chemistry, Institute of Chemistry, University of Silesia, Katowice, Poland
cDepartment of Energy Saving and Air Protection, Central Mining Institute, Katowice, Poland
dDepartment of Pharmaceutical Chemistry, Faculty of Pharmacy, Comenius University, Bratislava, Slovakia

Received 17th June 2016 , Accepted 5th August 2016

First published on 5th August 2016


Abstract

In the current study a hybrid approach that combines 3D and 4D-QSAR methods based on grid and neural (SOM) paradigms with automated variable elimination IVE-PLS procedure was examined to identify the pharmacophore pattern for cholic acid derivatives as potential drug absorption promoters. In particular, the outcome of multidimensional structure–activity modelling of the transdermal penetration effect (SKIN) and intestinal absorption enhancement (PAMPA) using the classical CoMFA and Hopfinger's cube formalisms has been compared with the neural CoMSA and SOM-4D-QSAR methodology for a set of cholic derivatives. The comparison of the corresponding statistic characteristics generally confirms the previously observed trends in pairs of qcv2/qtest2 values where 3D/4D SOM-based protocols with a fuzzy molecular representation for various training/test subset distributions outperforms the standard cubic 3D/4D procedures. A systematic model space inspection with splitting data collection into training/test subsets to monitor statistical performance in the effort for mapping of the probabilistic pharmacophore geometry was conducted using the stochastic SMV procedure. The iterative variable elimination procedure (IVE-PLS) represents a filter for specifying descriptors having potentially the highest individual weightings for the observed potency of cholic acid analogues as drug absorption promoters. A simplified visual inspection of pharmacophore sites gives the clear picture of regions that might be modified to modulate the compound potency. A pseudo-consensus 3D/4D-QSAR methodology was used to extract an average 3D pharmacophore hypothesis by exploration of the most densely populated training/test subpopulations to indicate the relevant factors contributing to the drug absorption potency of cholic acid derivatives.


1. Introduction

The rational production of the desired compound pharmacological profile is an enormously challenging issue that still lacks a general approach.1 The computer-assisted drug design (CADD) is regarded as the art of specifying molecules of potential therapeutic value working as preliminary stage namely ‘pre-synthesis’ or ‘intuitive roadmap’ on the path towards the production of properties.2 The fundamental idea underlying the CADD for the robust identification of hit → lead → drug candidate is the comprehensive projection of the compound topology and/or topography into the chemical property space.3 A variety of CADD methodologies, in particular multidimensional quantitative structure–activity relationships (mD-QSAR) have been reviewed extensively.4 Traditionally, the pursued ‘Holy Grail’ of mD-QSAR modelling is to provide ultimately the statistically reliable models capable of making accurate quantity predictions including binding affinity, toxicity or pharmacokinetic parameters (ADMET) for compounds by observing the trends in chemical structure modifications that produce corresponding changes in the biological responses.5 Basically, the most classical QSAR procedures employ implicitly or explicitly the similarity principle where compounds with similar structure are expected to have similar biological activity.6 On the other hand, the diversity of inter/intramolecular phenomena involved in molecule–target interactions makes the optimization of the environmental response a cost/resources/knowledge demanding issue. Thus, vast attempts have been taken to implement the ‘direct’ receptor-dependent (RD) and ‘indirect’ receptor-independent (RI) in silico mD-QSAR protocols, respectively. In the devised target-guided QSAR approaches the complementary (bio)effector binding mode is modeled based on the intrinsic dependence of atomic coordinates of both receptor and ligand in the binding site while the macromolecular 3D geometry is available.7 The adopted spatial distribution of the ligand property space is mediated by the complementary mapping of target steric, electronic and/or lipophilic patterns. The promising site-directed QSAR methodology can only be applied when the target structural information, or at least good homology models are accessible along with reliable empirical interaction data of respective ligands.8 In absence of an experimentally determined receptor structure, primarily by X-ray and NMR crystallography, ligand-based techniques supply an ‘approximate estimation’ of the host binding affinities. Contrarily to the real biological environment (where only one ligand can bind at a time), a classical RI-QSAR study is based on an ensemble of structurally-related (bio)active molecules with approximately the same pharmacological pattern and binding ‘simultaneously’ to the real or hypothetical target.9 In consequence, the spatial arrangements of the crucial guest–host interaction patterns called pharmacophore mapping are provided composing the ‘reverse image’ of the target binding geometry. The pharmacophore concept is commonly specified as a geometric distribution of molecular features or molecular fragments forming a necessary but not sufficient condition for biological activity.10

A number of 3D-QSAR procedures have been implemented; however the comparative molecular field analysis (CoMFA) diffused quickly into medicinal and computational chemistry becoming a cornerstone for computer-aided molecular design. It is a long-established in silico technique basically employed as a tool for optimizing steric and/or electrostatic ligand patterns.11 The CoMFA approach characterizes the molecular features of superimposed molecules by the spatial distribution of non-covalent areas evaluated over lattice points which are correlated with variations of the biological response using partial least squares (PLS) method.12 The devised functions translate shape information into the form of spatially uniform maps that enable the prediction of the biological potency. Masking explicit shape information by the regularity of the cubic grid lattice and/or molecular surfaces as a conventional imitation of the molecular boundaries can be helpful in explaining pharmacological effects. A variety of methods evolved from this concept and a number of alternative CoMFA-like protocols have emerged, e.g., comparative molecular surface analysis (CoMSA) that implemented improvements in the molecular description, alignment rules and, subsequently, the modelling performance (predictive quality) as well.13 The fuzzification of the molecular representation which improves the compound superimposition seems to be of special interest due to a prospectively better description of the binding affinity to the putative receptor structure. Hence, CoMSA substitutes potential values specified at single points with mean potential values calculated for surface sectors.14 The applied competitive neural network (SOM) estimates the (dis)similarity of distinct molecules (template and counter-template) by comparing (unnecessarily identical) slices of molecular surfaces using the memorizing ability of neurons to assemble the patterns (vectors) located near the template attractor. The efficiency of comparative mapping has been proven in pharmacophore modelling or exploring the molecular diversity due to the robust neuron architecture enables to diminish the impact of the molecular superimposition mode; however the indeterministic nature of neural net imposes some restrains.15

The nature of the guest functional groups and the corresponding spatial distribution of properties in the binding site can be partially elucidated by a single 3D entity adopted by a ligand namely ‘bioactive’ conformation.16 Consequently, an obvious question arises about the preferable ligand geometry with the greatest complementarity/affinity as a ‘negative’ of an active target site when the empirical data are not available. Typically, the postulated initial ligand geometry is constructed based on the ‘sophisticated guess’ (not necessarily energy-minimized structure) mirroring the pharmacophore hypothesis which can be an erroneous prerequisite. In consequence, the higher level of model abstraction allowing for the examination of the multiple molecular conformation, orientation and protonation state representation has been proposed in the domain of rational drug design in the form of an additional dimension – the fourth one.17 In the broadest sense, 4D-QSAR methodology has evolved from the molecular shape analysis (MSA) by explicit enhancement of 3D-QSAR approaches, where the substitution of the single-conformer concept with cube-like structures of different resolution produces a fuzzy pattern of molecular objects.18 4D-QSAR analysis has proven to be reliable in the construction of quantitative 3D pharmacophore models in a function of molecular conformation, alignment and fragmentation (atom type).19 The resultant pharmacophore site is generated by extensive exploration of both conformational and alignment degrees of freedom to reduce the bias by selecting a (bio)active ligand conformer and binding mode using the conformational-search protocols e.g. molecular dynamic simulations (MDs).20 The population of conformational states for each analogue is sampled to generate molecular trajectory profiles. The 4D-QSAR formalism successfully incorporates some CoMFA characteristic employing the spatial grid, where the grid cell size is viewed as a ‘methodology parameter’ to produce a molecular shape spectrum (MSS) according to the trial alignment rule.21 The occupancy frequency of the unit cubes by individual atoms or even atom groups composing each molecule can be optionally augmented by the direct inclusion of the target data.22 Moreover, the neural version of Hopfinger's cube formalism, namely SOM-4D-QSAR, employing the Kohonen self-organizing maps has been applied to produce a fuzzy 4D-QSAR-like representation of the conformational space as an appealing alternative which performs comparably to its grid counterpart.23 In the proposed methodology a unit cube resolution used in the classical approach has been substituted by a sphere specified in space via single SOM neuron.24

An enormous number of trial conformers is generated during the MDs simulations while generally only a few spatial moieties contribute into the target response. Furthermore, the composite population of input descriptors assigned as ‘original databank’ generates dimensionality issue and diminishes the performance of QSAR modelling; therefore a hybrid approach that combines automated variable evaluation and reduction has been introduced to select important variables having the highest individual weightings to the observed bioactivity.25 In order to prune data noise the uninformative variables which do not contribute relevantly to the model should be excluded in advance, hence the iterative variable elimination procedure based on the partial least square procedure (IVE-PLS) represents a filter to recognize non-significant cell occupancies.26

A number of modern drugs are not available to patients due to their poor aqueous solubility and permeability.27 Generally, modification/optimization of poor permeability through membranes can be achieved by selection of appropriate excipients to function as transporters (surfactants or pharmaceutical complexing agents, permeability enhancers) being components of a dosage form. These excipients that increase absorption of drugs to blood circulation are known as intestinal absorption promoters in oral drug formulations and transdermal penetration enhancers in transdermal therapeutic systems.28 Numerous compounds of different chemical structures were evaluated/applied as absorption promoters.29–31

Cholic acid is one of the most important human bile acids. Bile acid derivatives/analogues are an important class of compounds with a range of pharmacological activities. Bile acids could be easily modified by derivatisation of the functional groups on the steroid nucleus.32 Nontoxic bile acid/salt derivatives (as amphiphilic compounds) are widely used in drug formulations as excipients – as a type of absorption promoters and thus they can influence gastrointestinal solubility, absorption and chemical/enzymatic stability of drugs.33 Cholic acid derivatives were studied also as transdermal penetration enhancers.34 The reason for their activity could be their specific solvation and self-assembly features.35

The principal objective of the current investigation was two-fold. First of all, it is of interest to compare the impact of the coding molecular systems on the efficiency of structure–activity performance using 3D (CoMFA and CoMSA), and 4D (standard and neural formalism) methods on the ensemble of drug absorption promoters, respectively. Irrespective of the variations in the manner the cholic acid derivatives were encoded both approaches performed comparably; however SOM-4D-QSAR yielded better results for the corresponding training/test sets. It seems that the robust neuron architecture decreases the influence of the compound superimposition mode by a fuzzy picture of molecular objects. The relationship between chemical structure of cholic acid derivatives as potential drug absorption modifiers and their enhancement effects are discussed. Additionally, we concentrated on systematic model space inspection with splitting data collection into training/test subsets to monitor statistical estimators performance in the effort for mapping of the probabilistic pharmacophore geometry using the stochastic model validation (SMV) approach.36 The automated variable reduction with the IVE-PLS procedure represents a sieve for detecting only those descriptors that have prescribed the greatest individual weighting for the observed cholic acids analogue activity. The ‘pseudo-consensus’ 4D-QSAR methodology was used to extract an ‘average’ 3D-pharmacophore by exploring various data subpopulations and which embodies the quantity for quality argument to indicate the relevant contributing factors of the cholic acid absorption activity.

2. Methods

2.1 Comparative molecular fields analysis (CoMFA)

CoMFA is a widespread approach that enables to simulate the impact of molecular shape upon modelling steric (Lennard-Jones) and electrostatic (Coulomb) effects as significant intermolecular interactions involved in non-covalent ligand–receptor binding.37 From the statistical perspective CoMFA procedure is based on the assumption that variations in binding affinities or biological activity profiles for a series of congeneric compounds can be explained by the comparative analysis of spatial patterns generated within the cubic mesh of points which encompasses aligned molecules by means of suitable probes. Therefore, the modelling efficiency of electronic and steric potentials in the molecular environment is directly controlled by the specification of the proper atomic probe (usually positively charged sp3 carbon atom) and superimposition applied following the putative pharmacophore hypothesis. It is clear that the distribution of potential values on the grid points depends on the atomic coordinates of the molecule which is a decisive factor affecting the value of the atomic partial charges. The potential energies at each lattice point can be plotted as three-dimensional coloured contour maps indicating the regions where steric hindrance and/or charged substituents enhance or diminish the binding affinity.38

2.2 Comparative molecular surface analysis (CoMSA)

A self-organizing neural network (SOM) is a data processing unit based on unsupervised learning methodology and two-dimensional topology comprised of a single neuron layer typically arranged into a hexagonal or rectangular array of nodes. The mutual relationships among neurons are precisely defined by winning and neighborhood distances, respectively. The presented m – dimensional input vector xs = (xs1, …, xsm) is distributed between neurons according to the similarity/correlation weight criteria, where similar inputs are located in the same or proximal nodes. A typical competitive Kohonen (KNN) procedure is based on the comparison of the input vector to the corresponding m – element weight vectors wi = (wi1, …, wim) that describe each neuron in order to detect the winning one (outC) into which a particular input will be projected. A formal criterion for the specification of the winning neuron is selected by the optimization of the Euclidean distance between a vector (x) and a weight (w):
 
image file: c6ra15820j-t1.tif(1)
where: j = 1, …, n – refers to a particular neuron, n – refers to the number of neurons, m – is the number of weights per neuron and s – indicates a particular input.

The weights of the winning neuron and the neighboring neurons are then modified to resemble and subsequently attract similar input vectors. The whole procedure is repeated while the next input vector is being presented to the network.39 In fact, self-organizing neural mapping is regarded as a nonlinear projection tool, which reduces the dimensionality of the input object, e.g. converts 3D objects to 2D, while preserving the topological relationships between the input and output data. Hence, the trained network can be applied for the projections of the specified molecular property prescribed to the input vector with the generation of the planar colour-coded clustering pattern, namely a feature map. Consequently, the SOM algorithm was used to construct a two-dimensional topographic map obtaining the signals from the points sampled randomly at the molecular surface such as an electrostatic potential map.40 In such application each 3-dimensional input vector consisting of x, y, and z coordinates is compared to a 3-element weight vector describing each neuron to find the closest neighbor and then project a signal into this particular neuron. The knowledge concerning the shape of the certain molecular surface (template) encoded in the weights of the trained Kohonen network can be used for processing the signals coming from the surface of other molecule(s) (counter-template) providing a series of comparative SOM maps to compare/contrast the superimposed molecular geometry. A variety of SOM applications for the classification, visualization and compression of the structural data has been described in chemistry, in particular for two-dimensional mapping of the electrostatic potential on the three-dimensional molecular surfaces or partial atomic charges for the atomic molecular representation. Moreover, we implemented a series of QSAR methods for pharmacophore mapping based on such comparative Kohonen strategy including CoMSA and SOM-4D-QSAR.41

2.3 4D-QSAR strategy

Hopfinger's 4D-QSAR grid coding system using a set of cubic shape-like descriptors is the enhancement of the 3D approaches by considering the multiple conformational states as the additional dimension.42 The method has been successfully applied in molecular modelling and is especially well suited for the search of the active conformation and binding mode of conformationally flexible molecules.43 The geometry-optimized molecules are used as initial structures for generating the compound trajectory in the molecular dynamics simulations (MDs), resulting in the conformational ensemble profile (CEP). Individual conformers are superimposed following active analogue approach (AAA) hypothesis and placed in the reference lattice space. In fact, the activity of drugs was calculated in the absence of the explicit solvent molecule, therefore MDs calculations were performed using a distant-dependent dielectric function, εr = Drij, which was set to 3rij to model the solvent effects. A series of so-called grid cell occupancy descriptors (GCODs) is calculated on based on the pattern in which the singular cubic cell resolution is set to 1, 2 and 0.5 Å, respectively. The group of Hopfinger's GCODs were considered for the atoms indicated as the interaction pharmacophore elements (IPEs). Moreover, the additional set of GCODs, namely charge descriptors, was employed in the calculation with the absolute charge occupancy (Aq) for the chosen IPE of compound c defined as:
 
image file: c6ra15820j-t2.tif(2)
where m is the number of the atoms of compound c present in the cell (i, j, k) at time t, q/m is the mean value of partial atom charges present in some cells at time t, T is the time in MDs, and N is the number of sampling MDs steps. The joint (Jq) and self-charge occupancy (Sq) descriptors with the most active reference compound R were determined as:
 
image file: c6ra15820j-t3.tif(3)
 
image file: c6ra15820j-t4.tif(4)

Each alignment produces a unique grid cell occupancy/charge distribution for a given molecular trajectory. The grid cells are unfolded into vectors and subsequently the array composed of vectors describing all molecules is generated and used to estimate the structure–activity relationship with PLS method coupled with variable selection/elimination procedures.

2.4 SOM-4D-QSAR formalism

Alternatively to the classical Hopfinger's strategy, a neural formalism with self-organizing Kohonen network for production of a fuzzy 4D-QSAR-like representation of the conformational space was employed.3 The competitive SOM algorithm was applied to generate a two-dimensional topographic map representing the signals from particular atoms of the molecular trajectory.23 In such an approach a sphere specified in space by a single neuron substitutes a singular unit cube. Briefly, the SOM-4D-QSAR procedure is composed of the following operational stages:
Stage 1. Model building – generation of a 3D structure of each molecule in the set analysed. In practice, each of the 3D structures might initiate the conformational sampling; however Sybyl-X/Certara software was used for structure energy-minimization.
Stage 2. Superimposition – specification of the trial alignment of the molecule conducted in molecule's individual atoms to be covered.
Stage 3. Interaction Pharmacophore Elements (IPE) – the molecules are partitioned into groups of atoms playing a privileged role in the modeled interactions, e.g., aromatic, hydrogen bond donors, hydrogen bond acceptors, polar positive or negative partial charge, and unrestricted (all) atom type.
Stage 4. Conformational Ensemble Profile (CEP) – the MD simulations of molecular system are performed for sampling conformers used in the further comparative analysis. The energy-minimized structures were used as the initial step to generate molecular trajectory for which the partial atomic charges were computed using the semiempirical AM1 method implemented in HyperChem 6.03 package.
Stage 5. Comparative SOM mapping – the Cartesian coordinates and partial atomic charges are used as the input to produce a two-dimensional topographic map. During training these data are distributed among neurons providing the sum occupancy or mean charge maps.
Stage 6. Variable reduction and model validation – a structure–activity relationship is modeled using the PLS procedure and LOO CV coupled with the IVE-PLS scheme for variable elimination. External model validation is additionally monitored by measuring the predictive ability of the external test set. A variety of training/test sets samplings has been monitored by the iterative Stochastic Model Validation (SMV) scheme.

2.5 PLS analysis

The partial least squares (PLS) method produces a relationship between variable Y and an ensemble of descriptors X expressed in the form of the following equation:
 
Y = Xb + e (5)
where b = a vector of the regression coefficients and e = a vector of the errors. The generated data were centered and processed by the PLS analysis with the construction of models the complexity of which was estimated using the leave-one-out cross-validation procedure (CV). In the leave-one-out CV one repeats the calibration m times, each time treating the i-th left-out object as the prediction object.44 The dependent variable for each left-out object is calculated based on the model with one, two, three, etc. factors. The root mean square error of CV for the model with j factors is defined as:
 
image file: c6ra15820j-t5.tif(6)
where obs denotes the assayed value; pred is the predicted value of dependent variable and i refers to the object index, which ranges from 1 to m. A model with k factors, for which RMSECV reaches a minimum, is considered as an optimal one.

A cross-validated leave-one-out qcv2 value for the estimation of the model performance is computed using the following formula:

 
image file: c6ra15820j-t6.tif(7)
where obs is the assayed value; pred is the predicted value; mean is the mean value of obs, and i refers to the object index, which ranges from 1 to m; and cross-validated standard error of prediction s:
 
image file: c6ra15820j-t7.tif(8)
where m is the number of objects and k is the number of the PLS factors in the model.

The quality of external predictions was measured by the standard deviation of error of prediction (SDEP) and qtest2 defined as:

 
image file: c6ra15820j-t8.tif(9)
 
image file: c6ra15820j-t9.tif(10)
where n is the number of objects in test set.

2.6 Iterative PLS-based variable elimination

The redundant variables may affect the regressive analysis detrimentally; therefore the reduction in the number of the uninformative descriptors can facilitate the model interpretation considerably. Although variable elimination in PLS modelling is a complex issue several methods have been described recently, including the uninformative variable elimination (UVE-PLS), as well as its modifications, namely modified UVE (m-UVE) and iterative variable elimination (IVE-PLS) that were successfully applied both in 3D and 4D-QSAR schemes, respectively.26 The original UVE algorithm developed by Centner et al. analyses the reliability of the mean(b)/s(b) ratio, where s(b) is a standard deviation of regression coefficient b calculated by the PLS method.45 Then, only variables of the relatively high stability values are included into the final PLS model. In the current calculations iterative IVE-PLS procedure is used as a modification of the single-step UVE algorithm for identification of variables to be eliminated. Basically, the whole algorithm is composed of the following stages:
Stage 1. Standard PLS analysis with LOO-CV to evaluate the performance of the PLS model (qcv2).
Stage 2. Elimination of the matrix column with the lowest abs(mean(b)/std(b)) value.
Stage 3. Standard PLS analysis of the new matrix without the column eliminated in stage 2.
Stage 4. Iterative repetition of the stages 1–3 to maximize the LOO qcv2 parameter.

3. Experimental

3.1 Model builder

All pharmacological data were specified in the same laboratory to eliminate potential data noise that might have been introduced by pooling of data sets coming from various sources. All the target compounds were tested for their in vitro transdermal penetration activity (SKIN) and as potential absorption enhancers (PAMPA). The chemical structures and experimental data for the cholic acids derivatives were extracted from the literature and are introduced in Table 1.32–34 The CACTVS/csed molecular editor was used to specify the constitution of the respective compound models. The spatial geometry of molecules was produced using the 3D generator CORINA. The (inter)change file format converter OpenBabel was applied to convert the chemical data.
Table 1 Structures of the cholic acid derivatives and experimental transdermal penetration (SKIN) and intestinal absorption (PAMPA) enhancement activity32–34
image file: c6ra15820j-u1.tif
L.p. Structure ERs
SKIN PAMPA
1 image file: c6ra15820j-u2.tif 1.67 1.08
2 image file: c6ra15820j-u3.tif 1.57 1.18
3 image file: c6ra15820j-u4.tif 1.67 1.16
4 image file: c6ra15820j-u5.tif 2.13 0.35
5 image file: c6ra15820j-u6.tif 1.3 0.6
6 image file: c6ra15820j-u7.tif 1.24 1.03
7 image file: c6ra15820j-u8.tif 2.09 0.73
8 image file: c6ra15820j-u9.tif 1.83 0.73
9 image file: c6ra15820j-u10.tif 1.08 0.95
10 image file: c6ra15820j-u11.tif 0.92 1.21
11 image file: c6ra15820j-u12.tif 1.45 0.93
12 image file: c6ra15820j-u13.tif 2.01 0.52
13 image file: c6ra15820j-u14.tif 1.96 0.86
14 image file: c6ra15820j-u15.tif 2.01 0.8
15 image file: c6ra15820j-u16.tif 2.63 2.45
16 image file: c6ra15820j-u17.tif 1.54 2.66
17 image file: c6ra15820j-u18.tif 2.5 2.75
18 image file: c6ra15820j-u19.tif 2.41 2.72
19 image file: c6ra15820j-u20.tif 2.28 2.86
20 image file: c6ra15820j-u21.tif 2.36 2.68
21 image file: c6ra15820j-u22.tif 2.01 2.23
22 image file: c6ra15820j-u23.tif 1.2 2.67
23 image file: c6ra15820j-u24.tif 1.99 2.53
24 image file: c6ra15820j-u25.tif 2.26 2.23
25 image file: c6ra15820j-u26.tif 1.61 2.52
26 image file: c6ra15820j-u27.tif 1.85 2.36
27 image file: c6ra15820j-u28.tif 2.22 2.29
28 image file: c6ra15820j-u29.tif 1.7 2.69
29 image file: c6ra15820j-u30.tif 2.08 2.66
30 image file: c6ra15820j-u31.tif 2.53 2.02
31 image file: c6ra15820j-u32.tif 2.24 2.55
32 image file: c6ra15820j-u33.tif 2.52 2.47
33 image file: c6ra15820j-u34.tif 1.65 1.16


3.2 Molecular modelling

The major part of the modelling studies were performed using the Sybyl-X 2.0/Certara software package running on HP workstation with Debian 6.0 operating system. The initial geometry of each analogue in its neural form was optimized using standard Tripos force field (POWELL conjugate gradient algorithm) with 0.01 kcal mol−1 energy gradient convergence criterion and a distant dependent dielectric constant. Initially, partial atomic charges were calculated using the Gasteiger–Hückel method implemented in Sybyl for electrostatic potential calculations. The trial alignments are typically defined to systematically span the common scaffold of the analysed compounds; therefore one 17-ordered atom trial alignment on molecule 19 was selected to cover the entire bonding topology in the maximal common structure (steroid ring system) by the atom FIT method based on the matching of atoms' positions between the corresponding atom pairs (1–17), as presented in Table 1.

The steric and electrostatic field energies were calculated using sp3 hybridized carbon probe atom with a charge of +1 and 0 and hydrogen as a probe atom with a charge of +1, respectively. The CoMFA grid spacing was 2.0 Å for all the Cartesian dimensions within the defined region of 3D lattice, which extended beyond the van der Waals envelopes of all molecules by at least 4.0 Å. The non-covalent interaction fields were determined at each intersection on regularly spaced grid. For each molecule the energies with a total of 10[thin space (1/6-em)]560 grid points were calculated with 2 Å spacing in a 20 × 22 × 24 lattice. To reduce data noise, all columns with energy variance less than 2.0 kcal mol−1 were discarded by setting the sigma parameter to 2.0 kcal mol−1. Both steric/electrostatic energies with a value greater than 30.0 kcal mol−1 were truncated to a tentative value of 30.0 (default cutoff). The variations in CoMFA interaction fields (independent variables) were correlated with the changes in activity (dependent variable) with standard internal and external validation techniques to specify the statistical index of the model predictive power using the SAMPLS method.

The SONNIA software was applied in CoMSA analysis to simulate 20 × 20 or 30 × 30 SOMs with winning distance (md) varied in the range of 0.2–2.0. The SOM network was fed with the Cartesian coordinates of the molecular surfaces for superimposed molecules to form a two-dimensional map of electrostatic potential – the most active analog in each series of compounds (19) was used to form the template molecule. The output maps were subsequently transformed to a 400- or 900-element vector which was processed by PLS method implemented in MATLAB programming environment, accordingly to CoMFA analysis.

Energy-minimized molecules (MAXMIN2 module implemented in Sybyl) were used as the initial structures in the molecular dynamic simulations (MDs) with the standard Tripos force field. Each 3D structure is a starting point in generating the conformational ensemble profile (CEP). The CEP is created from a MDs run of 100 ps generated at intervals of 0.001 ps time step. The temperature for the MDs was normally set to 300 K. The atomic coordinates of each conformation and its total energy were recorded every 0.1 ps. One thousand conformations were sampled systematically (one out of hundred) for each analogue in CEP based on the generated 100[thin space (1/6-em)]000 trajectory states. Partial atomic charges were estimated with the semiempirical AM1 Hamiltonian implemented in HyperChem 6.03 software. Both frequency and charge-related GCOD descriptors were determined for a 0.5, 1.0 and 2.0 Å resolution of the grid cell lattice with the most active compound 19 as a reference, respectively. The best QSAR models were selected on the basis of the statistically relevant parameters obtained by means of the PLS procedure. Moreover, the resultant MDs trajectories were processed by neural network with the arbitrary chosen molecule 19 as a template to train a set of SOM maps. The SONNIA package was employed in SOM-4D-QSAR simulations to produce 20 × 20 or 30 × 30 SOMs with winning distance (md) varying in the range of 0.2–2.0, respectively. Correspondingly to CoMSA, the output arrays are reshaped to a 400 or 900 element vector processed by the PLS procedure.

4. Results and discussion

4.1 Modelling of drug absorption modifying potency using 3D and 4D approaches

The insufficient penetration of the active pharmaceutical substances through the skin barrier and/or membranes can be improved by employing appropriate excipients to provide the transporter component of a dosage form.46 Absorption promoters can increase gastrointestinal solubility and, consequently, the transfer of drugs to blood circulation (intestinal absorption modifiers) or enhance transdermal penetration of the therapeutic systems (transdermal enhancers). A series of cholic acid derivatives were evaluated experimentally for their in vitro transdermal penetration effect and as potential intestinal absorption enhancers.47,48 The detailed description of the empirical procedure is beyond the scope of this manuscript but is presented elsewhere.32–34 The enhancement ratios (ERs) of cholic acid analogues with respect to the influence on membrane permeability, which were studied in the PAMPA experiment as well as the enhancement of the of theophylline through porcine skin, as determined by means of Franz cell, are presented in Table 1.

One objective of this study was to systematically investigate the 3D and 4D-QSAR performance in a model of transdermal penetration effect and intestinal absorption enhancement observed for the set of cholic acid derivatives. It should be emphasized that we did not concentrate on details of each modelling method, but more on the philosophy of molecular object description; therefore the impact of coding system on the efficiency of 3D/4D-QSAR modelling was investigated. Furthermore, we examined the pharmacophore properties of a target series using coupled neural network and PLS method with the variable elimination IVE procedure. We compared the results of modelling using the standard 3D/4D-QSAR methodologies and their neural counterparts, respectively. Table 2 shows the qcv2 performance for the entire cholic acid dataset in training subset range from 0.62 to 0.81 for PAMPA models vs. 0.46 to 0.56 for SKIN modelling, depending on the type of descriptors used. It is noticeable that 3D methods (CoMFA/CoMSA) perform comparably (qcv2 > 0.8 for PAMPA potency), whereas 4D approaches produce slightly inferior results. Moreover, the quality of obtained models in term of qcv2 values is better for charge descriptors (q) compared to occupancy ones (o) suggesting the importance of the (co)factor specifying the electrostatic field distribution in the shape description. Interestingly, an extension of the descriptor pools with molecular lipophilicity evaluated with the calculated log[thin space (1/6-em)]P value does not have a noticeable impact on the modelling outcomes. The exclusive reliance on the training set is inadvisable to determine the robustness and the predictive ability of models; therefore apart from the internal validation with the cross-validation procedure the external model validation with splitting the molecule collection into a set of training/test subsets has been performed as well. A low qcv2 value for the training set is a sufficient indicator of the low predictive ability of a model. On the contrary, a good model fit alone is a necessary, but not indicative parameter characterizing model robustness and reliability. Consequently, some external predictions were evaluated using SDEP and qtest2 statistics.5 To estimate the model's predictive power the original dataset was divided arbitrarily into training/test subsets in 2[thin space (1/6-em)]:[thin space (1/6-em)]1 ratio (22/11) ranked according to the values of intestinal absorption (PAMPA) and skin penetration (SKIN) activity to create representative molecular subsets as follows: (1, 2, 4, 6, 7, 8, 11, 12, 13, 16, 17, 18, 20, 21, 22, 25, 26, 27, 30, 31, 32, 33)/(3, 5, 9, 10, 14, 15, 19, 23, 24, 28, 29) and (1, 2, 4, 5, 6, 7, 8, 9, 10, 13, 14, 16, 18, 20, 21, 23, 24, 28, 30, 31, 32, 33)/(3, 11, 12, 15, 17, 19, 22, 25, 26, 27, 29), respectively. Table 3 illustrates the comparison of 3D vs. 4D-QSAR performance in training and test compound sets. It is worth noting that parameters of PAMPA models are generally superior to SKIN ones – the same tendency was observed for the entire set of compounds. Basically, in all cases the qcv2/qtest2 outcome indicates a comparable efficiency in modelling drug enhancement activity; however still 3D methods produce a slightly better findings (0.73/0.87 CoMSA and 0.69/0.81 CoMFA for PAMPA activity). Generally, the predictive ability for the test set (qtest2) is more or less at the same level as the qcv2 value, especially for PAMPA models. On the other hand, the exclusion of 11 compounds from the training set has small, but noticeable impact on the qcv2 characteristics – for all approaches the decrease amounts to approximately 0.1, respectively. Additionally, the Kennard–Stone algorithm was employed on dependent variables to divide representatively the data collection into training/test subgroups.49 The impact of the training/test set distribution with Kennard–Stone procedure on the statistical characteristics of the 3D and 4D approaches is provided in Table 4. The comparison of the corresponding models generally confirms the previously observed trends in pairs of the qcv2/qtest2 values where 3D/4D neural methodology with a fuzzy molecular representation for various training/test subset distribution outperforms the standard 3D/4D procedures e.g., 0.79/0.66 CoMSA vs. 0.75/0.64 CoMFA and 0.66/0.60 SOM-4D-QSARq vs. 0.49/0.61 4D-QSAR-Jq. Interestingly, the charge descriptors show better effectiveness in resolving the cholic acid activity which suggests that electrostatic effects are equally important as shape factors in modelling drug absorption promoters' activity. These findings promote the electrostatic component as potentially significant in enhancing drug absorption potential; however, the impact of the steric properties might be explained by the fact that the molecular shape is a crucial factor determining the distribution of atomic partial charges.

Table 2 Comparison of 3D and 4D-QSAR models for set of cholic acid analoguesa
L.p. Model PAMPAb SKINc
qcv2(onc) s qcv2(onc) s
a (onc) – optimal number of components.b Model: training set (1–33).c Model: training set (1–33).d Compound 19 was used as reference compound R.e Compound 15 was used as reference compound R.f Map size 30 × 30, md = 1.2.g Map size 30 × 30, md = 1.0.h Box 50 Å:50 Å:50 Å, cubic size 1 Å.i Map size 20 × 20, md = 2.0.j Map size 20 × 20, md = 0.2.
1 CoMFA 0.81(6)d 0.40 0.55(5)e 0.34
2 CoMSA 0.81(7)d,f 0.42 0.56(6)e,g 0.33
3 4D-QSAR-Jq 0.62(7)d,h 0.59 0.46(6)e,h 0.37
4 SOM-4D-QSARq 0.71(5)d,i 0.50 0.50(10)d,j 0.38


Table 3 Comparison of 3D and 4D-QSAR models for training/test series of cholic acid analoguesa
L.p. Model PAMPAb SKINc
qcv2(onc) s SDEP qtest2 qcv2(onc) s SDEP qtest2
a (onc) – optimal number of components.b Model: training set (1, 2, 4, 6, 7, 8, 11, 12, 13, 16, 17, 18, 20, 21, 22, 25, 26, 27, 30, 31, 32, 33) and test set (3, 5, 9, 10, 14, 15, 19, 23, 24, 28, 29).c Model: training set (1, 2, 4, 5, 6, 7, 8, 9, 10, 13, 14, 16, 18, 20, 21, 23, 24, 28, 30, 31, 32, 33) and test set (3, 11, 12, 15, 17, 19, 22, 25, 26, 27, 29).d Compound 19 was used as reference compound R.e Compound 15 was used as reference compound R.f Map size 30 × 30, md = 1.2.g Map size 30 × 30, md = 1.0.h Box 50 Å:50 Å:50 Å, cubic size 1 Å.i Map size 20 × 20, md = 2.0.j Map size 20 × 20, md = 0.6.
1 CoMFA 0.69(10)d 0.67 0.37 0.81 0.50(4)e 0.36 0.35 0.33
2 CoMSA 0.73(6)d,f 0.53 0.30 0.87 0.53(4)e,g 0.35 0.32 0.41
3 4D-QSAR-Jq 0.53(10)d,h 0.82 0.64 0.53 0.31(10)e,h 0.52 0.32 0.50
4 SOM-4D-QSARq 0.67(5)d,i 0.57 0.53 0.66 0.36(2)d,j 0.38 0.43 0.30


Table 4 Comparison of 3D and 4D-QSAR models for training/test series specified by Kennard–Stone procedure for set of cholic acid analoguesa
L.p. Model PAMPAb SKINc
qcv2(onc) s SDEP qtest2 qcv2(onc) s SDEP qtest2
a (onc) – optimal number of components.b Model: training set (1, 2, 4, 5, 6, 7, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 21, 25, 26, 27, 30, 31) and test set (3, 8, 11, 20, 22, 23, 24, 28, 29, 32, 33).c Model: training set (2, 4, 5, 6, 9, 10, 11, 12, 13, 15, 16, 18, 19, 20, 22, 25, 26, 28, 29, 31, 32, 33) and test set (1, 3, 7, 8, 14, 17, 21, 23, 24, 27, 30).d Compound 19 was used as reference compound R.e Compound 15 was used as reference compound R.f Map size 30 × 30, md = 1.2.g Map size 30 × 30, md = 1.0.h Box 50 Å:50 Å:50 Å, cubic size 1 Å.i Map size 20 × 20, md = 2.0.j Map size 20 × 20, md = 0.2.
1 CoMFA 0.75(2)d 0.47 0.48 0.64 0.47(9)e 0.47 0.24 0.27
2 CoMSA 0.79(9)d,f 0.53 0.45 0.66 0.50(5)e,g 0.40 0.26 0.07
3 4D-QSAR-Jq 0.49(8)d,h 0.80 0.53 0.61 0.32(10)e,h 0.56 0.30 0.15
4 SOM-4D-QSARq 0.66(10)d,i 0.70 0.55 0.60 0.33(10)d,j 0.56 0.23 0.34


4.2 Stochastic model validation

The cross-validated leave-one-out (CV-LOO) qcv2 is a statistical quality metric frequently used as a basic criterion to evaluate the model fitness; however a question arises about the relevance of the model predictive power which cannot be assessed just by the goodness of data fitting. In fact, the statistically reliable QSAR models can be established by extrapolation of their knowledge to external objects not included at the stage of the model construction. Not surprisingly, QSAR models predict better for objects in the trainings set used at the modelling step than for molecules in the test set since no structure–activity information is accessible. Nevertheless, a model with predictive capability should by all means characterize SAR relations inherent in a test set, hence it is not a trivial issue how to separate objects into training/test subgroups having a restricted set of molecules for which activity data are available a priori. Moreover, a question appears whether we can differentiate between modelling and the predictive ability in QSAR modelling since the quality of models is significantly dependent on the classification of molecules into training/test subpopulations. The high value of qcv2 indicator (qcv2 > 0.5) is not an ultimate proof and does not automatically imply a high predictive ability of the model; however it is a necessary but not sufficient condition to establish robust models.6 Accordingly, Stochastic Model Validation (SMV), an additional inspection of the QSAR performance, has been recommended. Naturally, it is of interest if this ‘perturbation’ procedure could provide any ESI regarding the data structure and model robustness. In consequence, the repetitive sampling of the original compound dataset (33 compounds) into training/test series each containing 22/11 molecules (fraction 2/3 to 1/3) has been iteratively performed to assess fluctuations of the statistical estimators. Due to the resource and time restraints the evaluation of all possible combinations C3311 ≈ 1.94 × 108 is not technically viable, therefore the overall number of samplings was restricted to a relatively small fraction of maximally 2 × 107 systematically generated populations subsequently used in the PLS modelling. The results of SMV procedure are graphically presented in the form of qcv2 vs. qtest2 estimator plots which illustrate in detail the impact of the training/test subset distributions on the final 3D and 4D-QSAR modelling. The relationship of the qcv2 evaluated with the LOO CV method for the training set against qtest2 LSO CV calculated during external prediction for several molecules, coupled with the density plot for PAMPA and SKIN activities is shown in Fig. 1a–h and 2a–h, respectively. Additionally, the histograms specifying a number of individual compounds appearing in the test set within the most densely populated regions, pointed out by black areas on the plots, are provided. Basically, the head of the comet-like profile for qcv2 vs. qtest2 distribution indicates the area of the higher modelling ability within training set accompanied by lower predictability within a test set (qcv2 ≥ 0.70, qtest2 ≥ 0.5). Obviously, these observations confirm our instinctive interpretation of qcv2/qtest2 fluctuation pattern where a too optimistic qcv2 value does not necessarily imply the predictive power of the corresponding models. Intuitively, the preferential choice of objects into the training set, which easily fit into the model, increase the likelihood of the inferior predictive performance for the remaining ones. As matter of fact, many models are characterized by pretty high qcv2 ≥ 0.8 values, but only a tiny fraction is attributed to the real predictive ability estimated by external validation as illustrated in Fig. 1a, c, e and g and 2a, c, e and g. Not surprisingly, in each case it was possible to specify training/test set distribution for which the corresponding pairs of qcv2/qtest2 outperform the Kennard–Stone subset selection as well as the division ranked according to the intestinal absorption and skin penetration activities. Interestingly, the collected answers to the most frequent training/test perturbations for PAMPA activity are assembled in the fairly narrow range of qcv2/qtest2 values (approximately two qcv2 units) contrarily to SKIN model distributions (approximately four qcv2 units), as presented in the density maps in Fig. 1b, d, f and h and 2b, d, f and h. The centers of the most densely populated clusters specify the area of both relatively high modelling as well as predictive model abilities. Accordingly, the histograms which illustrate the frequency of the individual compound selection in the test set as a function of the compound number while probing the most densely populated regions are provided as well. Basically, a relatively smooth compound distribution within the training/test subpopulations is revealed which confirms the representative character of the validation test group.
image file: c6ra15820j-f1.tif
Fig. 1 Relationship of qcv2 estimated with the LOO CV method for training set against qtest2 L-11-O CV (plots on the right side are illustrated in a form of density maps) for modelling PAMPA potency of cholic acid derivatives with CoMFA (a and b), CoMSA (c and d), 4D-QSAR (e and f) and SOM-4D-QSAR (g and h) approaches. Histograms specify the number of individual compounds appearing in the test set within the most densely populated regions (black area).

image file: c6ra15820j-f2.tif
Fig. 2 Relationship of qcv2 estimated with the LOO CV method for training set against qtest2 L-11-O CV (plots on the right side are illustrated in the form of density maps) for modelling SKIN potency of cholic acid derivatives with CoMFA (a and b), CoMSA (c and d), 4D-QSAR (e and f) and SOM-4D-QSAR (g and h) approaches. Histograms specify the number of individual compounds appearing in the test set within the most densely populated regions (black area).

The predictive power of the best models was validated using the Golbraikh–Tropsha criteria; the statistical measures should fulfil at least the following requirements for the test set: qtest2 > 0.5, R2 > 0.6, [(R2RO2)]/R2 < 0.1 and 0.85 ≤ k ≤ 1.15 where RO2 means correlation coefficient for the regression of observed vs. predicted without bias and k is a slope of the regression through the origin.5 The Golbraikh–Tropsha statistics were used for models with the highest possible qcv2 value while preserving the robustness and the predictive power (qtest2 ≥ 0.5), as presented in Table 5.

Table 5 Comparison of 3D and 4D-QSAR models for training/test series of cholic acid analogues for PAMPA (a) and SKIN (b) activitiesa
Entry Model qcv2(onc)c qtest2 k k R2 RO2 RO2
a (onc) – optimal number of components.b Model: training 22, test set 11 compounds.c Compound 19 was used as reference compound R.d Compound 15 was used as reference compound R.e Map size 30 × 30, md = 1.2.f Map size 30 × 30, md = 1.0.g Box 50 Å:50 Å:50 Å, cubic size 1 Å.h Map size 20 × 20, md = 2.0.i Map size 20 × 20, md = 0.6.
1a CoMFAb 0.91(8)c 0.54 0.92 0.99 0.66 0.87 0.99
2b 0.72(8)d 0.52 1.10 0.88 0.59 0.82 0.80
3a CoMSAb 0.95(10)c,e 0.51 0.95 0.96 0.54 0.98 0.99
4b 0.79(10)d,f 0.53 0.94 1.03 0.62 0.83 0.99
5a 4D-QSAR-Jqb 0.65(6)c,g 0.50 1.00 0.88 0.56 1.00 0.94
6b 0.51(6)d,g 0.53 1.07 0.91 0.56 0.80 0.83
7a SOM-4D-QSARqb 0.81(10)c,h 0.51 1.11 0.82 0.55 0.90 0.81
8b 0.42(8)d,i 0.51 1.06 0.92 0.53 0.91 0.89


4.3 Pseudo-consensus 3D pharmacophore mapping

An extensively large trial pool of highly correlated descriptors for chemical structure representation is generated in multidimensional QSAR studies; however the higher number of variables relative to objects the higher chance of co-linearity or overfitting.50 Consequently, a promising tendency to generate meaningful and predictive models with fewer descriptor terms is being observed to enhance the model interpretability. In order to exclude redundant or uninformative data the subsequent level of variable reduction has been employed as the pre-processing procedure. The automated IVE-PLS method was applied to prune the original set of generated descriptors as a filter to eliminate non-significant variables (probably noise data) and to identify structural descriptors having the highest individual weightings for the biological activity.45 The relative importance or weight of each informative descriptor is determined by the magnitude of the regression coefficient giving a clear 3D picture of the regions that should be modified to modulate the biological activity.23,24

Hence, 1000 randomly chosen 22/11 training/test samplings within the densely populated regions of qcv2 vs. qtest2 performance were specified to examine the behavior of these two statistics during the IVE-PLS variable elimination. The value of qcv2 depends on the number of variables eliminated and the model complexity as well. Regarding the number of objects, the maximum number of PLS components considered for the model generation was truncated to 7. The dependence of qcv2 for the training set vs. qtest2 parameter in the function of original variables eliminated was examined. Basically, the extraction of a column from the data matrix which is assigned with the lowest value of abs(mean(b)/std(b)) slightly improves the qcv2 performance. The backward column elimination is recurrently repeated as long as the optimal number of variables included within the model is achieved – the moment of qcv2 deterioration indicates the number of the relevant columns, i.e., crucial variables to be incorporated into the final PLS model. It was observed that in most cases of training/test samplings the model predictivity monitored by qtest2 remains stable for a considerable range of eliminated variables which corresponds well to Golbraikh and Tropsha report.5 Unlike in the standard procedure that displays such plot for a single training/test subset, an attempt was made to identify a common ensemble of variables which survive backward elimination and contribute importantly to activity simultaneously in all chosen models. The IVE-PLS method was employed to specify columns annotated with the highest stability for each of the randomly chosen models. The cumulative sum of common columns for 1000 models was specified and normalized to the range of [0–1]. Subsequently, the group of columns with the value above the pre-chosen cutoff of 0.5 was selected; however, the spatial pattern demonstrated in Fig. 3a and b and 4a and b was generated by further filtering of 80% of CoMSA and SOM-4D-QSAR descriptors with relatively small statistical significance, respectively. A visual inspection of the key spatial sites that increase/decrease the compound activity can provide direct knowledge about the pharmacophore features and indirect information about the interaction mode. The relative contribution of each variable is weighted by the magnitude and the sign of the corresponding regression coefficient, therefore colours code the sign of the descriptor impact on compound activity. Consequently, a simplified visual inspection of pharmacophore sites gives the clear picture of regions, which might be modified to modulate the compound potency. Colours code the sign of influence as illustrated in Fig. 3a and 4a, respectively. The bright spheres delineate the spatial pattern where atom or substituent is predicted to be positioned in order to increase the compound's activity, while the dark polyhedra denote the areas detrimental for the potency, probably due to steric hindrance or electrostatic factors. In particular, large regions with suggested favored contribution appear in the close proximity to the negatively charged oxygen atoms wrapping around the side chains for CoMSA and SOM-4D-QSAR approaches as well. Moreover, in Fig. 4a and b colours code four possible combinations of the sign of partial atom charges and the corresponding PLS weight terms. The observed arrangement of the spatial maps shown in Fig. 4a and b indicates that displayed regions relatively correspond to each other; however the application of the SOM-4D-QSAR paradigm revealed more abundant pattern. These findings demonstrate the significance of the area occupied by the negatively charged atoms of the side chains bonded to ring B and C which have negative regression coefficients, as marked by the dark blue spheres in Fig. 4a and b. The visual inspection of the pharmacophore site indicated by the IVE-PLS models disclosed the importance of the carboxylic groups where the partial negative charge is displayed on the oxygen atoms. The negative regression coefficients in these particular regions probably mean that some polar (electronegative) or hydrogen-bonding acceptor substituent might be effective in enhancing the compounds PAMPA potency. Correspondingly, the positive coefficients of CoMSA and SOM-4D-QSAR models denoted by red and light blue colours in Fig. 4a and b suggest the regions surrounding the methylene groups of side chains linked to ring B and C, which can be occupied by any type of atoms without a noticeable potency loss. The stochastic SMV protocol for the pharmacophore visualization based on the consensus 3D and 4D-QSAR modeling with satisfactory statistical characteristics provides the spatial map of chemical groups/atoms potentially relevant for increasing/decreasing the intestinal absorption (PAMPA) potency of the cholic acid analogues.


image file: c6ra15820j-f3.tif
Fig. 3 SOM-4D-QSAR IVE-PLS monitored for 1000 randomly chosen 22/11 training/test set samplings for charge descriptors and PAMPA activity models. Molecular plots show the spatial sectors of the greatest contribution to the absorption potency retrieved from 1000 random models. Colours code the sign of influence (a). Four possible combinations of mean charge and correlation coefficient are colour-coded for charge descriptors (b). Compound 19 was plotted as a reference molecule. 1 and 2 illustrate the direction of molecule rotation.

image file: c6ra15820j-f4.tif
Fig. 4 CoMSA IVE-PLS monitored for 1000 randomly chosen 22/11 training/test set samplings for charge descriptors and PAMPA activity models. Molecular plots show the spatial sectors of the greatest contribution into the absorption potency retrieved from 1000 random models. Colours code the sign of influence (a). Four possible combinations of mean charge and correlation coefficient are colour-coded for charge descriptors (b). Compound 19 was plotted as a reference molecule. 1 and 2 illustrate the direction of molecule rotation.

5. Conclusions

Basically, PAMPA models are superior to SKIN ones. Moreover, the quality of obtained models in term of qcv2 values is better for charge descriptors (q) compared to occupancy ones (o) suggesting the importance of the (co)factor specifying the electrostatic field distribution in the shape description. The comparison of the corresponding statistical characteristics generally confirms the previously observed trends in pairs of the qcv2/qtest2 values where 3D/4D neural methodology with a fuzzy molecular representation for various training/test subset distribution outperforms the standard cubic 3D/4D procedures e.g., 0.79/0.66 CoMSA vs. 0.75/0.64 CoMFA and 0.66/0.60 SOM-4D-QSARq vs. 0.49/0.61 4D-QSAR-Jq.

The predictive abilities of the grid and neural 3D/4D approaches have been examined for a large populations of models generated using the stochastic SMV procedure. A systematic model space inspection with splitting data collection into training/test subsets to monitor statistical performance in the effort for mapping the probabilistic pharmacophore geometry was conducted. The iterative variable elimination procedure IVE-PLS represents a filter for specifying descriptors having potentially the highest individual weighting for the observed potency of cholic acid analogues as drug absorption promoters. In the majority of training/test samplings the model predictivity is fairly stable in a considerable range of variable eliminated by the IVE-PLS methods. A simplified visual inspection of pharmacophore sites gives the clear picture of regions that might be modified to modulate the compound potency. In particular, large regions with suggested favored contribution appear in the close proximity to the negatively charged oxygen atoms wrapping around the side chains bonded to ring B and C for CoMSA and SOM-4D-QSAR approaches as well.

A pseudo-consensus 3D/4D-QSAR methodology was used to extract an average 3D pharmacophore hypothesis by exploration of the most densely populated training/test subpopulations to indicate the relevant factors contributing the drug absorption potency of cholic acid derivatives. The complementary 3D/4D-QSAR SOM-based paradigm coupled with IVE-PLS procedure can extend our understanding of the spectrum of membrane interactions and potency profiles of drug absorption promoters providing valuable clues for property-oriented synthesis as well.

Acknowledgements

The authors thank Professor Johann Gasteiger for facilitating access to the SONNIA programs. We would like to acknowledge the OpenEye and OpenBabel Scientific Software for the free academic license. Dr Andrzej Bak thanks Foundation for Polish Science for his individual grant. This study was also partially supported by the Slovak Research and Development Agency (Grant No. APVV-0516-12).

Notes and references

  1. J. Devillers, Methods Mol. Biol., 2013, 930, 3 CAS.
  2. H. C. Kolb, M. G. Finn and K. B. Sharpless, Angew. Chem., Int. Ed., 2001, 40, 2004 CrossRef CAS.
  3. A. Bak, M. Wyszomirski, T. Magdziarz, A. Smoliński and J. Polanski, Comb. Chem. High Throughput Screening, 2014, 17, 485 CrossRef CAS PubMed.
  4. J. Polanski, Curr. Med. Chem., 2009, 16, 3243 CrossRef CAS PubMed.
  5. A. Golbraikh and A. Tropsha, J. Mol. Graphics Modell., 2002, 20, 269 CrossRef CAS PubMed.
  6. A. Tropsha, Mol. Inf., 2010, 29, 476 CrossRef CAS PubMed.
  7. O. A. Santos-Filho, A. J. Hopfinger, A. Cherkasov and R. B. de Alencastro, Med. Chem., 2009, 5, 359 CrossRef CAS.
  8. O. A. Santos-Filho and A. J. Hopfinger, J. Chem. Inf. Model., 2006, 46, 345 CrossRef CAS PubMed.
  9. C. L. Senese and A. J. Hopfinger, J. Chem. Inf. Comput. Sci., 2003, 43, 1297 CrossRef CAS PubMed.
  10. E. Yanmaz, E. Saripinar, K. Sahin, N. Gecen and F. Copur, Bioorg. Med. Chem., 2011, 19, 2199 CrossRef CAS PubMed.
  11. J. Polanski, R. Gieleciak, T. Magdziarz and A. Bak, J. Chem. Inf. Comput. Sci., 2004, 44, 1423 CrossRef CAS PubMed.
  12. E. F. F. de Cunha, W. Sippl, T. de Castro Ramalho, O. A. Ceva Antunes, R. B. de Alencastro and M. G. Albuquerque, Eur. J. Med. Chem., 2009, 44, 4344 CrossRef PubMed.
  13. J. Polanski, R. Gieleciak and M. Wyszomirski, J. Chem. Inf. Comput. Sci., 2003, 43, 1754 CrossRef CAS PubMed.
  14. J. Polanski, A. Bak, R. Gieleciak and T. Magdziarz, J. Chem. Inf. Model., 2006, 46, 2310 CrossRef CAS PubMed.
  15. J. Polanski, R. Gieleciak and M. Wyszomirski, Dyes Pigm., 2004, 62, 61 CrossRef CAS.
  16. C. H. Andrade, K. F. M. Pasqualoto, E. I. Ferreira and A. J. Hopfinger, Molecules, 2010, 15, 3281 CrossRef CAS PubMed.
  17. P. Thipnate, J. Liu, S. Hannongbua and A. J. Hopfinger, J. Chem. Inf. Model., 2009, 49, 2312 CrossRef CAS PubMed.
  18. J. Polanski and A. Bak, J. Chem. Inf. Comput. Sci., 2003, 43, 2081 CrossRef CAS PubMed.
  19. C. H. Andrade, K. F. M. Pasqualoto, E. I. Ferreira and A. J. Hopfinger, J. Chem. Inf. Model., 2009, 49, 1070 CrossRef CAS PubMed.
  20. N. C. Romeiro, M. G. Albuquerque, R. B. de Alencastro, M. Ravi and A. J. Hopfinger, J. Comput.-Aided Mol. Des., 2005, 19, 385 CrossRef CAS PubMed.
  21. A. J. Hopfinger, S. Wang, J. S. Tokarski, B. Jin, M. Albuquerque, P. Madhav and C. Duraiswami, J. Am. Chem. Soc., 1997, 119, 10509 CrossRef CAS.
  22. O. A. Santos-Filho and A. J. Hopfinger, J. Comput.-Aided Mol. Des., 2001, 15, 1 CrossRef CAS PubMed.
  23. A. Bak and J. Polanski, J. Chem. Inf. Model., 2007, 47, 1469 CrossRef CAS PubMed.
  24. J. Polanski, A. Bak, R. Gieleciak and T. Magdziarz, Molecules, 2009, 9, 1148 CrossRef.
  25. E. F. F. da Cunha, M. G. Albuquerque, O. A. C. Antunes and R. B. de Alencastro, QSAR Comb. Sci., 2005, 24, 240 Search PubMed.
  26. R. Gieleciak and J. Polanski, J. Chem. Inf. Model., 2007, 47, 547 CrossRef CAS PubMed.
  27. M. Milewski and A. L. Stinchcomb, Mol. Pharm., 2012, 9, 2111 CrossRef CAS PubMed.
  28. T. Gonec, J. Kos, I. Zadrazilova, M. Pesko, S. Keltosova, J. Tengler, P. Bobal, P. Kollar, A. Cizek, K. Kralova and J. Jampilek, Bioorg. Med. Chem., 2013, 21, 6531 CrossRef CAS PubMed.
  29. T. Gonec, J. Kos, E. Nevin, R. Govender, M. Pesko, J. Tengler, I. Kushkevych, V. Stastna, M. Oravec, P. Kollar, J. O'Mahony, K. Kralova, A. Coffey and J. Jampilek, Molecules, 2014, 19, 10386 CrossRef PubMed.
  30. P. Sharma, H. P. S. Chawla and R. Panchagnula, Drugs Future, 1999, 24, 1221 CrossRef CAS.
  31. J. Jampilek and K. Brychtova, Med. Res. Rev., 2012, 32, 907 CrossRef CAS PubMed.
  32. L. Mrozek, L. Dvorakova, Z. Mandelova, L. Rarova, A. Rezacova, L. Placek, R. Opatrilová, J. Dohnal, O. Paleta, V. Kral, P. Drasar and J. Jampilek, Steroids, 2011, 76, 1082 CrossRef CAS PubMed.
  33. L. Mrozek, L. Coufalova, L. Rarova, L. Placek, R. Opatrilova, J. Dohnal, K. Kralova, O. Paleta, V. Kral, P. Drasar and J. Jampilek, Steroids, 2013, 78, 832 CrossRef CAS PubMed.
  34. L. Coufalova, L. Mrozek, L. Rarova, L. Placek, R. Opatrilova, J. Dohnal, K. Kralova, O. Paleta, V. Kral, P. Drasar and J. Jampilek, Steroids, 2013, 78, 435 CrossRef CAS PubMed.
  35. L. Peng, F. Mo and Q. Zhang, J. Org. Chem., 2015, 80, 1221 CrossRef CAS PubMed.
  36. J. Polanski, R. Gieleciak and A. Bak, Comb. Chem. High Throughput Screening, 2004, 7, 793 CrossRef CAS PubMed.
  37. S. Funar-Timofei and G. Schuurmann, J. Chem. Inf. Comput. Sci., 2002, 42, 788 CrossRef CAS PubMed.
  38. R. D. Cramer III, D. E. Patterson and J. D. Bunce, J. Am. Chem. Soc., 1988, 110, 5959 CrossRef PubMed.
  39. J. Polanski, Adv. Drug Delivery Rev., 2003, 55, 1149 CrossRef CAS PubMed.
  40. J. Polanski and B. Walczak, Comput. Chem., 2000, 24, 615 CrossRef CAS PubMed.
  41. T. Magdziarz, B. Lozowicka, R. Gieleciak, A. Bak, J. Polanski and Z. Chilmonczyk, Bioorg. Med. Chem., 2006, 14, 1630 CrossRef CAS PubMed.
  42. S. S. da Rocha Pita, M. G. Albuquerque, C. R. Rodrigues, H. C. Castro and A. J. Hopfinger, Chem. Biol. Drug Des., 2012, 79, 740 Search PubMed.
  43. J. P. Martins, E. G. Barbosa, K. F. Pasqualoto and M. M. Ferreira, J. Chem. Inf. Model., 2009, 49, 1428 CrossRef CAS PubMed.
  44. D. T. Stanton, Curr. Comput.-Aided Drug Des., 2012, 8, 107 CrossRef CAS PubMed.
  45. V. Centner, D. L. Massart, O. E. de Noord, S. de Jong, B. M. V. Vandeginste and C. Sterna, Anal. Chem., 1996, 68, 3851 CrossRef CAS PubMed.
  46. S. Michael, M. Thole, R. Dillmann, A. Fahr, J. Drewe and G. Fricker, Eur. J. Pharm. Sci., 2000, 10, 133 CrossRef CAS PubMed.
  47. M. Mikov, J. P. Fawcett, K. Kuhajda and S. Kevresan, Eur. J. Drug Metab. Pharmacokinet., 2006, 31, 237 CrossRef CAS PubMed.
  48. V. Carelli, G. Di Colo, E. Nannipieri and M. F. Serafini, Int. J. Pharm., 1993, 89, 81 CrossRef CAS.
  49. R. W. Kennard and L. A. Stone, Technometrics, 1969, 11, 137 CrossRef.
  50. H. P. Xie, J. H. Jiang, H. Cui, G. L. Shen and R. Q. Yu, Comput. Chem., 2002, 26, 591 CrossRef CAS PubMed.

Footnote

Electronic supplementary information (ESI) available. See DOI: 10.1039/c6ra15820j

This journal is © The Royal Society of Chemistry 2016
Click here to see how this site uses Cookies. View our privacy policy here.