A. Bak*ᵃ, V. Kozikᵇ, A. Smolinskiᶜ and J. Jampilekᵈ
ᵃ Department of Organic Chemistry, Institute of Chemistry, University of Silesia, Katowice, Poland. E-mail: Andrzej.Bak@us.edu.pl
ᵇ Department of Synthesis Chemistry, Institute of Chemistry, University of Silesia, Katowice, Poland
ᶜ Department of Energy Saving and Air Protection, Central Mining Institute, Katowice, Poland
ᵈ Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Comenius University, Bratislava, Slovakia
First published on 5th August 2016
In the current study, a hybrid approach combining 3D- and 4D-QSAR methods based on grid and neural (SOM) paradigms with the automated variable elimination procedure IVE-PLS was examined to identify the pharmacophore pattern of cholic acid derivatives as potential drug absorption promoters. In particular, the outcome of multidimensional structure–activity modelling of the transdermal penetration effect (SKIN) and intestinal absorption enhancement (PAMPA) using the classical CoMFA and Hopfinger's cube formalisms was compared with that of the neural CoMSA and SOM-4D-QSAR methodology for a set of cholic acid derivatives. The comparison of the corresponding statistical characteristics generally confirms the previously observed trends in pairs of qcv2/qtest2 values, where 3D/4D SOM-based protocols with a fuzzy molecular representation outperform the standard cubic 3D/4D procedures for various training/test subset distributions. A systematic inspection of the model space, in which the data collection was split into training/test subsets to monitor statistical performance while mapping the probabilistic pharmacophore geometry, was conducted using the stochastic model validation (SMV) procedure. The iterative variable elimination procedure (IVE-PLS) acts as a filter that identifies the descriptors with potentially the highest individual weightings for the observed potency of cholic acid analogues as drug absorption promoters. A simplified visual inspection of pharmacophore sites gives a clear picture of the regions that might be modified to modulate the compound potency. A pseudo-consensus 3D/4D-QSAR methodology was used to extract an average 3D pharmacophore hypothesis by exploring the most densely populated training/test subpopulations, indicating the relevant factors that contribute to the drug absorption potency of cholic acid derivatives.
A number of 3D-QSAR procedures have been implemented; however, comparative molecular field analysis (CoMFA) diffused quickly into medicinal and computational chemistry, becoming a cornerstone of computer-aided molecular design. It is a long-established in silico technique employed mainly as a tool for optimizing steric and/or electrostatic ligand patterns.11 The CoMFA approach characterizes the molecular features of superimposed molecules by the spatial distribution of non-covalent interaction fields evaluated over lattice points, which are correlated with variations of the biological response using the partial least squares (PLS) method.12 The devised functions translate shape information into spatially uniform maps that enable the prediction of biological potency. Masking explicit shape information by the regularity of the cubic grid lattice and/or by molecular surfaces, as a conventional imitation of the molecular boundaries, can be helpful in explaining pharmacological effects. A variety of methods evolved from this concept and a number of alternative CoMFA-like protocols have emerged, e.g., comparative molecular surface analysis (CoMSA), which implemented improvements in the molecular description, the alignment rules and, subsequently, the modelling performance (predictive quality).13 The fuzzification of the molecular representation, which improves the compound superimposition, seems to be of special interest owing to a prospectively better description of the binding affinity to the putative receptor structure. Hence, CoMSA substitutes potential values specified at single points with mean potential values calculated for surface sectors.14 The applied competitive neural network (SOM) estimates the (dis)similarity of distinct molecules (template and counter-template) by comparing (not necessarily identical) slices of molecular surfaces, using the memorizing ability of neurons to assemble the patterns (vectors) located near the template attractor. The efficiency of comparative mapping has been proven in pharmacophore modelling and in exploring molecular diversity, because the robust neuron architecture diminishes the impact of the molecular superimposition mode; however, the nondeterministic nature of the neural network imposes some restraints.15
The nature of the guest functional groups and the corresponding spatial distribution of properties in the binding site can be partially elucidated by a single 3D entity adopted by a ligand namely ‘bioactive’ conformation.16 Consequently, an obvious question arises about the preferable ligand geometry with the greatest complementarity/affinity as a ‘negative’ of an active target site when the empirical data are not available. Typically, the postulated initial ligand geometry is constructed based on the ‘sophisticated guess’ (not necessarily energy-minimized structure) mirroring the pharmacophore hypothesis which can be an erroneous prerequisite. In consequence, the higher level of model abstraction allowing for the examination of the multiple molecular conformation, orientation and protonation state representation has been proposed in the domain of rational drug design in the form of an additional dimension – the fourth one.17 In the broadest sense, 4D-QSAR methodology has evolved from the molecular shape analysis (MSA) by explicit enhancement of 3D-QSAR approaches, where the substitution of the single-conformer concept with cube-like structures of different resolution produces a fuzzy pattern of molecular objects.18 4D-QSAR analysis has proven to be reliable in the construction of quantitative 3D pharmacophore models in a function of molecular conformation, alignment and fragmentation (atom type).19 The resultant pharmacophore site is generated by extensive exploration of both conformational and alignment degrees of freedom to reduce the bias by selecting a (bio)active ligand conformer and binding mode using the conformational-search protocols e.g. molecular dynamic simulations (MDs).20 The population of conformational states for each analogue is sampled to generate molecular trajectory profiles. 
The 4D-QSAR formalism successfully incorporates some CoMFA characteristics by employing a spatial grid, where the grid cell size is viewed as a ‘methodology parameter’, to produce a molecular shape spectrum (MSS) according to the trial alignment rule.21 The occupancy frequency of the unit cubes by individual atoms, or even atom groups, composing each molecule can optionally be augmented by the direct inclusion of the target data.22 Moreover, the neural version of Hopfinger's cube formalism, namely SOM-4D-QSAR, which employs Kohonen self-organizing maps, has been applied to produce a fuzzy 4D-QSAR-like representation of the conformational space as an appealing alternative that performs comparably to its grid counterpart.23 In the proposed methodology, the unit cube used in the classical approach is replaced by a sphere specified in space by a single SOM neuron.24
An enormous number of trial conformers is generated during the MDs simulations, whereas generally only a few spatial moieties contribute to the target response. Furthermore, the composite population of input descriptors, referred to as the ‘original databank’, raises dimensionality issues and diminishes the performance of QSAR modelling; therefore, a hybrid approach that combines automated variable evaluation and reduction has been introduced to select the important variables with the highest individual weightings for the observed bioactivity.25 To prune data noise, the uninformative variables that do not contribute relevantly to the model should be excluded in advance; hence, the iterative variable elimination procedure based on partial least squares (IVE-PLS) acts as a filter that recognizes non-significant cell occupancies.26
A number of modern drugs are not available to patients due to their poor aqueous solubility and permeability.27 Generally, modification/optimization of poor permeability through membranes can be achieved by selection of appropriate excipients to function as transporters (surfactants or pharmaceutical complexing agents, permeability enhancers) being components of a dosage form. These excipients that increase absorption of drugs to blood circulation are known as intestinal absorption promoters in oral drug formulations and transdermal penetration enhancers in transdermal therapeutic systems.28 Numerous compounds of different chemical structures were evaluated/applied as absorption promoters.29–31
Cholic acid is one of the most important human bile acids. Bile acid derivatives/analogues are an important class of compounds with a range of pharmacological activities. Bile acids could be easily modified by derivatisation of the functional groups on the steroid nucleus.32 Nontoxic bile acid/salt derivatives (as amphiphilic compounds) are widely used in drug formulations as excipients – as a type of absorption promoters and thus they can influence gastrointestinal solubility, absorption and chemical/enzymatic stability of drugs.33 Cholic acid derivatives were studied also as transdermal penetration enhancers.34 The reason for their activity could be their specific solvation and self-assembly features.35
The principal objective of the current investigation was two-fold. First, we compared the impact of the molecular coding system on the efficiency of structure–activity modelling using 3D (CoMFA and CoMSA) and 4D (standard and neural formalism) methods on the ensemble of drug absorption promoters. Irrespective of the variations in the manner in which the cholic acid derivatives were encoded, both approaches performed comparably; however, SOM-4D-QSAR yielded better results for the corresponding training/test sets. It seems that the robust neuron architecture decreases the influence of the compound superimposition mode by providing a fuzzy picture of molecular objects. The relationship between the chemical structure of cholic acid derivatives as potential drug absorption modifiers and their enhancement effects is discussed. Second, we concentrated on a systematic inspection of the model space, splitting the data collection into training/test subsets to monitor the performance of the statistical estimators while mapping the probabilistic pharmacophore geometry using the stochastic model validation (SMV) approach.36 The automated variable reduction with the IVE-PLS procedure acts as a sieve that retains only those descriptors with the greatest individual weighting for the observed activity of the cholic acid analogues. The ‘pseudo-consensus’ 4D-QSAR methodology was used to extract an ‘average’ 3D pharmacophore by exploring various data subpopulations; it embodies the quantity-for-quality argument to indicate the relevant factors contributing to the absorption activity of the cholic acid derivatives.
![]() | (1) |
The weights of the winning neuron and the neighboring neurons are then modified to resemble and subsequently attract similar input vectors. The whole procedure is repeated while the next input vector is being presented to the network.39 In fact, self-organizing neural mapping is regarded as a nonlinear projection tool, which reduces the dimensionality of the input object, e.g. converts 3D objects to 2D, while preserving the topological relationships between the input and output data. Hence, the trained network can be applied for the projections of the specified molecular property prescribed to the input vector with the generation of the planar colour-coded clustering pattern, namely a feature map. Consequently, the SOM algorithm was used to construct a two-dimensional topographic map obtaining the signals from the points sampled randomly at the molecular surface such as an electrostatic potential map.40 In such application each 3-dimensional input vector consisting of x, y, and z coordinates is compared to a 3-element weight vector describing each neuron to find the closest neighbor and then project a signal into this particular neuron. The knowledge concerning the shape of the certain molecular surface (template) encoded in the weights of the trained Kohonen network can be used for processing the signals coming from the surface of other molecule(s) (counter-template) providing a series of comparative SOM maps to compare/contrast the superimposed molecular geometry. A variety of SOM applications for the classification, visualization and compression of the structural data has been described in chemistry, in particular for two-dimensional mapping of the electrostatic potential on the three-dimensional molecular surfaces or partial atomic charges for the atomic molecular representation. Moreover, we implemented a series of QSAR methods for pharmacophore mapping based on such comparative Kohonen strategy including CoMSA and SOM-4D-QSAR.41
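The competitive learning and projection steps described above can be sketched as follows. This is a minimal illustration, not the SONNIA implementation: the function names `train_som` and `project`, the Gaussian neighbourhood, the linearly decaying learning rate and radius, the 5 × 5 map and the toy spherical "surface" are all assumptions made for the example.

```python
import numpy as np

def train_som(points, map_size=5, epochs=20, lr0=0.5, seed=0):
    """Train a map_size x map_size Kohonen map on 3D points (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # One 3-element weight vector per neuron, initialised from random input points.
    weights = points[rng.integers(0, len(points), map_size * map_size)].astype(float)
    # 2D grid position of every neuron, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(map_size) for c in range(map_size)], dtype=float)
    sigma0 = map_size / 2.0
    n_steps = epochs * len(points)
    step = 0
    for _ in range(epochs):
        for x in points[rng.permutation(len(points))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # assumed linear decay
            sigma = sigma0 * (1.0 - frac) + 1e-3     # assumed shrinking radius
            # Winning neuron: smallest squared distance between input and weights.
            winner = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Pull the winner and its map neighbours towards the input vector.
            d2 = ((grid - grid[winner]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def project(points, values, weights, map_size):
    """Average a surface property over the points won by each neuron."""
    total = np.zeros(map_size * map_size)
    count = np.zeros(map_size * map_size)
    for x, v in zip(points, values):
        winner = np.argmin(((weights - x) ** 2).sum(axis=1))
        total[winner] += v
        count[winner] += 1
    fmap = np.divide(total, count, out=np.zeros_like(total), where=count > 0)
    return fmap.reshape(map_size, map_size)

# Toy template "surface": random points on a unit sphere, property = z coordinate.
rng = np.random.default_rng(1)
template = rng.normal(size=(200, 3))
template /= np.linalg.norm(template, axis=1, keepdims=True)
weights = train_som(template, map_size=5)
feature_map = project(template, template[:, 2], weights, 5)
```

Calling `project` with the points and property values of a second molecule, while reusing the `weights` trained on the template, gives the comparative (counter-template) map.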
![]() | (2) |
![]() | (3) |
![]() | (4) |
Each alignment produces a unique grid cell occupancy/charge distribution for a given molecular trajectory. The grid cells are unfolded into vectors and subsequently the array composed of vectors describing all molecules is generated and used to estimate the structure–activity relationship with PLS method coupled with variable selection/elimination procedures.
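As an illustration of how a trajectory becomes a descriptor vector, the sketch below counts how often each grid cell is occupied across conformers and averages the partial charge found in each cell. The exact descriptor definitions are not reproduced in this text, so the normalisation by the number of conformers, the function name `grid_descriptors` and the toy coordinates are assumptions.

```python
import numpy as np

def grid_descriptors(conformers, charges, origin, cell, shape):
    """Unfold per-cell occupancy and charge averages into 1D vectors (illustrative)."""
    occ = np.zeros(shape)
    chg = np.zeros(shape)
    n_conf = len(conformers)
    for coords in conformers:                       # one MD snapshot at a time
        idx = np.floor((coords - origin) / cell).astype(int)
        inside = np.all((idx >= 0) & (idx < np.array(shape)), axis=1)
        for (i, j, k), q in zip(idx[inside], charges[inside]):
            occ[i, j, k] += 1.0                     # occupancy count
            chg[i, j, k] += q                       # accumulated partial charge
    # Normalise by the number of conformers and unfold the 3D grid into vectors.
    return (occ / n_conf).ravel(), (chg / n_conf).ravel()

# Toy trajectory: 2 conformers of a 2-atom molecule in a 3 x 1 x 1 grid of 1 A cells.
conformers = np.array([
    [[0.2, 0.2, 0.2], [1.5, 0.2, 0.2]],
    [[0.3, 0.1, 0.4], [2.5, 0.2, 0.2]],
])
charges = np.array([-0.5, 0.5])
occ_vec, chg_vec = grid_descriptors(conformers, charges, np.zeros(3), 1.0, (3, 1, 1))
```

Stacking one such vector per molecule row by row gives the data matrix that is passed to PLS.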
Y = Xb + e | (5) |
![]() | (6) |
A cross-validated leave-one-out qcv2 value for the estimation of the model performance is computed using the following formula:
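$$q_{cv}^{2} = 1 - \frac{\sum_{i=1}^{m} (\mathrm{obs}_i - \mathrm{pred}_i)^{2}}{\sum_{i=1}^{m} (\mathrm{obs}_i - \overline{\mathrm{obs}})^{2}}$$ | (7) |
where obs_i is the assayed activity of compound i, pred_i is its activity predicted by the model built without compound i, and the mean is taken over the m training compounds.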
![]() | (8) |
The quality of external predictions was measured by the standard deviation of error of prediction (SDEP) and qtest2 defined as:
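$$\mathrm{SDEP} = \sqrt{\frac{\sum_{i=1}^{n} (\mathrm{pred}_i - \mathrm{obs}_i)^{2}}{n}}$$ | (9) |
$$q_{test}^{2} = 1 - \frac{\sum_{i=1}^{n} (\mathrm{obs}_i - \mathrm{pred}_i)^{2}}{\sum_{i=1}^{n} (\mathrm{obs}_i - \overline{\mathrm{obs}})^{2}}$$ | (10) |
where the sums run over the n test-set compounds, obs_i is the assayed activity and pred_i is the activity predicted by the model trained on the training set.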
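The validation statistics can be computed directly from observed and predicted activities. The sketch below assumes the conventional definitions (one minus the ratio of the squared prediction errors to the squared deviations from the mean, and the root mean squared prediction error); the function names and the toy numbers are illustrative only.

```python
import math

def q2(obs, pred):
    """1 - PRESS / SS; with leave-one-out predictions this is qcv2."""
    mean_obs = sum(obs) / len(obs)
    press = sum((o - p) ** 2 for o, p in zip(obs, pred))
    total = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - press / total

def sdep(obs, pred):
    """Root mean squared error of the external predictions."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

# Hypothetical test-set activities and predictions.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
q2_value = q2(obs, pred)
sdep_value = sdep(obs, pred)
```

The same `q2` function serves for the internal and the external statistic; only the origin of `pred` differs (leave-one-out refits versus a single model applied to the test set).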
The steric and electrostatic field energies were calculated using sp3 hybridized carbon probe atom with a charge of +1 and 0 and hydrogen as a probe atom with a charge of +1, respectively. The CoMFA grid spacing was 2.0 Å for all the Cartesian dimensions within the defined region of 3D lattice, which extended beyond the van der Waals envelopes of all molecules by at least 4.0 Å. The non-covalent interaction fields were determined at each intersection on regularly spaced grid. For each molecule the energies with a total of 10560 grid points were calculated with 2 Å spacing in a 20 × 22 × 24 lattice. To reduce data noise, all columns with energy variance less than 2.0 kcal mol−1 were discarded by setting the sigma parameter to 2.0 kcal mol−1. Both steric/electrostatic energies with a value greater than 30.0 kcal mol−1 were truncated to a tentative value of 30.0 (default cutoff). The variations in CoMFA interaction fields (independent variables) were correlated with the changes in activity (dependent variable) with standard internal and external validation techniques to specify the statistical index of the model predictive power using the SAMPLS method.
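The lattice size and the two preprocessing steps quoted above can be checked with a few lines. This is a sketch under stated assumptions: the sigma threshold is interpreted as a per-column standard deviation, only the upper 30.0 cutoff mentioned in the text is applied, and `preprocess_fields` and the toy energy matrix are hypothetical.

```python
import numpy as np

# Lattice dimensions quoted in the text.
nx, ny, nz = 20, 22, 24
n_points = nx * ny * nz

def preprocess_fields(fields, cutoff=30.0, sigma=2.0):
    """Truncate energies above the cutoff, then drop low-spread columns."""
    clipped = np.minimum(fields, cutoff)
    # Assumption: sigma is compared with the column standard deviation.
    keep = clipped.std(axis=0) >= sigma
    return clipped[:, keep], keep

# Toy matrix: 3 molecules x 3 grid points (kcal/mol).
fields = np.array([
    [0.0, 50.0, 1.0],
    [10.0, 5.0, 1.5],
    [20.0, 30.0, 1.2],
])
filtered, keep = preprocess_fields(fields)
```

In the toy matrix the third column varies by less than the threshold and is dropped, and the 50.0 entry is truncated to 30.0.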
The SONNIA software was applied in the CoMSA analysis to simulate 20 × 20 or 30 × 30 SOMs with the winning distance (md) varied in the range of 0.2–2.0. The SOM network was fed with the Cartesian coordinates of the molecular surfaces of the superimposed molecules to form a two-dimensional map of electrostatic potential; the most active analogue in each series of compounds (19) was used as the template molecule. The output maps were subsequently transformed into 400- or 900-element vectors, which were processed by the PLS method implemented in the MATLAB programming environment, analogously to the CoMFA analysis.
Energy-minimized molecules (MAXMIN2 module implemented in Sybyl) were used as the initial structures in the molecular dynamics simulations (MDs) with the standard Tripos force field. Each 3D structure was the starting point for generating the conformational ensemble profile (CEP). The CEP was created from an MDs run of 100 ps with a time step of 0.001 ps. The temperature for the MDs was set to 300 K. The atomic coordinates of each conformation and its total energy were recorded every 0.1 ps. One thousand conformations were sampled systematically (one out of every hundred) for each analogue in the CEP from the 100000 generated trajectory states. Partial atomic charges were estimated with the semiempirical AM1 Hamiltonian implemented in the HyperChem 6.03 software. Both frequency- and charge-related GCOD descriptors were determined for grid cell resolutions of 0.5, 1.0 and 2.0 Å with the most active compound 19 as the reference. The best QSAR models were selected on the basis of the statistically relevant parameters obtained by means of the PLS procedure. Moreover, the resultant MDs trajectories were processed by the neural network with the arbitrarily chosen molecule 19 as the template to train a set of SOM maps. The SONNIA package was employed in the SOM-4D-QSAR simulations to produce 20 × 20 or 30 × 30 SOMs with the winning distance (md) varying in the range of 0.2–2.0. As in CoMSA, the output arrays were reshaped into 400- or 900-element vectors processed by the PLS procedure.
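The sampling arithmetic and the unfolding of SOM maps into descriptor vectors can be verified with a short sketch. The variable names and the helper `unfold` are hypothetical; only the run length, time step, recording interval and map sizes come from the text.

```python
# Values quoted in the text above.
run_length_ps = 100.0
time_step_ps = 0.001
record_every_ps = 0.1

n_states = round(run_length_ps / time_step_ps)        # trajectory states generated
stride = round(record_every_ps / time_step_ps)        # keep one state out of this many
sampled = list(range(stride - 1, n_states, stride))   # indices of retained conformers

def unfold(som_map):
    """Unfold a square SOM output map row by row into one descriptor vector."""
    return [value for row in som_map for value in row]

vector_20 = unfold([[0.0] * 20 for _ in range(20)])
vector_30 = unfold([[0.0] * 30 for _ in range(30)])
```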
One objective of this study was to systematically investigate the 3D- and 4D-QSAR performance in modelling the transdermal penetration effect and the intestinal absorption enhancement observed for the set of cholic acid derivatives. It should be emphasized that we did not concentrate on the details of each modelling method, but rather on the philosophy of molecular object description; therefore, the impact of the coding system on the efficiency of 3D/4D-QSAR modelling was investigated. Furthermore, we examined the pharmacophore properties of the target series using a neural network coupled with the PLS method and the variable elimination IVE procedure. We compared the results of modelling using the standard 3D/4D-QSAR methodologies and their neural counterparts. Table 2 shows that, with the entire cholic acid dataset used as the training set, qcv2 ranges from 0.62 to 0.81 for the PAMPA models vs. 0.46 to 0.56 for the SKIN models, depending on the type of descriptors used. It is noticeable that the 3D methods (CoMFA/CoMSA) perform comparably (qcv2 > 0.8 for PAMPA potency), whereas the 4D approaches produce slightly inferior results. Moreover, the quality of the obtained models in terms of qcv2 values is better for charge descriptors (q) than for occupancy ones (o), suggesting the importance of the (co)factor specifying the electrostatic field distribution in the shape description. Interestingly, extending the descriptor pools with molecular lipophilicity, evaluated as the calculated logP value, does not have a noticeable impact on the modelling outcomes. Exclusive reliance on the training set is inadvisable when determining the robustness and predictive ability of models; therefore, apart from the internal validation with the cross-validation procedure, external model validation with the molecule collection split into a set of training/test subsets was performed as well.
A low qcv2 value for the training set is a sufficient indicator of the low predictive ability of a model. In contrast, a good model fit alone is a necessary, but not sufficient, indicator of model robustness and reliability. Consequently, some external predictions were evaluated using the SDEP and qtest2 statistics.5 To estimate the model's predictive power, the original dataset was divided arbitrarily into training/test subsets in a 2:1 ratio (22/11), ranked according to the values of intestinal absorption (PAMPA) and skin penetration (SKIN) activity to create representative molecular subsets as follows: (1, 2, 4, 6, 7, 8, 11, 12, 13, 16, 17, 18, 20, 21, 22, 25, 26, 27, 30, 31, 32, 33)/(3, 5, 9, 10, 14, 15, 19, 23, 24, 28, 29) and (1, 2, 4, 5, 6, 7, 8, 9, 10, 13, 14, 16, 18, 20, 21, 23, 24, 28, 30, 31, 32, 33)/(3, 11, 12, 15, 17, 19, 22, 25, 26, 27, 29), respectively. Table 3 compares the 3D- and 4D-QSAR performance in the training and test compound sets. It is worth noting that the parameters of the PAMPA models are generally superior to those of the SKIN models; the same tendency was observed for the entire set of compounds. In all cases the qcv2/qtest2 outcome indicates a comparable efficiency in modelling drug enhancement activity; however, the 3D methods still produce slightly better results (0.73/0.87 for CoMSA and 0.69/0.81 for CoMFA for PAMPA activity). Generally, the predictive ability for the test set (qtest2) is more or less at the same level as the qcv2 value, especially for the PAMPA models. On the other hand, the exclusion of 11 compounds from the training set has a small but noticeable impact on the qcv2 characteristics: for all approaches the decrease amounts to approximately 0.1. Additionally, the Kennard–Stone algorithm was applied to the dependent variables to divide the data collection representatively into training/test subgroups.49 The impact of the training/test set distribution obtained with the Kennard–Stone procedure on the statistical characteristics of the 3D and 4D approaches is provided in Table 4. The comparison of the corresponding models generally confirms the previously observed trends in pairs of qcv2/qtest2 values, where the 3D/4D neural methodology with a fuzzy molecular representation outperforms the standard 3D/4D procedures for various training/test subset distributions, e.g., 0.79/0.66 for CoMSA vs. 0.75/0.64 for CoMFA and 0.66/0.60 for SOM-4D-QSARq vs. 0.49/0.61 for 4D-QSAR-Jq.
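A Kennard–Stone split on a single dependent variable can be sketched as below. The max-min selection rule is the standard form of the algorithm; the function name, the use of the absolute activity difference as the distance, and the six toy activity values are assumptions for illustration (the study itself split 33 compounds into 22/11).

```python
def kennard_stone(values, n_select):
    """Return indices of n_select objects chosen by the Kennard-Stone max-min rule."""
    n = len(values)

    def dist(a, b):
        return abs(values[a] - values[b])

    # Start with the two most distant objects.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(*pair),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select:
        # Add the candidate that is farthest from its nearest selected object.
        best = max(remaining, key=lambda i: min(dist(i, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical activity values for six compounds; four go to the training set.
activities = [0.1, 0.2, 0.9, 1.4, 2.2, 3.0]
train = kennard_stone(activities, 4)
test = [i for i in range(len(activities)) if i not in train]
```

The selected objects span the activity range as evenly as possible, so the remaining objects fall inside the range covered by the training set.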
Interestingly, the charge descriptors are more effective in resolving the cholic acid activity, which suggests that electrostatic effects are as important as shape factors in modelling the activity of drug absorption promoters. These findings point to the electrostatic component as potentially significant in enhancing drug absorption potential; however, the impact of the steric properties might be explained by the fact that the molecular shape is a crucial factor determining the distribution of atomic partial charges.
No. | Model | PAMPAᵇ qcv2(onc) | PAMPAᵇ s | SKINᶜ qcv2(onc) | SKINᶜ s |
---|---|---|---|---|---|
1 | CoMFA | 0.81(6)ᵈ | 0.40 | 0.55(5)ᵉ | 0.34 |
2 | CoMSA | 0.81(7)ᵈ,ᶠ | 0.42 | 0.56(6)ᵉ,ᵍ | 0.33 |
3 | 4D-QSAR-Jq | 0.62(7)ᵈ,ʰ | 0.59 | 0.46(6)ᵉ,ʰ | 0.37 |
4 | SOM-4D-QSARq | 0.71(5)ᵈ,ⁱ | 0.50 | 0.50(10)ᵈ,ʲ | 0.38 |

ᵃ (onc): optimal number of components. ᵇ Model: training set (1–33). ᶜ Model: training set (1–33). ᵈ Compound 19 was used as reference compound R. ᵉ Compound 15 was used as reference compound R. ᶠ Map size 30 × 30, md = 1.2. ᵍ Map size 30 × 30, md = 1.0. ʰ Box 50 Å:50 Å:50 Å, cubic size 1 Å. ⁱ Map size 20 × 20, md = 2.0. ʲ Map size 20 × 20, md = 0.2.
No. | Model | PAMPAᵇ qcv2(onc) | PAMPAᵇ s | PAMPAᵇ SDEP | PAMPAᵇ qtest2 | SKINᶜ qcv2(onc) | SKINᶜ s | SKINᶜ SDEP | SKINᶜ qtest2 |
---|---|---|---|---|---|---|---|---|---|
1 | CoMFA | 0.69(10)ᵈ | 0.67 | 0.37 | 0.81 | 0.50(4)ᵉ | 0.36 | 0.35 | 0.33 |
2 | CoMSA | 0.73(6)ᵈ,ᶠ | 0.53 | 0.30 | 0.87 | 0.53(4)ᵉ,ᵍ | 0.35 | 0.32 | 0.41 |
3 | 4D-QSAR-Jq | 0.53(10)ᵈ,ʰ | 0.82 | 0.64 | 0.53 | 0.31(10)ᵉ,ʰ | 0.52 | 0.32 | 0.50 |
4 | SOM-4D-QSARq | 0.67(5)ᵈ,ⁱ | 0.57 | 0.53 | 0.66 | 0.36(2)ᵈ,ʲ | 0.38 | 0.43 | 0.30 |

ᵃ (onc): optimal number of components. ᵇ Model: training set (1, 2, 4, 6, 7, 8, 11, 12, 13, 16, 17, 18, 20, 21, 22, 25, 26, 27, 30, 31, 32, 33) and test set (3, 5, 9, 10, 14, 15, 19, 23, 24, 28, 29). ᶜ Model: training set (1, 2, 4, 5, 6, 7, 8, 9, 10, 13, 14, 16, 18, 20, 21, 23, 24, 28, 30, 31, 32, 33) and test set (3, 11, 12, 15, 17, 19, 22, 25, 26, 27, 29). ᵈ Compound 19 was used as reference compound R. ᵉ Compound 15 was used as reference compound R. ᶠ Map size 30 × 30, md = 1.2. ᵍ Map size 30 × 30, md = 1.0. ʰ Box 50 Å:50 Å:50 Å, cubic size 1 Å. ⁱ Map size 20 × 20, md = 2.0. ʲ Map size 20 × 20, md = 0.6.
No. | Model | PAMPAᵇ qcv2(onc) | PAMPAᵇ s | PAMPAᵇ SDEP | PAMPAᵇ qtest2 | SKINᶜ qcv2(onc) | SKINᶜ s | SKINᶜ SDEP | SKINᶜ qtest2 |
---|---|---|---|---|---|---|---|---|---|
1 | CoMFA | 0.75(2)ᵈ | 0.47 | 0.48 | 0.64 | 0.47(9)ᵉ | 0.47 | 0.24 | 0.27 |
2 | CoMSA | 0.79(9)ᵈ,ᶠ | 0.53 | 0.45 | 0.66 | 0.50(5)ᵉ,ᵍ | 0.40 | 0.26 | 0.07 |
3 | 4D-QSAR-Jq | 0.49(8)ᵈ,ʰ | 0.80 | 0.53 | 0.61 | 0.32(10)ᵉ,ʰ | 0.56 | 0.30 | 0.15 |
4 | SOM-4D-QSARq | 0.66(10)ᵈ,ⁱ | 0.70 | 0.55 | 0.60 | 0.33(10)ᵈ,ʲ | 0.56 | 0.23 | 0.34 |

ᵃ (onc): optimal number of components. ᵇ Model: training set (1, 2, 4, 5, 6, 7, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 21, 25, 26, 27, 30, 31) and test set (3, 8, 11, 20, 22, 23, 24, 28, 29, 32, 33). ᶜ Model: training set (2, 4, 5, 6, 9, 10, 11, 12, 13, 15, 16, 18, 19, 20, 22, 25, 26, 28, 29, 31, 32, 33) and test set (1, 3, 7, 8, 14, 17, 21, 23, 24, 27, 30). ᵈ Compound 19 was used as reference compound R. ᵉ Compound 15 was used as reference compound R. ᶠ Map size 30 × 30, md = 1.2. ᵍ Map size 30 × 30, md = 1.0. ʰ Box 50 Å:50 Å:50 Å, cubic size 1 Å. ⁱ Map size 20 × 20, md = 2.0. ʲ Map size 20 × 20, md = 0.2.
The predictive power of the best models was validated using the Golbraikh–Tropsha criteria; the statistical measures should fulfil at least the following requirements for the test set: qtest2 > 0.5, R2 > 0.6, [(R2 − RO2)]/R2 < 0.1 and 0.85 ≤ k ≤ 1.15 where RO2 means correlation coefficient for the regression of observed vs. predicted without bias and k is a slope of the regression through the origin.5 The Golbraikh–Tropsha statistics were used for models with the highest possible qcv2 value while preserving the robustness and the predictive power (qtest2 ≥ 0.5), as presented in Table 5.
Entry | Model | qcv2(onc)ᶜ | qtest2 | k | k′ | R2 | RO2 | RO′2 |
---|---|---|---|---|---|---|---|---|
1a | CoMFAᵇ | 0.91(8)ᶜ | 0.54 | 0.92 | 0.99 | 0.66 | 0.87 | 0.99 |
2b | CoMFAᵇ | 0.72(8)ᵈ | 0.52 | 1.10 | 0.88 | 0.59 | 0.82 | 0.80 |
3a | CoMSAᵇ | 0.95(10)ᶜ,ᵉ | 0.51 | 0.95 | 0.96 | 0.54 | 0.98 | 0.99 |
4b | CoMSAᵇ | 0.79(10)ᵈ,ᶠ | 0.53 | 0.94 | 1.03 | 0.62 | 0.83 | 0.99 |
5a | 4D-QSAR-Jqᵇ | 0.65(6)ᶜ,ᵍ | 0.50 | 1.00 | 0.88 | 0.56 | 1.00 | 0.94 |
6b | 4D-QSAR-Jqᵇ | 0.51(6)ᵈ,ᵍ | 0.53 | 1.07 | 0.91 | 0.56 | 0.80 | 0.83 |
7a | SOM-4D-QSARqᵇ | 0.81(10)ᶜ,ʰ | 0.51 | 1.11 | 0.82 | 0.55 | 0.90 | 0.81 |
8b | SOM-4D-QSARqᵇ | 0.42(8)ᵈ,ⁱ | 0.51 | 1.06 | 0.92 | 0.53 | 0.91 | 0.89 |

ᵃ (onc): optimal number of components. ᵇ Model: training 22, test set 11 compounds. ᶜ Compound 19 was used as reference compound R. ᵈ Compound 15 was used as reference compound R. ᵉ Map size 30 × 30, md = 1.2. ᶠ Map size 30 × 30, md = 1.0. ᵍ Box 50 Å:50 Å:50 Å, cubic size 1 Å. ʰ Map size 20 × 20, md = 2.0. ⁱ Map size 20 × 20, md = 0.6.
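The four Golbraikh–Tropsha thresholds quoted before the table can be applied mechanically once the statistics are known. The helper below is a hypothetical sketch that takes precomputed values (it does not derive R2, RO2 or k from raw predictions), and the two sets of input numbers are invented for illustration, not taken from the table.

```python
def golbraikh_tropsha(q2_test, r2, r0_2, k):
    """Evaluate the four test-set thresholds quoted in the text."""
    checks = {
        "q2_test > 0.5": q2_test > 0.5,
        "R2 > 0.6": r2 > 0.6,
        "(R2 - R0^2)/R2 < 0.1": (r2 - r0_2) / r2 < 0.1,
        "0.85 <= k <= 1.15": 0.85 <= k <= 1.15,
    }
    return all(checks.values()), checks

# Hypothetical statistics, not taken from the tables.
passed, detail = golbraikh_tropsha(q2_test=0.62, r2=0.70, r0_2=0.68, k=0.97)
failed, why = golbraikh_tropsha(q2_test=0.62, r2=0.55, r0_2=0.54, k=0.97)
```

Returning the per-criterion dictionary alongside the overall verdict shows which threshold a model misses.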
Hence, 1000 randomly chosen 22/11 training/test samplings within the densely populated regions of qcv2 vs. qtest2 performance were specified to examine the behavior of these two statistics during the IVE-PLS variable elimination. The value of qcv2 depends on the number of variables eliminated and on the model complexity as well. Given the number of objects, the maximum number of PLS components considered for the model generation was limited to 7. The dependence of qcv2 for the training set and of the qtest2 parameter on the number of original variables eliminated was examined. Basically, removing from the data matrix the column with the lowest value of abs(mean(b)/std(b)) slightly improves the qcv2 performance. The backward column elimination is repeated until the optimal number of variables included in the model is reached; the moment at which qcv2 deteriorates indicates the number of relevant columns, i.e., the crucial variables to be incorporated into the final PLS model. It was observed that in most training/test samplings the model predictivity monitored by qtest2 remains stable over a considerable range of eliminated variables, which agrees well with the report of Golbraikh and Tropsha.5 Unlike the standard procedure, which displays such a plot for a single training/test subset, an attempt was made to identify a common ensemble of variables that survive backward elimination and contribute importantly to activity simultaneously in all chosen models. The IVE-PLS method was employed to specify the columns annotated with the highest stability for each of the randomly chosen models. The cumulative sum of common columns for the 1000 models was calculated and normalized to the range [0, 1]. Subsequently, the group of columns with a value above the pre-chosen cutoff of 0.5 was selected; however, the spatial pattern shown in Fig. 3a, b and 4a, b was generated by further filtering out 80% of the CoMSA and SOM-4D-QSAR descriptors with relatively small statistical significance, respectively.

A visual inspection of the key spatial sites that increase/decrease the compound activity can provide direct knowledge about the pharmacophore features and indirect information about the interaction mode. The relative contribution of each variable is weighted by the magnitude and the sign of the corresponding regression coefficient; therefore, colours code the sign of the descriptor impact on compound activity, as illustrated in Fig. 3a and 4a. Consequently, a simplified visual inspection of pharmacophore sites gives a clear picture of the regions that might be modified to modulate the compound potency. The bright spheres delineate the spatial pattern where an atom or substituent is predicted to be positioned in order to increase the compound's activity, while the dark polyhedra denote the areas detrimental to the potency, probably due to steric hindrance or electrostatic factors. In particular, large regions with a suggested favourable contribution appear in close proximity to the negatively charged oxygen atoms wrapping around the side chains for both the CoMSA and SOM-4D-QSAR approaches. Moreover, in Fig. 4a and b colours code the four possible combinations of the sign of the partial atomic charges and the corresponding PLS weight terms. The arrangement of the spatial maps shown in Fig. 4a and b indicates that the displayed regions roughly correspond to each other; however, the application of the SOM-4D-QSAR paradigm revealed a richer pattern. These findings demonstrate the significance of the area occupied by the negatively charged atoms of the side chains bonded to rings B and C, which have negative regression coefficients, as marked by the dark blue spheres in Fig. 4a and b.
The visual inspection of the pharmacophore site indicated by the IVE-PLS models disclosed the importance of the carboxylic groups, where the partial negative charge is located on the oxygen atoms. The negative regression coefficients in these particular regions probably mean that some polar (electronegative) or hydrogen-bond acceptor substituent might be effective in enhancing the compounds' PAMPA potency. Correspondingly, the positive coefficients of the CoMSA and SOM-4D-QSAR models, denoted by red and light blue colours in Fig. 4a and b, indicate the regions surrounding the methylene groups of the side chains linked to rings B and C, which can be occupied by any type of atom without a noticeable loss of potency. The stochastic SMV protocol for pharmacophore visualization, based on consensus 3D- and 4D-QSAR modelling with satisfactory statistical characteristics, provides a spatial map of the chemical groups/atoms potentially relevant for increasing/decreasing the intestinal absorption (PAMPA) potency of the cholic acid analogues.
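The backward elimination and consensus steps described earlier in this section can be sketched as follows. This is an illustration under several assumptions: ordinary least squares stands in for PLS, the coefficient spread is taken from leave-one-out refits, the toy data contain two informative columns out of six, and only five random samplings are drawn instead of 1000. The function names are hypothetical.

```python
import numpy as np

def loo_coefficients(X, y):
    """Leave-one-out refits; returns the coefficient matrix B and LOO predictions."""
    n = len(y)
    B = np.empty((n, X.shape[1]))
    pred = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        x_mean, y_mean = X[mask].mean(axis=0), y[mask].mean()
        # Stand-in regression: least squares instead of PLS (assumption).
        B[i] = np.linalg.lstsq(X[mask] - x_mean, y[mask] - y_mean, rcond=None)[0]
        pred[i] = y_mean + (X[i] - x_mean) @ B[i]
    return B, pred

def q2(y, pred):
    return 1.0 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def ive(X, y):
    """Backward elimination driven by the abs(mean(b)/std(b)) criterion."""
    cols = list(range(X.shape[1]))
    B, pred = loo_coefficients(X, y)
    best_q2 = q2(y, pred)
    while len(cols) > 1:
        stability = np.abs(B.mean(axis=0) / B.std(axis=0))
        drop = int(np.argmin(stability))             # least stable column
        trial = [c for pos, c in enumerate(cols) if pos != drop]
        B_new, pred_new = loo_coefficients(X[:, trial], y)
        q2_new = q2(y, pred_new)
        if q2_new < best_q2:                         # qcv2 deteriorates: stop
            break
        cols, B, best_q2 = trial, B_new, q2_new
    return cols, best_q2

def consensus(selections, n_vars, cutoff=0.5):
    """Columns retained in more than `cutoff` of the models (frequency in [0, 1])."""
    counts = np.zeros(n_vars)
    for cols in selections:
        counts[cols] += 1
    return np.flatnonzero(counts / len(selections) > cutoff)

# Toy data: activity depends on columns 0 and 1 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.01 * rng.normal(size=30)
selections = []
for _ in range(5):                                   # 5 samplings instead of 1000
    train = rng.choice(30, size=20, replace=False)
    cols, _q2 = ive(X[train], y[train])
    selections.append(cols)
common = consensus(selections, n_vars=6)
```

Elimination stops at the first drop in the cross-validated statistic, so each model keeps the last column set before deterioration; the consensus step then retains only the columns shared by most models.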
The predictive abilities of the grid and neural 3D/4D approaches were examined for large populations of models generated using the stochastic SMV procedure. A systematic inspection of the model space, in which the data collection was split into training/test subsets to monitor statistical performance while mapping the probabilistic pharmacophore geometry, was conducted. The iterative variable elimination procedure IVE-PLS acts as a filter that identifies the descriptors with potentially the highest individual weighting for the observed potency of cholic acid analogues as drug absorption promoters. In the majority of training/test samplings the model predictivity is fairly stable over a considerable range of variables eliminated by the IVE-PLS method. A simplified visual inspection of pharmacophore sites gives a clear picture of the regions that might be modified to modulate the compound potency. In particular, large regions with a suggested favourable contribution appear in close proximity to the negatively charged oxygen atoms wrapping around the side chains bonded to rings B and C for both the CoMSA and SOM-4D-QSAR approaches.
A pseudo-consensus 3D/4D-QSAR methodology was used to extract an average 3D pharmacophore hypothesis by exploring the most densely populated training/test subpopulations, indicating the relevant factors that contribute to the drug absorption potency of cholic acid derivatives. The complementary SOM-based 3D/4D-QSAR paradigm coupled with the IVE-PLS procedure can extend our understanding of the spectrum of membrane interactions and potency profiles of drug absorption promoters, and can provide valuable clues for property-oriented synthesis as well.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c6ra15820j
This journal is © The Royal Society of Chemistry 2016