Beth A.
Caine
ab,
Maddalena
Bronzato
c and
Paul L. A.
Popelier
*ab
aManchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester M1 7DN, UK. E-mail: pla@manchester.ac.uk
bSchool of Chemistry, University of Manchester, Oxford Road, Manchester M13 9PL, UK
cSyngenta AG, Jealott's Hill, Warfield, Bracknell, RG42 6E7, UK
First published on 29th May 2019
We show here for the first time that strongly correlated linear relationships exist between equilibrium bond lengths of the sulfonamide group and aqueous pKa values. Models are constructed for three variants of the SO2NHR group: primary benzene sulfonamide derivatives (e.g. diuretic drugs furosemide and hydrochlorothiazide), N-phenyl substituted 4-amino-N-phenylbenzenesulfonamide analogues (e.g. the sulfa antibiotic sulfadiazine) and phenylsulfonylureas (e.g. insulin secretagogue, glimepiride). In the context of these compounds, we present solutions to some of the more complex challenges in pKa prediction: (i) prediction for multiprotic compounds, (ii) predicting macroscopic values for compounds that tautomerize, and (iii) quantum chemical pKa prediction for compounds with more than 50 atoms. Using bond lengths as a powerful descriptor of ionization feasibility, we also identify that literature values for drug compounds celecoxib, glimepiride and glipizide are inaccurate. Our newly measured experimental values match our initial predictions to within 0.26 pKa units, whereas previous values were found to deviate by up to 1.68 pKa units. For glimepiride, our corrected value denotes a percentage of ionization at intracellular pH, which is only now in excellent agreement with its known therapeutic efficacy. We propose that linear relationships between bond lengths and pKa should emerge for any set of congeners, thus providing a powerful method of pKa prediction in instances where pKa data exist for close congeners, thereby obviating the need for thermodynamic cycles.
The acid dissociation constant, Ka, defines the extent to which dissociation of a proton is thermodynamically feasible at a given pH. It provides a means to rationalize acid–base activity in aqueous conditions, an insight that is fundamental to numerous areas of chemistry and biochemistry. Approaches to pKa prediction may be classified as either “first principles” or “empirical”. Purely first-principles methods access the pKa value via the calculation of ΔG for dissociation, using the van't Hoff's isotherm to calculate the dissociation constant. The broad applicability of this approach makes it attractive, and it has been explored using many quantum mechanical approaches.13–18 However, the significant CPU-time that electronic structure calculations incur remains a detriment to the utility of this approach in larger scale screening studies. The program “Jaguar pKa” by Schrödinger is the only commercially available first-principles base pKa prediction package.19,20 The second widely explored approach is to construct an empirically derived model based on the relationships between various chemical descriptors and experimental pKa data. The programs Marvin21 (ChemAxon), Epik22 (Schrödinger), and “Classic” and “GALAS” of ACD Labs23 are just some examples of commercially available packages, all of which implement variants of an empirical-based workflow. Other recent empirical approaches have implemented machine learning methods such as artificial neural nets24 and support vector regression.25
Philipp et al. recently highlighted several issues that remain relevant to both first-principles and empirical-based communities.20 Firstly, the use of 2D molecular representations (e.g. SMILES or SMARTS strings), can lead to poor predictions for compounds with pKa values strongly influenced by geometric features. For example, steric effects, or stereospecific hydrogen bonding interactions, can affect the thermodynamic feasibility of dissociation. Whereas these effects must be explicitly parameterized using 2D structures, the problem becomes less pronounced in first-principles approaches due to the use of 3D geometries. Another problem highlighted by Phillipp et al. is the case of tautomerizable compounds. A theoretically rigorous quantum mechanical approach requires the generation of an ensemble of low-lying conformations for each tautomeric state. This increases computation times, a problem further enhanced in the case of multiprotic tautomerizable compounds. For empirical approaches the problem with tautomerism arises with the need to recognize the dominant tautomeric state, owing to the fact that different tautomers may have different pKa values. Experimental evidence in support of tautomeric preference is lacking for many species.26,27
Strikingly well-defined linear relationships between bond lengths and pKa have now been shown to emerge for many functional groups.28–33 The bond length allows for the partitioning of chemical space into subsets of compounds, within which the site of ionization shares a specific local chemical environment. In the current work, we demonstrate for the first time that linear relationships exist between bond lengths of three variants of the sulfonamide group and their propensity to ionize in aqueous conditions. Fig. 1 shows the common skeletons of the training sets and their corresponding target drugs for the three classes of sulfonamides studied: primary benzene sulfonamide derivatives, 4-amino-N-phenylbenzenesulfonamides and phenylsulfonylureas. We then show how these relationships may be exploited to predict pKa values of sulfonamide groups on a number of pharmaceutically relevant compounds. We also pose solutions for the more complex issues in pKa prediction by demonstrating how to predict with excellent accuracy for (i) multiprotic compounds, (ii) compounds containing more than 50 atoms, and (iii) those compounds that tautomerize. Furthermore, we add credence to the notion that AIBL models (AIBL = Ab Initio Bond Lengths) providing accurate predictions may be established for any type of ionizable group, given that some experimental data are available for parameterization.
For test set compounds TS1-1 to TS1-10 (listed in Results section and in Tables S10–S14†), a conformational search was performed via use of the “Conformers” plug-in within the MarvinSketch program (version 16.1.4.0) by ChemAxon.21 A total of 25 conformers were generated using the Dreiding force-field, with diversity threshold and time step set to default. Each conformer was optimized at the B3LYP/6-311G(d,p) level in the gas-phase using GAUSSIAN09 with tight convergence criteria. The most stable geometry of the compound, whilst keeping the syn conformation of the sulfonamide group, was then identified by a comparison of total energies. For compounds with more than one site of ionization, the proton at the most acidic site of ionization (–SO2N–H– for TS1-7 to TS1-11 and COO-H for TS1-12 to TS1-14) was removed and the anionic state was also optimized. Optimizations were performed with both the 6-311G(d,p) and 6-311+G(d,p) basis sets. The geometries of the N-heterocyclic arenesulfonamide (NHAS) test set compounds labelled TS2-1 to TS2-10 (listed later in the text and in Table S15†), were once again obtained via selection of the most stable from a conformational ensemble of 25, after optimization using GAUSSIAN09. For the sulfonylureas of the test set, denoted TSU, the compounds glibenclamide, glimepiride and glipizide (listed in Table S16†), the number of constituent atoms exceeds 50, and they have a number of rotatable bonds in the R1 and R2 groups. To identify the most stable geometry for each species, a conformational ensemble of 30 geometries was generated using Marvin for each TSU compound, once again followed by optimization using GAUSSIAN09. The above procedure was performed both in gas-phase and with the CPCM, in all cases.
Results of linear regression of each bond length a–f of the CPCM-solvated structures against pKa reveal superior internal validation statistics for the S–N bonding distance (r2 = 0.95, q2 = 0.94 and RMSEE = 0.14). The full set of statistics for each bond length model can be found in Table S17.†
The proximity of an ortho-substituent to the sulfonamide group means that it can influence bond lengths of SO2NH2via steric effects and intramolecular hydrogen bonding interactions, depending on the type of substituent. If the ortho compounds are removed, strong (r2 > 0.8), positive correlations with pKa are observed for all bonds except C–S (a), which is negatively correlated. That is to say, shorter SO (b and c), S–N (d) and N–H (e and f) bond distances are indicative of a greater propensity to ionize, whilst the C–S bond becomes longer with increased acidity. These relationships are illustrated in the plots shown in Fig. S2.† We note that the nature of the variation in bonding distances a–f is the same for syn and anti-conformers in both gas and solvent phase. Therefore, the effect that a substituent has on the constituent bond lengths of the ionizable group is reflected in the relative propensity of the molecule to lose a proton. Overall, those compounds with substituents that make the phenyl group more electron-poor exhibit a lower pKa, and vice versa. In the next section, we make predictions for 14 drug molecules using the equation describing the linear line of best fit for the SN bond length model, i.e., pKa = 92.95 × r(S–N) − 145.10.
Predictions made using the equation above for the S–N bond (d) solvated model are illustrated in Fig. 3, and listed in Table S18.† All compounds are predicted for to within 0.5 units from literature values aside from celecoxib. Our prediction of 9.72 for celecoxib shows an error of −1.38 with respect to the existing experimental value40 of 11.1. Due to the anomalously large error of our prediction, a new experimental measurement was taken using the UV-metric technique. As celecoxib is poorly soluble in water but readily dissolves in methanol, it was necessary to perform Yasuda-Shedlovsky extrapolation. This procedure delivers a pKa value for aqueous conditions from the relationship between pKa value and methanol concentration. Details of the experimental procedure are given in Technical Section 1 of the ESI.† This treatment returned a new experimental value of 9.525 as the average of four measurements, with a standard deviation of 0.06. This new value is now 0.2log units from our predicted value of 9.72. This success shows that AIBL is able to correct for an erroneous experiment. As the precise conditions of the 11.1 value are unknown, we can only postulate that it was obtained in non-standard conditions. An increase in ∼1.5 units can most likely be ascribed to the response of the equilibrium to a lower temperature, the presence of another solvent or a higher ionic strength.
The predictions made by the Marvin tool are also very accurate (MAE = 0.34, see Table S18†). This is likely due to the prevalence of benzene sulfonamide analogues and accompanying pKa data in the literature. However, it should be noted that the correction of the pKa value for celecoxib would not have been realized using the Marvin program, because at 10.7, Marvin's value lies closer to original literature value, and has an error of +1.18 when compared to the new value.
Fig. 4 (a) The relationship between S–N equilibrium bond lengths calculated using CPCM and pKa values for the training set S1-1 to S1-22, (gray dots), and the test set, consisting of 8 drug diprotic drug compounds, labelled TS1-7 to TS1-14. SN bond lengths for the test set have been calculated for the neutral state using the 6-311G(d,p) basis set (blue crosses) and the 6-311+G(d,p) basis set (purple circles), the anionic form of each compound using 6-311G(d,p) (pink diamonds) and with the addition of one set of diffuse functions to this basis set [6-311+G(d,p)] (green dots). Bond length values can be found in Tables S11–S14 of the ESI.† (b) Structures of drug compounds TS1-7 to TS1-14. MAE, standard deviation (σ) and RMSEP shown correspond to the anionic B3LYP/6-311+G(d,p) predictions. |
The applicability radius of this primary sulfonamide model is wide: it covers ortho-, meta- and para-substitution and may be used to predict pKa values of sulfonamide groups on diprotic compounds. Whilst the performance of Marvin is still good, with an overall MAE of 0.65, a standard deviation of 0.41 and a RMSEP of 0.82, 4 out of the 16 predictions showed errors of more than 1log unit and 4 others were between 0.5 and 1log unit. Furthermore, for 3 out of 5 thiazides (TS1-7, TS1-9 and TS1-11), Marvin predicts that the primary sulfonamide group is more acidic than the tertiary sulfamoyl group. No AIBL-pKa errors exceeded 0.34log units, and the trend in pKa values across the series from TS1-7 to TS1-14 mirrors that of the experimental values remarkably well, as shown in Fig. 5 Technical Section 4 of the ESI† also details how the model passes Roy's MAE-based evaluation criteria.
Taking the ipso carbon as the atom bound to the carboxylic acid group, compounds TS1-12, TS1-13 and TS1-14 can be classed as meta- and para- (m/p) substituted benzoic acids. Previous work showed that the gas-phase C–O bond (in the COOH fragment) of m/p substituted benzoic acids is positively correlated to aqueous pKa (as is the CO bond distance, whilst the C–C and O–H distances instead show a negative correlation). Using the pKa data collated in our previous publication,28 41 compounds were re-optimized in the gas-phase at the B3LYP/6-311G(d,p) level of theory. The results previously obtained at the HF/6-31G(d) level were corroborated: the C–O bond length model has an r2 of 0.90 with pKa values. The equation describing this relationship (pKa = 144.97 × r(C–O) − 192.56) was used to predict the pKa values for the carboxylic acid group, modelled in its neutral form, using the most stable gas-phase conformers identified for TS1-12 to TS1-14. According to the literature, bumetanide, furosemide and piretanide exhibit pKa(1) values of 3.74,43 3.6444 and 4.00.45 Our model predicts values of 3.47, 3.33 and 4.23. Whilst it is true that the accurate prediction of pKa values for benzoic acid derivatives is no longer considered a challenge for existing methodologies, (although Marvin predicts values of 4.69, 4.26, 4.68), it is pleasing to note that we return an excellent level of accuracy. Furthermore, the relative order of acidity between compounds is predicted correctly by our model.
The use of the CPCM implicit solvation model in our geometry optimization for compounds S2-1 to S2-38 provides a marginally superior goodness-of-fit over gas-phase geometries, as measured by r2 (see Table S20†). The C–S bond may be referred to as the most “active bond”, as it exhibits the highest r2 value, at 0.96. The cross-validation q2 value, which gives an estimate of model predictivity, is also impressively high for C–S at 0.95 and the RMSEE is respectable at 0.23 log units. For S2, the sign of the S–N (iv) slope is negative whilst bonds i, ii, iii, v and vi have a positive correlation. For this series, the most acidic compound is the 2-Br, 4-NO2 derivative (labelled S2-37, pKa = 5.7), whereas the least acidic species is the 4-NH2 derivative (labelled S2-9, pKa = 10.22). We therefore find that the most acidic species in the series has the longest S–N bond (iv), and the shortest C–N bond (vi).
The presence of the p-amino group on sulfanilamide means there is an additional microdissociation reaction NH3+ → NH2 + H+, which has a pKa value of ∼2. For the sulfonamide group dissociation, the relevant state is the neutral state of each compound. Therefore, the relevant microstate for the aniline group dissociation is the protonated state. However, here we use the neutral state of the compound in accordance with work by Harding et al., which illustrated how there is a linear relationship between r(C–N) and pKa values for aniline derivatives in the conjugate base form. The variation in C–N bonding distance with pKa across a congeneric series can be attributed to the varying extent of sp2 hybridization of the nitrogen atom, which is influenced by phenyl substituent groups. For example, in the work of Harding et al., the C–N, and both N–H bonds have a positive correlation with pKa, i.e., a shorter bond length (in the presence of ortho and para electron-withdrawing substituents) corresponds to a lower pKa. This bond-shortening effect, coupled with an increase in trigonal planarity, is indicative of a higher degree of sp2 character of the nitrogen atom as the phenyl ring becomes more electron-poor at the ipso carbon. Enhanced, substituent-induced nitrogen sp2 hybridization provides greater resonance stabilization of the neutral conjugate base form. However, a more electron-poor ipso carbon destabilizes the conjugate (cationic) acid, thereby favouring the dissociation of N+–H, and thus rationalizing the lower pKa measured for such species. In this work, 20 aniline derivatives were optimized at the same level of theory used throughout this work [B3LYP/6-311G(d,p)]. Because previous work by Harding et al. showed that gas-phase bond lengths exhibit a strong linear relationship with pKa, calculations were also carried out in vacuo to minimize CPU time. Predictions were made using gas-phase C–N bond lengths (of the amine group) of test set compounds TS2-1 to TS2-5 and TS2-9 using the equation pKa(aniline) = 141.57 × r(C–N) − 193.51, for which the r2 value for the line of best fit was 0.92.
Representation of each of the compounds TS2-1 to S2-10 as the higher energy sulfonimide tautomer reveals no relationship between bond length and pKa (i.e. r2 < 0.5 for all bonds). Yet, in the sulfonamide state the r2 value for the test set alone is 0.87. It may be asserted that, given our aqueous-phase experimental observable is related to a structural feature of the sulfonamide state, we provide further corroboration to the work of Chourasiya and co-workers, which suggests this state is dominant in aqueous conditions.
Using the same experimental values used to construct Fig. 7, the MAE value for Marvin's predictions is found to be 0.70 with a standard deviation of 0.67. Taking the values closest to Marvin's predictions these values become 0.66 and 0.70. However, the general performance of Marvin is good for this series, and the larger MAE value can be mainly attributed to larger errors for four compounds: TS2-3 (0.93), TS2-6 (0.97), TS2-8 (1.28) and TS2-10 (2.19). A plot showing the general trend across the series for experimental, AIBL-pKa values and Marvin predictions can be found in Fig. S3 of the SI.† Technical Section 5 also details how our model fares according to Roy's MAE-based evaluation criteria for the 14 compounds TS2-1 to TS1-10. The majority of training set data is taken from just one source, where the exact conditions (i.e. temperature, solvent) are not reported. It is feasible that the values given in that work were recorded at a temperature lower than 298 K, with a small percentage of another less polar solvent, or at a different ionic strength to the drug compounds. The amalgamation of the test and training set data points returns a linear fit with an r2 value of 0.95. It may be asserted that the use of this equation, in the case of future predictions may help towards reducing systematic error associated with deviation from standard conditions between test and training set.
The most recent computational study of sulfonylurea type compounds was carried out by Kasetti et al. in 2010.51 In that work, the authors note that the global minimum conformation of the sulfonylurea fragment, in the gas-phase, contains an N–H⋯OS hydrogen bond. The existence of this IHB is corroborated by analysis of bond critical point properties (a concept derived from Atoms in Molecules theory and carried out by the program AIM2000). The iminol tautomeric forms of sulfonylureas were also assessed for their stability relative to the amine form, and were found to be ∼21–25 kJ mol−1 higher in energy. Our working hypothesis is that an AIBL-pKa model should emerge from the bond lengths of the lowest energy conformations of the lowest energy tautomeric form of our compounds. Therefore, only the amine tautomer form is considered in the following discussion.
The most stable conformation of all 35 compounds of our dataset contain a phenylsulfonylurea fragment, the 3D geometry of which closely resembles that of the optimized structure reported by Kasetti et al. This structure is shown in Fig. S3.† In the following analysis, we use the 6 marketed drug compounds (shown in Fig. 8) as a test set. With the additional data procured from the literature making up the training set (30 compounds, SU-1 to SU-30 as shown in Table S7†), there is a 20/80 split of external test set to training set.
Phenylsulfonylureas are found to have the same active bond length as the Pr-BSA series (S–N), despite the differing chemical environment of the acidic proton (full set of statistics relating to each bond length vs. pKa model is shown in Table S22†). Although the r2 and q2 values for the S–N bond (D) model are only 0.93, the RMSEE is particularly low at 0.14. Notably, a relationship emerges here for the S–N bond lengths despite significant variation in the identity of both the R1 and R2 groups (see Table S7†).
ID | Drug | r(S–N) | Exp. pKa | AIBL-pKa | AE | Marvin | AE |
---|---|---|---|---|---|---|---|
a pKa value measured in this work. All bond lengths for compounds TSU-1 to TSU-6 are given in Table S16 in the ESI. | |||||||
TSU-1 | Chlorpropamide | 1.68513 | 4.75 (ref. 53) | 4.69 | 0.06 | 4.31 | 0.44 |
TSU-2 | Carbutamide | 1.69367 | 5.79 (ref. 54) | 5.75 | 0.04 | 4.35 | 1.44 |
TSU-3 | Acetohexamide | 1.68266 | 4.31 (ref. 54) | 4.39 | 0.08 | 3.33 | 0.98 |
TSU-4 | Glibenclamide | 1.68980 | 5.22 (ref. 55)/5.30 (ref.56)/6.80 (ref. 56) | 5.27 | 0.05/0.03/1.53 | 4.32 | 0.90/0.98/2.48 |
TSU-5 | Glipizide | 1.68822 | 5.21a/5.90 (ref.57) | 5.07 | 0.14/0.83 | 4.32 | 0.89/1.58 |
TSU-6 | Glimepiride | 1.69235 | 5.32a/7.26 (ref. 52)/8.03 (ref. 52) | 5.58 | 0.26/1.68/2.45 | 4.32 | 1.00/2.94/3.71 |
MAE | 0.10 | 0.94 | |||||
σ | 0.09 | 0.31 | |||||
RMSEP | 0.13 | 0.98 |
To investigate the cause of these large prediction errors, pKa values for glimepiride and glipizide were measured once more in this work using the UV-metric method. Measurements were taken in three different methanol concentrations, and pKa values in aqueous conditions were obtained via Yasuda-Shedlovsky extrapolation. The new value for glipizide of 5.21 matches our prediction (5.07) to within 0.14log units and the new value for glimepiride matches our prediction to within 0.26log units. Given that our prediction and the new measurement are similar, and given that these two values are also close to those of the structurally similar compounds, glipizide and glibenclamide, we are inclined to believe that 5.32 is the most accurate experimental value for glimepiride.
There is an important consequence of the substantial correction in pKa value established here. According to the Henderson–Hasselbalch equation, the new pKa value of 5.32 for glimepiride suggests that 99% of molecules are ionized at pH 7.4 (cell pH), whereas the previously reported value of 7.26 suggests that only 58% are ionized. The therapeutic activity of insulin secretagogues like glimepiride is known to stem from their ability to bind to a receptor on the ATP sensitive K+ channels of pancreatic β-cell membranes.58 It is thought that the anionic state of the molecule partakes in this binding activity.59 A higher percentage of molecules existing in their ionized state means that there is a higher percentage that can partake in binding with the receptor to block the channel, which should be linked to greater efficacy of the drug. As glimepiride is currently used in the treatment of diabetes, the lower value that we report is perhaps in better agreement with its known therapeutic effects.
Finally, we note that the use of the S–N bond length of the glimepiride conformer that is likely to be more stable in solution gives a marginally larger error than the most stable gas-phase/CPCM conformation, compared to the new value of 5.32. This perhaps goes against chemical intuition. However, it is pleasing to note that good estimations of pKa may be made without user knowledge of explicitly solvated conformational preference, as is almost always the case in practice.
Table 1 shows that the average prediction accuracy of the Marvin program is inferior to that of AIBL for the same compounds, with all but one of the predictions giving an error of more than 0.5 units. Moreover, the Marvin MAE is 0.94 whereas ours is 0.10. Note that Marvin and AIBL have been compared on an equal footing and allowing for their best performance because, for both, we took the experimental values closest to their respective predictions. The trend of experimental pKa variation across the test set series is compared to AIBL-pKa predictions and Marvin is shown in Fig. S5,† showing that AIBL performs very well in this respect. Roy's criteria for MAE evaluation cannot be applied here because the test set contains less than 10 compounds.
We may build on the work of Breneman and Weber to begin to understand the patterns of bond length variation between our series of congeners. Significantly, the authors note that the S–N bond length of fluorosulfonamide is shorter than that of sulfonamide (at each increment of the rotation). The authors attribute this shortening effect to the increased sp2 character of the fluorosulfonamide amine group compared to that of sulfonamide, which instead has more sp3 character. The authors claim that as the N lone pair is donated towards the sulfur (and partially into the S–F σ* antibonding orbital), this in turn, enhances the electronegativity of the N atom. Subsequently, σ-withdrawal of electron density occurs from the sulfur atom towards the now more electronegative N atom. This hypothesis is in part evidenced by those authors' observation that, in the syn conformation of fluorosulfonamide, the nitrogen of the amino group is at its peak electron population, whereas sulfur is at its most depleted. In the parlance of the authors, the fluorine atom provides a better electron sink, stabilising the anti-conformation in fluorosulfonamide via delocalization of the lone pair on nitrogen into the S–F σ* antibonding orbital. In the absence of the F atom, the sulfonamide system favours the syn-conformation, due to the stabilization afforded by lone pair donation to the S–O σ* antibonding orbitals over the higher energy S–H σ* orbital.
Now we may consider the above rationale in the context of the S1 series, which are most stable in the anti-conformation. The most acidic compound of the set is S1-19 (the 2,3,4,5,6-F5 derivative). In addition to having the lowest pKa value, this compound is found to have the longest C–S (a) bond length and the shortest b–f bond length of the whole series. Fig. 9 illustrates the overall pattern of bond length variation with pKa in terms of positive or negative correlation (shown in red or blue in Fig. 9a and in Table S17†). One interpretation of this variation in bond distances is related to the fact that the ipso carbon of the phenyl group in S1-19 will be more electron-poor compared to the unsubstituted analogue S1-1. It follows that the C–S σ* antibonding orbital would be a better electron sink in S1-19 than in the unsubstituted species S1-1, making rehybridization of the N lone pair more favourable (Fig. 9b to c). Consequently, in line with the above rationale, the S–N bonding distance is shorter in species with more electron-withdrawing substituents due a relative increase in its s character. Greater s character of the sulfonamide N atom then in turn explains why compounds with shorter N–H (e and f) bonds have lower pKa values: enhanced s character suggests that the electrons of the anion lone pair are held closer to the nucleus, thus providing enhanced stabilization to the anionic state.
In accordance with the above hypothesis, the reduction in S–N bond length will be accompanied by an increase in the C–S bond length, as electron density is put into the σ* antibonding orbital, decreasing bond order. Indeed, the C–S bond length vs. pKa plot has a negative slope, showing that congeners with the highest pKa values have the longest CS distances (Fig. 9a).
To add further credence to the link between bond distances and pKa, an Interacting Quantum Atoms (IQA) analysis was performed on the wavefunctions of compounds S1-1, S1-4 and S1-9, representing the unsubstituted species, the most basic (4-NHCH3, pKa = 11.00) and one of the most acidic compounds. Due to the close proximity in space between an ortho substituent and the ionizable group, S1-9 was chosen as the more acidic species over S1-19 in order to restrict substituent effects to those transmitted through bonds of the aromatic ring. The inter-atomic Vcl(A,B) values derived from IQA analysis can be interpreted as a more negative value denoting a more favourable electrostatic interaction. For the three species, S1-1, S1-4 and S1-9, we find that the trend in the magnitude of Vcl(A,B) values mirrors that of the bond lengths, i.e. the lower the pKa, the shorter the bond and the more negative the Vcl(A,B) value is (values are listed in Table S23†). Similarly, the exchange-correlation energy terms Vxc(A,B), which represent the degree of electron delocalization (or covalency) of an interaction, also mirror the trends in bond lengths with pKa (a more negative Vxc(A,B) value corresponds to a shorter bond A–B), except in the case of both sets of Vxc(N,H) values. For both N–H bonds, the extent of electron delocalization decreases between N and H atoms with increased acidity (i.e. absolute value of Vxc(N,H) decreases from 334 kJ mol−1 in S1-4 to 329 kJ mol−1 in S1-9), whilst simultaneously, the electrostatic interaction between N and H atoms increases with acidity (i.e. the absolute value of Vcl(N,H) increases from 206 kJ mol−1 in S1-4 to 228 kJ mol−1 in S1-9). These observations suggest that, as the N atom gains s character in going from sp3 to sp2 hybridization (Fig. 9a and b), there is a decrease in electron delocalization between N and H atoms, which is in line with an increased propensity for N–H cleavage.
Support for the above rationale can be found in the observed increase in the C1–C2–N3–H4 dihedral angle towards 180° (i.e. co-planarity of the N–H moiety with the N-phenyl ring), the more strongly electron-withdrawing the substituents are and the lower the pKa. The variation in this torsional angle can be explained by an increase in sp2 character of the sulfonamide nitrogen as the lone pair is conjugated with the π-system, towards a quinoid-type resonance structure (Fig. 10b).
Using a similar rationale as for the benzene sulfonamides, we can assert that when the substituent X on the Ph group makes the ipso carbon electron-rich, the C–S σ* antibonding orbital a worse electron sink (Fig. 11b). When the X substituent on the Ph group causes the ipso carbon becomes electron-poor (Fig. 11c), it becomes more stabilising to put electron density into the C–S σ* antibonding orbital, and thus it is a better electron sink. In this latter case, the shortening effect on the S–N bond and the lengthening of the C–S bond would be enhanced. However, with a more electron-rich sulfur, the lone pair on nitrogen can instead be delocalized across the N–C (F) and CO (G) bonds of the urea fragment. This direction of delocalization being favoured would explain why in the most basic compound of the series (SU-5, 4-NMe2, pKa = 5.85), which contains a good π-electron donating group, the C–S (A) and N–C (F) bonds are also the shortest. The CO bond (G) of the urea moiety in SU-5 is also the longest, suggestive of a situation similar to that shown in Fig. 10b.
For the first time in the context of the AIBL approach, we have also proposed rationales for bond length variation in the presence of substituent groups, and why this may be related to their propensity to ionize.
A significant feature of the work is that we have shown that the AIBL approach may be applied to predict pKa values of compounds with more than one ionizable group. This feat is achieved by optimization of the compound with the pKa(1) site in its dissociated, anionic state. Prediction accuracy is found to be optimized by adding diffuse functions to non-hydrogen atoms [6-311+G(d,p)]. In our workflow, quantum chemical calculations are performed on only the relevant microstate, yet the average absolute error on our predictions is 0.16log units. The overall accuracy of the AIBL predictions presented here is found to be superior to that provided by Marvin for the same series of compounds.
Finally, we have corrected the experimental aqueous pKa values of three drug compounds: celecoxib, glimepiride and glipizide. For celecoxib, our model predicted a value of 9.72 for the sulfonamide group, which is lower by 1.58log units than the widely quoted literature value of 11.1. The pKa was measured once more in this work, which returned a value only 0.2 pH units away from our predicted value (9.52). Furthermore, our prediction of 5.58 for glimepiride was corroborated by a new experimental value of 5.32. Both of these new glimepiride pKa values suggest that a significantly higher proportion of molecules exist in their anionic state at pH 7.4 than is suggested by the previously reported value (7.26). The pKa value of glipizide was also re-measured, and the new value of 5.21 was found to be closer to our predicted value of 5.07 than the existing literature value (5.90). Moreover, glimepiride and glipizide both contain more than 50 atoms, which would invoke significant CPU time in a conventional first-principles workflow. However, we perform a quantum chemical calculation on only the neutral species, and arrive at accurate predictions despite the significant size and complexity of these compounds.
In the lead optimization process of drug discovery, we propose that AIBL can serve as a highly accurate estimator of pKa variation for various series of analogues generated for an active scaffold. Furthermore, the AIBL-pKa approach can be used to check the consistency of a group of pKa measurements, and can therefore serve as a rectifier for experimental outliers.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c9sc01818b |
This journal is © The Royal Society of Chemistry 2019 |