Predicting protein–ligand binding affinity and correcting crystal structures with quantum mechanical calculations: lactate dehydrogenase A

Quantum calculations plus lipophilicity (log P) lead to usefully accurate predictions of binding affinity that allow correction of crystal structures.


Background
The interaction of small guest molecules with larger hosts in aqueous systems is a key component of many applications including: sensors for in vivo use, 1 enzyme catalysis, 2 antibody recognition, 3 sequestration, 4 purication of waste streams, 5 environmental remediation, 6 drug design, 7 toxicology, 8 and supramolecular chemistry. 9 The structures of such complexes are routinely studied with X-ray diffraction experiments and those involving proteins are deposited in the Protein Data Bank (PDB). 10 These have been investigated to learn about how proteins and ligands interact. 11 Such studies presume that the structures are correct and of high quality based on descriptive statistics describing how well the structure can explain the observed diffraction data. Ourselves and others have found that these statistics are not sufficientchemically infeasible ligand structures can achieve apparently better statistics. 12 Quantum mechanical (QM) calculations can provide a complementary guide to the chemical reasonableness of structures. A recent study illustrates the challenge: Kumar et al. contrasted quantum mechanical calculations with deposited structures and found a divergence between complexes of arginine (which behaved as predicted) and lysine (which did not); the contrasting likelihoods of these two residue types being clearly identiable in the electron density was not considered but could provide an alternative interpretation of their ndings. 13 Routinely allowing quantum calculations to decide when to go beyond reliance on the diffraction data alone would increase the likelihood of identifying the correct structure but requires a method that can predict binding strength to a useful level of accuracy. Here we propose such a method and show how it has guided us to improved structural interpretations for an enzyme of pharmaceutical interest. This has permitted us to begin the process of deducing when differences in diagnostic statistics, such as R work and R free are not signicant. Others have previously lamented the reliance on these statistics and particularly when they distract from achieving a sound chemical interpretation that correctly accounts for the physical interactions at play. 14 Correctly quantifying the interaction strength between a host and a guest or a protein and a ligand computationally requires a physically accurate description of the molecules, of the kind provided by QM calculations. Despite signicant progress, such calculations remain challenging because of their computational cost. Including only the portion of the host or protein directly involved in binding reduces this cost and examples of this approach are mentioned in recent reviews. 15,16 Related calculations see the protein or ligands split into fragments that can then be treated with QM. [17][18][19][20][21][22] Geometries generated by a molecular dynamics simulation can be treated with QM to compute affinities. 23 Alternatively, much faster QM methods (which are usually less general or accurate) can be applied to larger portions of the system, or empirical corrections applied to improve accuracy with minimal increased computing cost. [24][25][26] Alternative approaches to understanding the behavior of proteins and quantifying their interaction strength with ligands have been pursued vigorously over the past decade. The majority of this effort has been in the area of simulation and recent advances in free energy perturbation (FEP) are particularly noteworthy. 27,28 One signicant issue with these methods is that (unlike quantum approaches) the parameterization schemes that underpin the description of the ligands oen require individual tailoring to the molecules being studied. 27 Like quantum calculations, simulations require a signicant amount of computational time to compute binding energies.
Structural biology studies oen entail generating crystalline solids containing protein-ligand complexes. These are irradiated to create X-ray diffraction patterns, arising from interactions with electrons in the molecules. Soware is employed to link from atomic coordinates to electron density and hence Xray diffraction patterns; thus a likely set of atomic positions can be identied. These interpretations can be problematic and are rarely unambiguous. 12 QM calculations can support these interpretations but it is usual for improved agreement between predicted and observed diffraction patterns to determine the structure deemed to be correct. [29][30][31][32][33][34] One way to meld the chemical insight provided by quantum chemical calculations with the experimental data available from a diffraction experiment is to iteratively include quantum calculations in the renement procedure; higher quality structures of proteins and ligands have been achieved in this way using the quantum renement approaches pioneered by the Ryde group. 30 This has proved particularly effective for metalloproteins and for assigning protonation states. 15,30,[35][36][37] Indeed, QM-derived ideas have been inuencing structural renement for nearly a century. Sometimes, the diffraction data cannot determine the correct structure and this is particularly the case for novel ligands whose behavior is much less well understood than that of proteins. 38 One way alternative interpretations of the diffraction data can be assessed is by examining ligand omit maps that are created by using the coordinates of the protein to generate phases and to use these to determine the ligand electron density that is unaccounted for. [39][40][41] We propose an approach in which a correlation with computed log P values provides a usefully accurate treatment that accounts for the differences between binding energies computed with quantum mechanics and experimentally observed values. We style the resulting type of calculation as a "theoceptor" (theoretical receptor) by analogy to the theozymes (theoretical enzymes) of Houk and co-workers. 42,43 We show that the approach can be applied to predicting aqueous host-guest interaction energies in the realm of biological or supramolecular chemistry. We also present a theoceptor for the enzyme Lactate Dehydrogenase A (LDHA) and demonstrate the value that theoceptor calculations can provide in terms of detailed structural insights and improved crystallographic structures.
The current study is presented in three parts. In the rst, we describe our approach. Subsequently, we describe our theoceptor for LDHA. Finally, we show how these calculations can increase the value of protein-ligand crystal structures and provide alternative interpretations of the electron density for several examples taken from LDHA.

Results and discussion
Quantum mechanical calculations generally deal with the gas phase electronic energy of the system whereas experimental observations in biological systems are related to free energies in solution (eqn (1)).
The free energy of binding in solution (DG bind (aq)) is related to the change in gas phase electronic energy (DE, eqn (2)) computed with QM. Corrections (DH corr ) are added to obtain gas phase enthalpies and a gas phase entropy change (DS(gas)) is computed. The change in solvation free energies (DDG solv ) completes the link to the experimental environment.
The electronic energies in eqn (2) can account for the energy of the complex but do not indicate the shape of the potential energy surface (PES), which describes how rigid the complex is. When two molecular species come into proximity, the potential energy surface can involve a narrow minimum or a wide one (Fig. 1A). A narrow minimum corresponds to a rigid complex and a greater entropy penalty than a wider minimum. Different interaction types make different contributions to the shape of the PES. Density functional theory calculations were used to investigate how the energy of the system changes as several functional groups approach one another. The full set of calculations is described in the ESI (section S1 †) but the key results can be summarized by considering water and methane as examples of polar and nonpolar functional groups, respectively (Fig. 1B). Only when two polar groups (appropriately oriented) or charged groups of opposite sign approach one another is there a well-dened energy minimum; only polar-polar interactions tend to increase complex rigidity. It is of course the case that even many polar groups will have a repulsive interaction when the polarities are mismatched and that two non-polar groups that are aromatic rings can give a shallow energy minimum. 44 To understand the thermodynamics of a gas phase binding event (DH corr (gas) À TDS(gas)), it is important to consider the relative contribution of polar and non-polar groups. Furthermore, the solvation free energy of the ligand and of the complex in water will be enhanced by the presence of polar groups on the ligand and diminished by non-polar regions. The solvation free energy of the protein is a constant and so accounting for whether the groups involved in binding are polar or non-polar offers a simple way to account for the change in solvation free energy (DDG solv ).
Medicinal chemists have been concerned about the inuence of non-polar groups upon binding affinity for some time and log P has been used to differentiate ligands that rely primarily upon polar interactions from those that are hydrophobic. [45][46][47] The partition coefficient, P, and its logarithm are convenient parameters because they are easily measured and can be accurately predicted. 45 It is our hypothesis that a ligand's log P can serve as a surrogate for the three terms on the right of eqn (1) when computing its binding free energy. 48 This leads us to propose that for a given system, binding affinity can be described by eqn (3).
where a, b and g are constants for that system. We propose obtaining values for these empirically and DE is dened by eqn (2), where E indicates a quantum mechanical energy that is obtained with no vibrational corrections included. The energy of the receptor (E receptor ) in this equation is arbitrary because it is constant for all ligands but it will inuence the resulting value of both a and g. A range of affinity assessment types (IC 50 , K i , K d , etc.) can be used but should all be transformed such that they will have a linear relationship with energy. We do this by transformation to pIC 50 , pK i , pK d such that improved affinity corresponds to increasing values of each of these properties; a should therefore be negative and b positive if more negative values of DE and increased lipophilicity correspond to tighter binding. In order to obtain values for the three constants, at least three compounds of known affinity are required and thus, like the linear interaction energy approach, only relative affinity can be computed. 49 DE would usually be computed using the lowest energy protein-ligand complex and lowest energy conformation of the free ligand and therefore, be relevant to cases where the bound state is dominated by one geometry. 16 However, the approach can be simply adapted to include multiple low energy conformations (of bound or free states) using a Boltzmann weighting scheme. The exible form of eqn (3) naturally ameliorates other deciencies of the calculations including the exclusion of longer range interactions (primarily electrostatic) with the regions of the protein that are not included in the theoceptor and deciencies in the continuum part of any solvation models that are used (such as inappropriate values of the dielectric of the medium). It is challenging to predict the dielectric that the protein environment provides and so we take advantage of this approach by treating the protein using gas phase calculations with no continuum solvation. The effect of eqn (3) is that for a polar ligand that makes a full set of polar interactions with the receptor, DE will be large and negative and this will offset the low log P term. A polar ligand that does not make a satisfactory set of polar interactions will have a less benecial DE term that will not be sufficient to counteract the small log P contribution. A largely hydrophobic ligand will also have a smaller contribution to binding affinity from DE but this might be offset by the hydrophobicity term. There have been a number of reports in which quantum mechanical calculations have been used to compute binding energies for protein-ligand and other host-guest interactions. Energy differences are computed with no vibrational corrections and provide data with which this concept can be tested (in certain cases, the log P values have been computed by ourselves). The examples (Table 1) include: (A) Roos et al. who looked at the relative binding affinities of a matched series of mineralocorticoid receptor (MR) agonists, using a range of DFT methods, with M062X performing the best. 50 (B) Investigation of inhibitors of the dimerisation of inducible NO-synthase by Leach et al. 51 This system includes backbone atoms of six residues forming the binding site, an iron porphyrin and its attached cysteine thiol. Among the QM methods tried, M06HF performed the best. (C) Roos et al. who used DFT to explore the relationship between the measured and predicted affinities for a set of positively charged amidine and guanidine cores binding to the b-site of APP cleaving enzyme (BACE-1). 52 (D) Hylsová and co-workers, who have studied a series of pyrazolopyrimidine inhibitors of the kinase CDK2. 53 As shown in Fig. 1C and section S2, † in each of these examples, incorporating log P improved the description of measured binding affinity by the QM energies. To make the comparison in the most straightforward way, two-parameter linear regression has been performed using rstly DE and calculated log P as independent variables and secondly DE and a random value that spans the same range as the log P values. The statistics for the second cannot be worse than those for the single parameter model using only DE. The resulting R 2 and RMSE values are shown in Table 1. In all cases, the addition of log P improved the correlation. For system A, Roos et al. also performed FEP calculations and achieved a correlation with R 2 of 0.60 and RMSE of 0.73 (worse correlation is found when correlating with pK i -log D instead of just pK i ). 52 The values obtained for the coefficients in eqn (3) are shown in Table 1, along with the standard deviation obtained from systematic leave-one-out analysis. It is likely not a coincidence that the BACE inhibitors (b ¼ 0.24) are binding to a highly polarized dianionic binding site while the iNOS inhibitors (b ¼ 1.22) bind to a site dominated by the hydrophobic porphyrin of heme and MR agonists (b ¼ 1.58) are binding to sites which lack substantial polarity (a lone threonine provides a point of polar interaction). The CDK2 binding site (b ¼ 0.88) includes hydrophobic residues and several polar interactions. The more hydrophobic the binding site, the higher the value of b. 48 The same approach is also applicable to organic host-guest systems in water. For instance: (E) in their computational studies of a set of carboxylic acids binding to an octa-acid cavitand, Mikulskis et al. 54 compute binding energies using several DFT methods and achieve their best results (before the addition of vibrational corrections) with BP86/TZVP when Cosmo solvation is included in their calculations. (F) Kimura, Yukiyama and Fujisawa studied the complexation of a series of nitriles by an a-cyclodextrin host in water and performed calculations at the B3LYP/6-31++G** level to rationalize their observations. In both cases, the agreement between computation and experiment is enhanced when eqn (3) is applied. In the case of these two organic hosts, the values of b are higher than for the protein-ligand systems. This likely reects the absence of polar groups inside the cavities (unlike proteins which must contain amide groups). The sets of compounds studied in host-guest interactions are smaller than for the protein-ligand binding examples and this likely explains the larger standard deviations observed for the values of the coefficients in Table 1. Naturally, the organic hosts are more challenging to study with force eld-based approaches because they would require parameters to be developed for both the host and guest.
To investigate the utility of the theoceptor approach, it has been applied to a system of therapeutic interest: Lactate Dehydrogenase A (LDHA). LDHA catalyses the reversible conversion of pyruvate to lactate, with the concomitant conversion of NADH to NAD+. Several classes of cancers are characterised by elevated levels of lactate, and LDHA is overexpressed in human tumors. [56][57][58][59][60][61] As such, inhibitors of this enzyme have been sought as potential therapeutics. Several classes of inhibitors of LDHA have been described and a wealth of structural information and binding affinity data are available. [62][63][64][65][66][67][68][69][70][71][72][73][74][75][76][77] The set of compounds (1-11) studied include seven from high-throughput screening (HTS) of the Genentech/Roche corporate compound collection; 66,68,69,71,72,78 two small, negatively charged compounds; 73 and three compounds sharing malonate as a common substructure originating from AstraZeneca's fragment screening approach (Table 2, Fig. 2). 74 The selective and potent LDHA inhibitors discovered so far exhibit a common structural feature: a carboxylic acid moiety (or other acidic functional group) that is ionized at physiological pH. Structural studies reveal that in the binding site, this is placed in close proximity to where the acid in the substrates and products binds. All interact with the basic side-chain of Arg168.
Structures of 1-7 are all available in complex with human LDHA. 66,[68][69][70]72 Crystal structures of 8 and 10 are available in complex with rat LDHA (PDB codes 4AJE and 4AJI), 79 and the structure of 11 is available in complex with rabbit LDHA (PDB code 4I8X). 80 It should be recalled that these structures are proposed interpretations of the observed electron density and alternative interpretations may also be reasonable. While global metrics such as resolution have their place for assessing the structures, more localised values prove instructive when focusing on the ligands. One of the most common metrics for assessing the t of a proposed structure to the local electron density is the real space-correlation coefficient (RSCC). It ranges from 0 ('bad', electron density is effectively missing) to 1 ('good', model ts the density perfectly). [81][82][83] RSCC values for the selected ligands are given in Table 2. Density maps contoured at 1s of the ligands with lower RSCC values (ligands 1, 2, 6 and 7) are shown in section S5. † Residues directly involved in ligand binding were selected as a part of the theoceptor: Arg168, His192, Val234, Asp165, Table 1 Protein-ligand and host-guest systems (see text) studied by QM and reevaluated by ourselves. Pearson's correlation coefficients (R 2 ) and root-mean-squared errors (RMSE) describe the link between affinity and DE when combined with log P or a random number. The coefficients for eqn (3)  Asp194, Tyr238, and Asn137 (Fig. 3). These were extracted from the structure deposited in the PDB with the code 4QO7 (theoceptors derived from other protein structures performed equivalently). Protonation states were assigned such that Arg168 is protonated, Asp165 and Asp194 are both deprotonated and His192 is protonated in order to form a stabilizing interaction with Asp165 and the net charge of the residues was 0. There are two noteworthy changes in the sidechain conformations amongst the deposited structures but one arrangement is clearly most relevant to the bound state in humans: Arg105 protrudes into the binding site in rat and rabbit structures as well as in the apo form of human LDHA but in the holo forms of the human protein, it interacts with Glu191 instead (and so it is excluded from the theoceptor) while Tyr238 moves out of the binding site only in the apo form (and hence is included). Asp165 forms a hydrogen bond with His192 and is stabilised by an additional hydrogen bond with one water molecule. Preliminary studies showed that the water molecule maintains the sidechain in an optimal position to interact with the His192 sidechain and so is considered structural and was kept in the theoceptor (Fig. 3, insert). All interactions with the ligand are through protein sidechains so backbone, cofactor and the remainder of the protein were deleted. To account for the scaffolding effect of the rest of the protein, C a , C b and water oxygen atoms were xed in space using the xed Cartesian redundant coordinate feature in Gaussian 09, 84 as shown in Fig. 3. The geometries were optimized in vacuo at the M06 level of theory with the 6-31+G** basis set because this level of theory should provide a wellbalanced treatment of different interaction types. 85,86 Free ligand structures were optimized with solvation incorporated, using the PCM water model, at the same level of theory. DE values were dened as in eqn (3) with E receptor being obtained from a theoceptor calculation in which the ligand had been deleted. The values of the coefficients were a ¼ À0.080, b ¼ 0.892 and g ¼ 0.419 and yield the relationship shown in Fig. 3. The value of b, when compared to those in Table 1, is consistent with a binding site that is hydrophobic in parts with a polar section, as included in the theoceptor. Proteinligand crystal structures are interpretations that are several steps removed from the experimental measurements, and so include uncertainties, many of which depend on the force eld used during renement. 12 QM calculations can help discriminate amongst plausible interpretations of the observed electron density. The chemical rigour demanded by theoceptor calculations can improve: ligand conformations, interpretations of the positioning of heteroatoms, stereoselectivity, positioning of ligands where electron density is missing, assignment of tautomeric and ionisation states. 87 Further, by studying a set of conformations (and tautomers when relevant) of the free and bound ligand, differences between the low energy conformations and the bioactive conformation are identied and the conformer-focusing problem can be addressed. 88 Compounds where one or other of these effects were relevant are described.

Compound 1
The tetrahydropyran ring of ligand 1 is crystallographically modeled in a boat conformation that yields a predicted affinity far from experiment. This is corrected when modeled as a chair; the free conformation is 9 kcal mol À1 lower than the boat and the theoceptor with 1 in a chair conformation is 10 kcal mol À1 lower in energy than with a boat conformation (Fig. 4). The RSCC of the ligand was 0.94, which implies that the ligand ts the density well. However, the density contoured at 1s reveals that both chair and boat would t equally well; the differentiating part of the density is absent.
In order to investigate this further, the deposited structure was re-rened (using the PHENIX suite of soware) 41,88 using two alternative sets of restraints on the conformation of the Fig. 3 Theoceptor for the LDHA nicotinamide site taken from the complex with 3 (shown in grey). C a , C b (spheres) and water oxygen (red sphere) were fixed during the QM optimization. The insert shows the hydrogen bonding network involving a water molecule, Asp165 and His192. Measured affinity is plotted against computed affinity for compounds 1-11, oxamate and malonate. R 2 ¼ 0.85 and RMSE ¼ 0.76. tetrahydropyran ring: one in which the ring is kept as a chair and one as a boat. In this structure, the ligand is present in all four protein molecules in the asymmetric unit. Subsequent to the single round of renement, omit maps were generated in which the remainder of the structure is used to compute the phasing and the density in the omitted region (the ligand). These maps are shown for the four ligands present in the asymmetric unit in Fig. 5 and conrm that the available density cannot provide a denitive conformation for this ring. Better statistics were obtained for the boat conformation (the all boat structure achieved R work of 0.1553 and R free of 0.2568 while the all chair structure achieved R work of 0.1575 and R free of 0.2574). Inappropriate restraints and the statistics used to analyze the outcome misled the original renement resulting in the wrong ligand conformation. Occasionally (see below) the correct conformation of a ring will not be so trivial and in these circumstances a quantum mechanical binding energy that accounts for the conformational preference of the ligand in the eld of the protein can provide a very useful companion to a structural renement.

Compound 2
The RSCC for ligand 2 of 0.83 suggests that it does not t the density map accurately and inspection reveals that the density around the indane moiety is missing. The rst step to model this compound was to determine the preferred position of the oxygen atom in the dihydropyrone ring, because this can easily be misplaced when interpreting electron density maps. 12 Changing the position of the oxygen atom inverts the conguration of the stereocentre. This compound was obtained as a single, unknown enantiomer and its absolute stereochemistry was assigned as R by Fauber et al. on the basis of the protein-ligand crystal structure. 68 The energy difference between the theoceptor optimized with the oxygen in each of the two possible positions was less than 0.2 kcal mol À1 , indicating that the ring oxygen is not interacting with the theoceptor and the proposed positioning is reasonable. 71 The missing electron density around the indane moiety would be consistent with the phenyl substituent adopting pseudo-axial or pseudo-equatorial orientations, relative to the dihydropyrone ring.
In the free ligand, the difference between these two conformations is less than 0.1 kcal mol À1 but in the complex with LDHA, the conformation with the phenyl in the pseudo-axial conformation is 8 kcal mol À1 lower in energy than in the pseudo-equatorial conformation (Fig. 4). This energy change originates from additional edge to face stacking between the ligand in its pseudo-axial conformation and tyrosine (Fig. 4, with the R stereochemistry retained for ease of comparison). We therefore propose a bound structure in which the position of the ring oxygen is maintained but the phenyl group is in the pseudo-axial position. This provides a better interpretation of the experimental data and suggests that the absolute stereochemistry is likely to have been misassigned based on the deposited crystal structure and is actually S.
To investigate this possibility, the deposited structure was re-rened. In this case, the ligand is only present in one of the proteins in the asymmetric unit and geometries of the two stereoisomers and two conformations were created in GaussView and were overlaid onto the ligand in the deposited structure using the pair_t command in pymol. 89,90 These initial positions were used to initiate a rst round of renement in PHENIX. The resulting ligand geometries were extracted and used to create a set of restraints in the eLBOW tool. 91 These were then used during a second round of renement in PHENIX that was analyzed with omit maps. The values of R work and R free obtained during this phase are provided in Fig. 5. The renement statistics for the axial conformations are inferior to those for the equatorial. However, the maps in Fig. 5 reveal that the aromatic ring is not strongly supported in either position by the available experimental evidence. In this case, even though the structure has a reasonable overall resolution of 1.91Å, expanding the evidence base for the structure to include theoceptor energies would permit a better-informed determination of the conformation and therefore the identication of the stereochemistry would be placed on a stronger footing. Ensuring that chemical insight (using knowledge or quantum mechanically derived energies) is involved in dening the conformational preferences of ligands when in protein binding sites should ensure that correct deductions are more likely to be made from these structures.

Compound 11
As with the oxygen atom in 2, the pyridine nitrogen in 11 is impossible to place with certainty at 2.2Å resolution. 12 In this case, the theoceptor nds an energy difference of 4 kcal mol À1 favouring the ligand with nitrogen oriented as in the deposited structure. This preference is due to an interaction of the pyridine nitrogen with a positively charged environment formed by the sidechains of His192 and Asn137. The theoceptor provides extra support for the deposited structure.

Compound 3
The phenyl substituent in ligand 3 is in an axial position relative to the 3-hydroxycyclohexenone ring in the deposited structure. Somewhat surprisingly, axial and equatorial ligand conformations were computed to be energetically indistinguishable in the unbound state (energy difference ¼ 0.6 kcal mol À1 ). In 3 there are no axial hydrogens located on the same face of the ring as the phenyl group and consequently, no clashes. However, there is a clear preference in the bound state: the theoceptor with the ligand in an equatorial conformation is 4 kcal mol À1 higher in energy. This difference originates from protein-ligand edge to face stacking present when the ligand is in an axial conformation (Fig. 5). In this case the theoceptor conrms the surprising conformation that had been proposed for 3.

Compounds 8-10
Initial positioning of the malonate derivatives was carried out by superimposing the malonate moiety on the 3-hydroxy cyclohexenone ring of ligand 3, to place it in close proximity to the basic side chain of Arg168. The malonate moiety of compound 9 was rst modeled in its dianionic form; the resulting complexation energy (DE) of +44 kcal mol À1 is not in the range found for other ligands described here. This arises due to the loss of solvation energy that is not compensated by sufficient interactions in the receptor. These ndings suggest that the malonate moiety may bind in a mono-deprotonated form. Three different malonate positions were investigated for monoanionic 9 in the theoceptor. The lowest energy theoceptor was the one where the malonate moiety made four hydrogen bonds with the nearby sidechains (À18.9 kcal mol À1 ). The energy difference of 15 kcal mol À1 between the highest and lowest energy theoceptor emphasizes the effect subtle conformational changes of the interacting groups can have on the overall binding energy, particularly when strongly interacting groups such as anions are involved.

Malonate and oxamate
When considering the binding of malonate itself, a total of 8 different malonate tautomeric monoanions were optimized in solution and all were found to be energetically indistinguishable (within a range spanning 0.5 kcal mol À1 ). The correct orientation of malonate in the binding cavity could also not be condently determined: optimization of 8 different theoceptors, each with a different malonate starting position resulted in complexes with DE values ranging from À8 to +16 kcal mol À1 . These ndings, in conjunction with experimental ambiguity concerning the state of the protein during the experimental measurements, suggested that malonate may bind in the presence of and proximal to cofactor NAD+/NADH. 74 Calculations employed the oxidized and reduced form of nicotinamide with the ribose present but with the phosphate moiety truncated to a methyl group. 92 The DE with co-factor in its reduced form was 5.4 kcal mol À1 and in its oxidized form almost À40 kcal mol À1 , which is consistent with the experimental affinities. Finally, the binding of oxamate was investigated. Again, binding is facilitated by the co-factor: the energy difference between the theoceptor with and without co-factor bound was more than 50 kcal mol À1 (Fig. 4). These ndings are very interesting from the drug designer's point of view: small negatively charged molecules that mimic substrate and bind in the presence of co-factor (rather than competing with it) have the tendency to be more efficient binders. As these studies were underway, a publication disclosed a new inhibitor, 12, in which a nitrogen has been introduced into the central ring of compounds like 1, 3, 4 and 5. 64 This provided an opportunity to test the predictive capability of the theoceptor. The calculations for compound 12 yielded a DE of À19.5 kcal mol À1 and log P of 3.84. These gave a predicted pIC 50 of 5.41. The experimental work had measured the pIC 50 to be 5.40, which is in excellent agreement with the prediction. Furthermore, the calculations reveal that when the ligand is outside the protein, the pseudoaxial and pseudo-equatorial conformations are within 0.4 kcal mol À1 but in the binding site, the axial is preferred by 5 kcal mol À1 . As with compounds 2 and 3, this is due to edge to face interactions with tyrosine (Fig. 4). The favored structure sees the introduced NH group on the opposite side of the ligand to the NH in the sidechain of Asn137; the alternative position is computed to be 1 kcal mol À1 higher in energy. The preferred stereoisomer is therefore S-12 (the original publication does not explore the stereochemical preference for this molecule).

Conclusions
The combination of computed quantum mechanical binding energies with measured or predicted ligand log P values can provide usefully accurate predictions of host-guest and proteinligand binding energies. By assigning relative energies to proteinligand structures in which different conformations, tautomers, protonation states and stereoisomers are sampled, useful insights about the interactions between the protein and ligand are generated. This can include regions that are not observed in the experimental electronic density. Computed structures reveal the limitations of the statistics currently used during the renement of X-ray crystal structures. Our approach is sufficiently accurate to make useful predictions about the affinity for new compounds. Chemists must become proactive creators of protein-ligand crystal structures rather than passive consumers; quantum mechanics provides a useful tool to do this in an objective fashion.

Conflicts of interest
AGL is a former employee and shareholder of AstraZeneca and is a shareholder of Medchemica Limited.