On the Polarization of Ligands by Proteins

Although ligand-binding sites in many proteins contain a high number density of charged side chains that can polarize small organic molecules and influence binding, the magnitude of this effect has not been studied in many systems. Here, we use a quantum mechanics/molecular mechanics (QM/MM) approach in which the ligand is the QM region to compute the ligand polarization energy of 286 protein-ligand complexes from the PDBBind Core Set (release 2016). We observe that the ligand polarization energy is linearly correlated with the magnitude of the electric field acting on the ligand, the magnitude of the induced dipole moment, and the classical polarization energy. The influence of protein and cation charges on the ligand polarization diminishes with the distance and is below 2 kcal/mol at 9 $\unicode{x212B}$ and 1 kcal/mol at 12 $\unicode{x212B}$. Considering both polarization and solvation appears essential to computing negative binding energies in some crystallographic complexes. Solvation, but not polarization, is essential for achieving moderate correlation with experimental binding free energies.


Introduction
Noncovalent binding to proteins is a key mechanism by which small organic molecules (ligands) interact with biological systems. Most drugs are noncovalent inhibitors of particular targets. Signaling molecules generally bind to specific receptors. Molecules with low solubility often bind to serum albumin. Even in enzymes, noncovalent binding of substrates is a prerequisite to catalysis.
Many proteins generate a strong electrostatic potential that can influence ligand binding. To promote stable folding, globular proteins typically consist of a hydrophobic core and hydrophilic surface. Many amino acids in the latter region are charged. Indeed, in an analysis of 573 enzyme structures, Jimenez-Morales et al. 1 observed a high number density of oft-charged acidic (aspartic and glutamic acid) and basic (lysine, arginine, and histidine) amino acids in catalytic sites (18.9 ± 0.58 mol L −1 ) and other surface pockets, including ligand-binding sites (28.2 ± 0.34 mol L −1 ). For context, the number density of charges is 2.82 ± 0.03 mol L −1 in entire proteins 1 and 74.3 mol L −1 in a sodium chloride salt crystal. 2 Charged amino acid side chains generate patterns in the surrounding electrostatic potential that can have functional roles that include mediating associations with other proteins with complementary electrostatics and channeling charged enzyme substrates. 3 Within a protein, electrostatic forces can alter redox potentials, shift the pK a s of amino acid residues, 3 accelerate enzyme catalysis, 4,5 and polarize ligands. 6 The importance of ligand polarization in protein-ligand binding has been demonstrated by studies that compare results from similar models with and without polarization. Although the vast majority of current studies modeling biological macromolecules are based on fixedcharge molecular mechanics force fields, polarizable models are being actively developed. 7,8 Jiao et al. 9 demonstrated that incorporating polarization into a molecular mechanics force field was essential to accurately computing the binding free energy between trypsin and the charged ligands benzamidine and diazamidine. Quantum mechanics (QM) and mixed quantum mechanics/molecular mechanics (QM/MM) methods have also been increasingly employed in predicting the binding pose -the configuration and orientation of a ligand in a complex -and binding affinity. 10,11 Semiempirical QM methods have shown particular promise in correctly distinguishing the native (near-crystallographic) binding pose from decoy poses (non-native poses that have low docking scores) in diverse sets of protein-ligand complexes. [12][13][14][15] QM/MM methods usually couple the QM and MM regions via electrostatic embedding, in which charges from the MM region alter the Hamiltonian in the QM region. Electrostatic embedding allows the QM region (which in most protein-ligand binding studies includes the ligand and sometimes surrounding residues) to polarize in response to charges in the environment. Cho et al. 16 demonstrated the importance of embedding by evaluating the ability of multiple docking schemes to recapitulate ligand binding poses in 40 diverse complexes. They found that assigning ligand charges using a QM/MM method with electrostatic embedding was generally more successful than a gas-phase QM method without embedding. Subsequently, Kim and Cho 17 performed a more systematic assessment focusing on 40 G protein-coupled receptor crystal structures. The QM/MM method outperformed (1.115 Å average RMSD and RMSD<2 Å in 36/40 complexes) a gas-phase QM method without embedding (1.672 Å average RMSD and RMSD<2 Å in 31/40 complexes) and a fixed-charge molecular mechanics method (1.735 Å average RMSD and RMSD<2 Å in 32/40 complexes). Beyond the context of protein-ligand binding, the inclusion of the polarization energy has been shown to dramatically affect water density 18 and the structure and dynamics of solvated ions in water clusters. [19][20][21][22] Ligand polarization effects have also been isolated using a decomposition scheme pioneered by Gao and Xia 23 , which was originally applied to the polarization of solutes by aqueous solvents. In this scheme, the polarization energy of molecule I, Ξ pol I (Eq. 6), is the sum of the energy from distorting the wave function, Ξ dist I (Eq. 8), and the energy from stabilizing Coulomb interactions relative to the gas phase, Ξ stab I (Eq. 9). For three highaffinity inhibitors of human immunodeficiency virus type 1 (HIV-1) protease, Hensen et al. 6 found that the magnitude of the ligand polarization energy can be as large as one-third of the electrostatic interaction energy. Fong et al. 24 considered 6 ligands of HIV-1 protease in near-native poses and found that depending on the level of theory, the polarization energy is from 16% to 21% of the electrostatic interaction energy.
Although comparative studies and energy decomposition schemes have strongly indicated the importance of ligand polarization, the magnitude of this term and the factors contributing to the ligand polarization energy have not, to our knowledge, been investigated for many diverse systems. Here, we address this knowledge gap by calculating the ligand polarization energy for 286 protein-ligand complexes from the PDBBind Core Set (release 2016). 25 The PDBBind is a comprehensive database of complexes for which both Protein Data Bank crystal structures and binding affinity data are available. The Core set is a subset of the PDBBind with high-quality and non-redundant structures meant as a benchmark for molecular docking methods. The size and diversity of this dataset allow us to draw more general and statistically meaningful conclusions about ligand polarization than previous efforts.

Energies
We employed a QM/MM scheme in which the ligand is the QM region and other atoms are the MM region. To enable energy decomposition, the Schrödinger equation for the ligand was solved both in the gas phase and with electrostatic embedding.
In the gas phase, the Hamiltonian operatorĤ I of a molecule I is, where i and j are indices over all electrons and A and B are indices over all atoms in molecule I.p i is the momentum operator and m e is the mass of an electron. r r r i is the position of electron i, R R R A is the position of atom A, and Z A is the atomic number of atom A. r ij is the distance between electrons i and j, and R AB is the distance between atoms A and B. The ground-state energy E I of the molecule I is where Ψ I is the electronic wave function of the molecule I.
When the molecule I is placed in an embedding field for Coulomb interactions between the molecule I (QM) and the field Q I (MM) is, where F is an index over charges in the embedding field. The first summand describes electron-charge interactions and the second proton-charge interactions. The ground-state energy E I:Q I of the embedded molecule I : Q I is obtained by where Ψ I:Q I is the ground-state electronic wave function of the embedded molecule I : Q I .
The embedding field should affect the ground-state wave function of the molecule such that We will use the symbol Ξ to denote a difference between two expectation values. The the difference in the expectation ofĤ I:Q I between the gas phase and in the embedding field, and the Coulomb interaction energy between a molecule and the embedding field, such that Ξ elec In a system of N molecules, the total electronic interaction energy and its decomposition into polarization and permanent Coulomb energies are, In the Coulomb and polarization energies, E Coul , the factor of 1/2 is introduced to compensate for doubly counting the interaction energy. Like Ξ pol , Ξ dist and Ξ stab are similarly defined as sums over all molecules. In our present scheme, only one molecule, the ligand, is treated quantum mechanically. Thus Ξ elec = Ξ elec I , Ξ pol = Ξ pol I , Ξ dist = Ξ dist I , Ξ stab = Ξ stab I , and E Coul = E Coul I:Q I , where I is the ligand molecule.
Both Ψ I and Ψ I:Q I were calculated using the restricted Hartree-Fock method 26 in conjunction with the 6-311G** basis set. 27 The atomic charge of atoms A with and without the embedding field Q I = {q F }, q QM:Q I A and q QM A , respectively, were obtained by fitting to the quantum mechanical electrostatic potential (ESP) using the restrained electrostatic potential (RESP) method. 28 Fitted point charges were used to evaluate the stabilization energy, For most reported calculations, the embedding field Q I = {q F } consisted of all of the non-ligand atoms in the system. In order to evaluate the distance at which embedding field atoms affect the polarization energy, we also performed calculations in which the embedding field consists of all atoms within a cutoff parameter R cut of any ligand atom. The cutoff parameter was varied from R cut ∈ {4, 5, ..., 10, 12, .., 20}. Even when different R cut were used for determining Ψ I:Q I and q QM:Q I A , energies were evaluated using an embedding field based on all atoms in the model.
In addition to the electrostatic interaction energy, coupling between the QM and MM region also includes a van der Waals interaction energy modeled by the Lennard-Jones potential, where σ AF and AF are the Lennard-Jones parameters. Combined with E Coul , E vdW makes up the intermolecular pairwise interaction energy, For a system in solution, opposed to the gas phase, we also consider the solvation free energy. We will use W (X) to denote a solvation free energy, where X ∈ {P L, P, L} represent the complex, protein, and ligand, respectively. The solvation free energy is an integral over all the solvent degrees of freedom. In principle, there are many ways to compute this quantity.
In this paper, we used the Onufriev Bashford Case 2 (OBC2) 29 generalized Born/surface area implicit solvent model.
The total binding energy is given by (Figure 1), In the W (P L) calculation, q QM:Q I To isolate the effects of polarization, we also define a total binding energy that does not consider ligand polarization, W bind,np differs from W bind because q QM A are used for ligand partial charges in the W (P L) calculation. Ψ bind,np is the binding energy for a purely MM model.

Other Properties
We computed a number of other properties to assess whether they have a clear relationship with the polarization energy.
Motivated by the observation of a high density of acid and base side chains in enzymes, 1 we computed two quantities: the percentage of atoms in a protein that are highly charged; and the number density of highly charged atoms within 6 Å of any ligand atom. The percentage of atoms in the protein that are highly charged is defined as, where i is an index over atoms in the protein and N is the total number of atoms in the protein. This expression uses the Heaviside step function, where x is a real number. The volume of the binding site was determined by Monte Carlo integration. To perform this integration, a box was defined that includes 6 Å around the range of the ligand atoms in each dimension. Points within the box were randomly sampled from a uniform distribution and assessed for the distance to the nearest ligand atom. The site volume was estimated by the product of the box volume and the fraction of points in the box within 6 Å of a ligand atom.
We also computed a number of properties inspired by classical electrostatics. In classical electrostatics, the internal energy of a dipole moment in an electric field is the dot product of the dipole with the field. We considered two classical models: one in which the entire ligand is treated as a dipole and a second in which each atom is treated as a dipole.
If the ligand is considered as a dipole, the change in internal energy due to an induced dipole is, where µ µ µ ind L is the induced dipole moment of the ligand L and E 0 L is the electric field acting on the ligand L due to the embedding field Q L = {q F } consisting of atomic charges of the surrounding atoms. The electric field acting on the center of mass (or protons) R R R C of the ligand is, where F runs over the atomic sites in the embedding field.
The induced dipole moment of the ligand µ µ µ ind L was calculated in two ways. The first was from the expectation value of the dipole moments, whereμ is the dipole moment operator. The second was based on the molecular polarizability tensor, α α α L , and the electric field on the center of mass of the ligand, Elements of the molecular polarizability tensor (α α α L ) xy describe the susceptibility of a molecule to polarization along the x axis due to an electric field along the y axis. As in Willow et al. 31 , these tensor elements were calculated based on placing a pair of point charges of ∓1 a.u.
at R cm ± 100 Bohr along a Cartesian axis, where R cm represents the center of mass of the ligand, to create an electric field. Then (α α α L ) xy were evaluated as the ratio of the induced dipole moment due the point charges, µ µ µ ind,pc L , and the electric field applied by the point charge onto the ligand, E 0,pc L , The dipole moment from the electron density is more accurate and valuable for assessing the correspondence between Ξ pol and Ξ pol,c . However, it is not a practical shortcut to the polarization energy because it requires the same quantum chemistry calculations used to compute Ξ pol . On the other hand, although the molecular polarizability tensor, α α α L , requires three quantum chemistry calculations, it can be reused (as an approximation) for multiple ligand configurations. Hence, the dipole moment from the molecular polarizability tensor, µ µ µ ind,α L L , could potentially reduce the computational costs of Ξ pol prediction. To facilitate comparison with the polarization energy, we also computed the molecular polarizability scalar of the ligand, α L , defined as, where Tr is the trace of a square matrix.
If each atom on the ligand is considered as a dipole, then the change in internal energy due to an induced dipole is, The electric field acting on an atom is, where A runs over all atomic sites in the ligand. The induced dipole on each atom was computed based on RESP charges as, In all, we considered the relationship between Ξ pol and a number of other properties: the 1. percentage of highly charged atoms in a protein (Eq. 20); 2. molecular polarizability scalar, α L (Eq. 27); 3. Coulomb interaction energy, E Coul (Eq. 7); 4. magnitude of the electric field on the ligand center of mass, |E 0 L |, where E 0 L is from Eq.

23;
5. magnitude of total electric field on the ligand atom sites, 10. and classical polarization energy of atomic dipoles, Ξ pol,cA (Eq. 28). OpenMM 7.3.1 34 was also used to evaluate van der Waals and solvation energies, the latter with the OBC2 29 generalized Born/surface area implicit solvent model.

The distribution of polarization energy is broad and skewed
Signs of the calculated polarization energy Ξ pol , the distortion energy Ξ dist , and the stabilization energy Ξ stab are mostly as expected (Fig. 2). In nearly all of the calculations, Ξ pol < 0, Ξ dist > 0, and Ξ stab < 0. The embedding field reshapes the wave function to have stronger Coulomb interactions between the electronic probability density and point charges, such that Ξ stab < 0. Because the gas-phase wave function of the ligand has the optimal intramolecular potential, perturbing the wave function leads to a higher intramolecular potential energy such that Ξ dist > 0. In the vast majority of systems, the calculated distortion is more than compensated for by the calculated stabilization such that the calculated net effect on the interaction energy due to polarization, Ξ pol , is negative.
Exceptions to the trend of negative calculated ligand polarization energies are due to structural modeling issues that lead to short intermolecular distances. Positive clashes could be resolved by changing the models in minor ways that are equally compatible with crystallographic evidence and pKa predictions. In the 2fxs and 4f2w models, the proton on a carboxylic acid was arbitrarily placed near a ligand hydrogen instead of on the other carbonyl oxygen. In the 3u5j model, the clash could be resolved by switching the position of the terminal oxygen and amine groups, which have nearly identical electron density, on asparagine 140.
The distribution of Ξ pol , Ξ dist , and Ξ stab is broad and skewed. There is a peak in the distribution of Ξ pol around -10 kcal/mol. However, for a small number of complexes, Ξ pol is much lower, with a minimum value of -128 kcal/mol.

Systems with the lowest Ξ pol have close cations
We hypothesized that the lowest Ξ pol could be due to crystallographic cations. To test this hypothesis, we subdivided the PDBBind Core Set into two subsets: 90 complexes with cations (Na + , Mg 2+ , Ca 2+ , and Zn 2+ ) and 196 complexes without cations in the crystal structure.
Histograms of Ξ pol for the two subsets are consistent with our hypothesis (Fig. S1 in the Supporting Information). All systems in which Ξ pol < -50 kcal/mol are in the subset with cations. In contrast, the minimum Ξ pol in the subset without cations is around -40 kcal/mol.
The range of Ξ dist and Ξ stab is also much smaller in the subset without cations.
Crystallographic cations may have an outsize role in ligand polarization because the magnitude of their charge is larger than the charge of most protein atoms. In the AMBER ff14SB force field, 33 protein partial charges were determined by applying RESP 28  Beyond the presence of cations, the distance between ligand and cation atoms also plays an important role in ligand polarization (Fig. 3). Even if cations are present in a crystal structure, they are not necessarily close enough to the ligand to significantly polarize its wave function. In many systems, cations are over 10 Å from any ligand atom. In all of the complexes in which Ξ pol < -50 kcal/mol, a cation is within 4 Å of a ligand atom. The purely attractive forces draw an unrealistic amount of electron density between the ligand and cation, leading to a very negative polarization energy. For an estimate of the extent of overpolarization in several systems, see Table S1 in the Supporting Information.
As an illustrative example, there is a significant gain in the electron density between the ligand and cation in the complex 3dx1 (Fig. 4). Hence, we will proceed with extra caution in interpreting points where Ξ pol < -50 kcal/mol.
(a) (b) Figure 4: (a) The molecular structure of the ligand with one Zinc cation Zn 2+ in the complex 3dx1. Hydrogen, carbon, nitrogen, oxygen, and zinc atoms are colored with white, gray, blue, red, and green, respectively. (b) The difference in the electronic probability density is plotted. Blue and red contours illustrate the gain and loss of the electronic probability density due to the embedding field.

The importance of the embedding field size diminishes with distance
The size of the embedding field strongly affects estimates of the polarization energy (Fig.   5). Changes in the cutoff distance R cut alter the partial charges included in the embedding field, the wave function Ψ I:Q I , and then the RESP charges. Regardless of R cut , nearly every 3.4 Of computed properties, Ξ pol is most correlated with the electric field, the induced dipole moment, and the classical polarization energy We observed that a number of properties -the percentage of atoms in a protein that are highly charged, the number density of highly charged atoms, and the Coulomb interaction energy -have little or only weak correlation with the ligand polarization energy ( Figure S3 in the Supporting Information).
In contrast with the aforementioned properties, there is a much clearer relationship between the ligand polarization energy, Ξ pol , and several other properties: the magnitude of the (kcal/mol/Å) Figure 5: Dependence of the Coulomb interaction E Coul , the ligand polarization energy Ξ pol , and the distortion energy Ξ dist on the cutoff distance R cut . Here, the deviation and the gradient are defined as ∆F (R cut ) = F (R cut ) − F (∞) and G = dF (R cut )/dR cut , respectively, where F is either E or Ξ. In these violin plots, the width of the shaded area is proportional to the frequency of observations. Large blue points are placed at mean values. In the plot of ∆Ξ pol as a function of R cut , the green line is a function that was fitted to the mean values, 80.778R −2 cut + 0.177. electric field; the magnitude of the induced dipole moment of the ligand; and the classical polarization energy (Fig. 6). The linear correlation is strong with the magnitude of the electric field on the ligand center of mass, |E 0 L |, and even stronger with the magnitude of the total electric field vector active on all ligand atoms, A∈L E 0 A (Fig. 6  , and Ξ pol,cL are from Eq. 23, Eq. 24, and Eq. 22, respectively. The range of Ξ pol is either Ξ pol < 0 kcal/mol (left) or -50 kcal/mol < Ξ pol < 0 kcal/mol (right). calculation, a relatively inexpensive polarization energy estimate based on linear regression can be added to binding energy estimates. Such an approach could recapitulate some of the success of semi-empirical QM in reconstructing binding poses. [12][13][14][15] 3.5 Polarization is a substantial and variable fraction of interaction and binding energies We observe that the ligand polarization energy Ξ pol can be a substantial and highly systemdependent fraction of the interaction energy and binding energy (Fig. 7). In most systems where -50 kcal/mol < Ξ pol < 0 kcal/mol, the ratio Ξ pol /Ξ elec ranges from 0 to 0.4 (Fig. 7a).
Exceptions occur when Ξ elec is positive, leading to a negative ratio, or when it is small, leading to a ratio much larger than 1 ( Table S2 in  The histogram of Ξ pol /(E pair + Ξ pol ) is compressed compared to Ξ pol /Ξ elec , with the range with the largest density reduced to between 0 and 0.2 (Fig. 7b). Smaller ratios are due to the addition of van der Waals interactions that increase values in the denominator. The histograms of Ξ pol /(Ψ bind,np + Ξ pol ) and Ξ pol /Ψ bind is notable for a clear peak around 0.2 ( Fig. 7c & d). If all systems in the PDBBind are considered, qualitative trends are similar but there is increased density at higher ratios (Fig. S11 in the Supporting Information).
When considering the polarization energies of three HIV-protease inhibitors, Hensen et al. 6 found that Ξ pol can approach one-third of the electrostatic interaction energy. In our much larger data set, we found that Ξ pol can be a larger fraction of Ξ elec .
With the caveat that polarization could be overestimated in these cases, two examples where Ξ pol /(Ψ bind,np + Ξ pol ) is particularly large, 3dx1 and 3dx2, highlight the potentially outsized importance of Ξ pol for small ligands (Table S2 in the Supporting Information).
The limited number of contacts leads to a weaker Ψ bind,np , such that Ξ pol can play a larger role. Figure 8: The molecular structure of the ligand with two Magnesium cations Mg 2+ in the complex, 2zcq. Hydrogen, carbon, oxygen, magnesium, phosphorus, and sulfur atoms are colored with white, gray, red, pink, orange, and yellow, respectively.
The relative importance of ligand polarization in small ligands may explain the poor performance of binding free energy methods based on a fixed-charge force field in distinguishing molecules that are active and inactive against T4 lysozyme L99A. 38 In this protein, the L99A mutation forms a pocket known to bind a number of small hydrophobic compounds. Xie et al. 38 performed binding free energy calculations for a library of 141 small hydrophobic compounds whose thermal activity against T4 lysozyme L99A had been measured. Many of the compounds contained highly polarizable aromatic groups. The best-performing method in Xie et al. 38 had an area under the receiver operating characteristic curve of 0.74 (0.04) out of 1 for a perfect binary classifier. Binary classification performance could potentially be improved by incorporating the ligand polarization, as described in the current paper.

Solvation and polarization can be key drivers of native complex formation
For a number of native complexes, both polarization and solvation were required to compute negative binding energies (Fig. 9). Due to the harmonic restraint maintained during minimization, our models closely resemble their native crystal structures. In order for these protein-ligand complexes to adopt these structures, they should have a negative binding energy (presuming that binding results in entropy loss). If calculated binding energies for experimentally observed structures are positive, it suggests a structural modeling issue (e.g. protonation state) or that critical phenomena are not properly described. Intriguingly, the Coulomb interaction energy is positive in a significant fraction of these systems (Fig. 9a).
Incorporating van der Waals interactions in E pair slightly reduces the number of systems in which the interaction energy is positive (Fig. 9c). However, these pairwise terms, which are standard to molecular docking, are insufficient to accurately describe all the native complexes with a negative interaction energy. Incorporating a ligand polarization term (Fig. 9b & 9d) or solvation energy term ( Fig. 9e & g) alone is also insufficient. However, when both polarization and solvation are considered, all the native complexes have a negative binding energy ( Fig. 9f & h). Considering both polarization and solvation terms also appears to attenuate the broad range of binding energies observed in E pair , E pair + Ξ pol , and Ψ bind,np

Solvation but not polarization improves correlation with experimental binding free energies
An important goal in protein-ligand modeling is the accurate calculation of binding free energies -which quantify the strength of noncovalent association -that are consistent with experimentally observed values.
For several reasons, the computed binding energy ∆G bind is not expected to completely agree with the experimentally measured binding free energy ∆G bind for complexes in the PDBBind Core Set. These reasons include that: • The binding free energy ∆G bind is not rigorously equivalent to Ψ bind , but is actually an exponential average over the ensemble of the complex. 39,40 Using Ψ bind to model ∆G bind is an approximation that neglects entropy.
• The binding energy model is not exact. For example, the present model does not explicitly treat polarization of the free ligand by solvent, polarization of the protein by the ligand, and the solvation model does not include explicit water.
• The PDBBind is a heterogeneous data set in which experimental ∆G bind were determined by various modalities and under different experimental conditions. There may be systematic differences between measured ∆G bind that are not considered in our models.
• On a related note, experimental conditions used to obtain crystal structures and binding affinity data are different. Crystal structures have packing forces and are generally at a lower temperature.
Nonetheless, a comparison between computed interaction energies and experimental binding free energies can be informative.
While the treatment of solvation is essential, ligand polarization energies have a minimal effect on the correlation between Ψ bind and experimental ∆G bind (Fig. 10 and Fig. S14 in the Supporting Information). If solvation energies are not considered, the distribution of intermolecular pairwise potential energies E pair of the protein-ligand complexes is distributed extremely broadly from -1000 kcal/mol to 250 kcal/mol and the correlation between Ψ bind and experimental ∆G bind is negligible ( Fig. 10a&b  Adding the polarization energy to solvation energies computed without solvation has no effect on the solvation energy ( Fig. 10d and Fig. S14d in the Supporting Information).
In comparison, computing solvation energies using partial charges from the induced dipole diminishes correlation with experiment ( Fig. 10e&f and Fig. S14e&f in the Supporting Information).
In a critical assessment of a number of docking programs and scoring functions across eight different diverse proteins, Warren et al. 41 concluded that "no statistically significant relationship existed between docking scores and ligand affinity." Our data suggest that the lack of correlation stems from a poor or nonexistent treatment of solvation in the scoring functions. Perhaps due to cancellation of error, neglect of ligand polarization does not appear to be a major factor in the poor performance of docking scores.

Conclusions
Using a QM/MM approach., 6,23,42,43 we computed polarization energies Ξ pol for 286 complexes in the PDBBind Core Set. 25 The distribution of Ξ pol , Ξ dist , and Ξ stab were found to be broad and skewed. For properly prepared systems without atoms in unrealistically close contact, these terms all have the expected sign of Ξ pol < 0, Ξ dist > 0, and Ξ stab < 0. The lowest Ξ pol were observed in systems where cations are close to ligand atoms. In these systems, the extent of polarization is likely to be overestimated. The importance of including embedding field charges on Ξ pol appears to diminish, on average, as an inverse square law. There is no clear relationship between Ξ pol and the percentage of highly charged atoms in a protein and molecular polarizability scalar. There is a weak correlation between Ξ pol and the Coulomb energy E Coul . On the other hand, there is a stronger linear correlation between Ξ pol and the magnitude of the electric field, the magnitude of the induced dipole moment, and the classical polarization energy. The ligand polarization energy Ξ pol is observed to a substantial and system-dependent fraction of the electronic interaction energy and the total interaction energy. In some systems, consideration of ligand polarization and solvation are both essential for calculating negative interaction energies for crystallographic complexes. While consideration of solvation is essential for achieving moderate correlation between interaction energies and experiment, we did not observe that the ligand polarization energy Ξ pol improves the correlation between the binding energy and experimental binding free energies.

Acknowledgement
We thank Pengyu Ren for the suggestion to compare polarization energies with molecular polarizability.
We thank OpenEye Scientific Software, Inc. for providing academic licenses to their software. This research was supported by the National Institutes of Health (R01GM127712).

Supporting Information Available
The following files are available free of charge.
• Fig. S1. Histograms of the ligand polarization, distortion, and stabilization energies in the PDBBind Core Set for systems with and without cations.
• Fig. S2. Normalized histogram of partial atomic charges for protein atoms in the data set.
• Fig. S3. The polarization energy Ξ pol as a function of the percentage of charged atoms, number density of highly charged atoms, and Coulomb energy, E Coul .
• Fig. S4. Polarizability of the ligand (α L ) versus the the number of ligand electrons (N elec ) in ligands from the protein-ligand complexes.
• Fig. S5. The ligand polarization energy, Ξ pol , as a function of the magnitude of the electric field.
• Fig. S7. The ligand polarization energy, Ξ pol , as a function of the magnitude of the induced dipole moment.
• Fig. S8 The ligand polarization energy, Ξ pol , as a function of the classical polarization energy.
• Fig. S9. The structure of the complex 3tsk of human matrix metalloprotease-12 (MMP12) in complex with L-glutamate motif inhibitor.
• Fig. S10. Schematic of protein-ligand complexes in which the ligand center of mass is inside the ligand or the protein.
• Fig. S11. Histograms of ratio of the polarization energy of the ligand to intermolecular potential energies and binding energies in all complexes where Ξ pol < 0 kcal/mol.
• Fig. S13. The structure of the complex 1u1b of bovine pancreatic Ribonuclease A with a ligand.
• Fig. S14. Comparison of interaction energies to experimentally measured binding free energies for all complexes where Ξ pol < 0 kcal/mol.
• Tab. S1. Dependence of the ligand polarization energy on the QM region.

Supporting Information: Correlations
The percentage of atoms in a protein that are highly charged does not appear to be a significant factor in the ligand polarization energy (Fig. S3). In all the systems, only a small percentage of atoms (less than 8%) have atomic charges of |q| ≥ 0.6.
The number density of charged atoms is more related with the ligand polarization energy.
However, we only observed weak correlation, with Pearson's R being 0.32 for all complexes and 0.33 for complexes for which Ξ pol > −50 kcal/mol (Fig. S3).
It would be reasonable to think that the polarization energy is related to the Coulomb interaction energy. However, the correlation is also weak, with Pearson's R being 0.29 for all complexes and 0.25 for complexes for which Ξ pol > −50 kcal/mol (Fig. S3). In contrast with the aforementioned properties, there is a much clearer relationship between the ligand polarization energy, Ξ pol , and the magnitude of the electric field (Fig. S5).
The linear correlation is strong with the magnitude of the electric field on the ligand center of mass, |E 0 L |, and even stronger with the magnitude of the total electric field vector active on all ligand atoms, A∈L E 0 A . Intriguingly, in both cases, there appear to be two distinct trends relating the electric field to the magnitude of the electric field; a linear correlation exists in systems where Ξ pol < -50 kcal/mol, but the slope is distinct from in systems where -50 kcal/mol < Ξ pol < 0 kcal/mol. The two measures of the electric field are also correlated with each other, with a Pearson's R of 0.54 (Fig. S6). In general, the magnitude of A∈L E 0 A is greater than the magnitude of Ξ pol , suggesting that electric field vectors on individual atoms generally point in a similar direction. is from Eq. 24 (bottom). The range of Ξ pol is either Ξ pol < 0 kcal/mol (left) or -50 kcal/mol < Ξ pol < 0 kcal/mol (right).
both the magnitude of the electric field and the induced dipole, there is also a clear correspondence between the ligand polarization energy Ξ pol and the classical polarization energy Ξ pol,cL (Fig. S8). The clear correlation between the two quantities suggests that the classical model of the ligand as a single dipole in an electric field is a reasonable explanation for the quantum behavior. In contrast, the correlation between Ξ pol and Ξ pol,cA is much weaker, which indicates that the classical model of the ligand as a set of atom-centered dipoles is a poor description of the quantum phenomenon. The correlation is stronger between Ξ pol and Ξ pol,cL than between Ξ pol and Ξ pol,cL,α L because the molecular polarizability model does not perfectly capture the induced dipole moment (Fig. S6). However, in nearly 8% of complexes, |E ind,α L Lµ µ µ L | is much larger than |µ µ µ ind,QM L |. In many of these cases, such as 3tsk (Fig. S9), the ligand center of mass is within the protein (Fig. S10).
Because the ligand center of mass is within the protein, it is very close to embedding field charges and the magnitude of the electric field is particularly strong.     Figure S14: Comparison of interaction energies to experimentally measured binding free energies for all complexes where Ξ pol < 0 kcal/mol. Interaction energies are the (a) the intermolecular pairwise potential energy (E pair = E vdW + E Coul ); (b) the intermolecular pairwise potential energy with the polarization energy of the ligand (E pair + Ξ pol ) in the gas phase; the binding energy (c) without considering ligand polarization at all, Ψ bind,np , and (d) considering ligand polarization for electrostatic interactions but not in the solvation free energy, Ψ bind,np + Ξ pol , (e) considering ligand polarization in the solvation free energy but not for electrostatic interactions, Ψ bind − Ξ pol , or (f) considering ligand polarization both in the electrostatic interactions and the solvation free energy.

Supporting Information: Estimated Overpolarization
When cations are close to ligands, the extent of ligand polarizationÂăis likely overestimated by the QM/MM scheme used in this paper. To assess the extent of overpolarization, we performed some calculations in which cations were included in the QM region. In this modified scheme, there is no direct way to isolate the ligand polarization energy from energy of the complex. Instead, we estimated the induced dipole of the ligand based on RESP atomic charges, The ligand polarization energy is then computed by, where E 0 L is the electric field acting on the center of mass of the ligand.
In the selected systems where cations are very close to ligand atoms, the ligand polarization energy estimated with the main QM/MM scheme in this paper is likely too low (Table   S1). When the only ligand is in the QM region, the estimated ligand polarization energy is fairly consistent; Ξ pol (L) ∼ Ξ pol,cL (L) ∼ Ξ pol,RESP (L). When the QM region is expanded to include cations, the ligand polarization energy based on RESP atomic charges is significantly higher. For 3dx1 and 3dx2, it is about 20 kcal/mol higher. For 2zcq, where Ξ pol is especially low, Ξ pol,RESP (LC) has the opposite sign! Table S1: Dependence of the ligand polarization energy on the QM region. Ξ pol (X), Ξ pol,cL (X), and E pol,RESP (X) are from Eqs. 6, 22, and S2, respectively, with the QM region of X. Here, either the ligand (L) or the ligand and cations (LC) are included in the QM region. The unit of the polarization energy is in kcal/mol.