Open Access Article
Octav
Caldararu
a,
Rohit
Kumar
b,
Esko
Oksanen
bc,
Derek T.
Logan
b and
Ulf
Ryde
*a
aDepartment of Theoretical Chemistry, Lund University, Chemical Centre, P. O. Box 124, SE-221 00 Lund, Sweden. E-mail: Ulf.Ryde@teokem.lu.se; Fax: +46 46 2228648; Tel: +46 46 2224502
bDepartment of Biophysical Chemistry, Centre for Molecular Protein Science, Lund University, Chemical Centre, P. O. Box 124, SE-221 00 Lund, Sweden
cEuropean Spallation Source Consortium, P. O. Box 176, SE-221 00 Lund, Sweden
First published on 1st August 2019
Conformational entropies are of great interest when studying the binding of small ligands to proteins or the interaction of proteins. Unfortunately, there are no experimental methods available to measure conformational entropies of all groups in a protein. Instead, they are normally estimated from molecular dynamics (MD) simulations, although such methods show problems with convergence and correlation of motions, and depend on the accuracy of the underlying potential-energy function. Crystallographic atomic displacement parameters (also known as B-factors) are available in all crystal structures and contain information about the atomic fluctuations, which can be converted to entropies. We have studied whether B-factors can be employed to extract conformational entropies for proteins by comparing such entropies to those measured by NMR relaxation experiments or obtained from MD simulations in solution or in the crystal. Unfortunately, our results show that B-factor entropies are unreliable, because they include the movement and rotation of the entire protein, they exclude correlation of the movements and they include contributions other than the fluctuations, e.g. static disorder, as well as errors in the model and the scattering factors. We have tried to reduce the first problem by employing translation–libration–screw refinement, the second by employing a description of the correlated movement from MD simulations, and the third by studying only the change in entropy when a pair of ligands binds to the same protein, thoroughly re-refining the structures in exactly the same way and using the same set of alternative conformations. However, the experimental B-factors seem to be incompatible with fluctuations from MD simulations and the precision is too poor to give any reliable entropies.
The standard method to measure entropy changes is isothermal titration calorimetry (ITC).6 However, it only estimates the total entropy. It would strongly facilitate understanding if various contributions to the entropy could be measured, e.g. from the solvent, from the translation and rotation of the protein, or from the movement of the various groups in the protein. In particular, the latter term, the conformational entropy, has been shown to play a major role in many biomolecular interactions, in particular in protein–ligand binding, accounting for up to half of the total entropy.7–11 Conformational entropies can be estimated by solution nuclear magnetic resonance (NMR) experiments using order parameters.12–14 Unfortunately, the order parameters are typically measured only for a subset of the protein atoms, e.g. the backbone and some side-chain atoms, so the entropy description from these experiments is incomplete. Moreover, NMR order parameters report only on isolated bond vectors, thus providing no information on correlated motions.15
Therefore, reported conformational entropies have so far relied on computational methods, to fill in the incomplete experimental data.16 Molecular dynamics (MD) simulations are the preferred method of studying protein dynamics computationally and MD-based free energy calculations, for example free energy perturbation, have proved successful in calculating protein–ligand binding free energies.17,18 There are several approaches to estimate conformational entropies from MD simulations, e.g. normal-mode analysis, quasi-harmonic analysis and dihedral histogramming.19–21 However, computational approaches also have significant problems. In particular, protein simulations are based on an empirical molecular mechanics force field, with a limited accuracy. Moreover, it has been shown that the conformational entropy from MD simulations increases logarithmically even after 1 ms of simulation time, making it problematic to extract accurate estimates of the entropy.22
Consequently, there is an urgent need for an experimental method to estimate accurate conformational entropies. A possible source could be protein crystal structures. The crystallographic atomic displacement parameters, more commonly known as B-factors, are reported for all protein X-ray crystal structures and they are directly related to the atomic fluctuations (due to motion or static disorder) in the crystal.23 Polyansky et al. have suggested a method for calculating conformational entropy directly from atomic B-factors, by making use of the standard quasi-harmonic approach.24 This would provide a fast and simple method to obtain conformational entropies, based on readily available experimental data. However, there are multiple issues that arise when dealing with B-factors. First, B-factors do not take into account any correlated motions in the protein, resulting in an overestimate of the absolute conformational entropy. Polyansky et al. accounted for this by using a linear scaling of the entropies without covariance terms, resulting in a corrected entropy that was five times lower than the calculated entropy.24 Another possible way of considering correlation is by using B-factors from a translation–liberation–screw (TLS) model, in which parts of the protein are represented as a rigid body.25,26 Second, data for most protein crystal structures are collected at 100 K. This might lead to an underestimation of conformational entropies, but using room-temperature data may avoid the issue. Third and most importantly, standard B-factors do not always accurately reflect the dynamics in a protein crystal structure.27B-factors also contain a measure of static disorder, arising from differences in equivalent atomic positions in different unit cells within the crystal. In addition, even small errors in the model could significantly change the value of the B-factors.28 A previous study has suggested that X-ray refinement may underestimate B-factors by a factor of up to six.29
In this paper, we study whether crystallographic B-factors can provide accurate conformational entropies compared to those obtained by NMR relaxation experiments or by MD-based methods. To reduce the influence of static disorder and non-fluctuational contributions to the B-factors, we studied only the change in entropy between pairs of structures and re-refined the crystal structures in exactly the same way. Moreover, we employed both cryogenic and room-temperature structures. Finally, we tested using both isotropic and anisotropic B-factors, as well as TLS models and ensemble refinement to obtain entropies. Unfortunately, our results show that crystallographic B-factors are not accurate enough to be used in the calculation of conformational entropy, even after correcting for the missing correlated motions. To understand this failure, we used MD simulations in both solution and in crystals, calculating entropies with four different methods.
The corresponding room temperature data sets were collected with large crystals that were soaked for 15–18 h with the same two ligands. The crystals were then mounted in appropriately-sized loops (0.5 mm) and sealed with a plastic tube using the MicroRT kit from MiTeGen. Data was collected at BioMAX at MAX IV. A defocused beam of size 50 × 50 μm2 was used to avoid radiation damage, but this data turned out to be without ligand for S. On the other hand, positive data were instead collected with a focused beam (20 × 5 μm2). The structure for R then underwent the same refinement procedure as the cryo-temperature structures, described by Verteramo et al.38 The data for S was of a slightly lower resolution, 1.6 Å, so only isotropic B-factor refinement was possible, but all the other parts of the refinement process were similar to that for R. The structures have been deposited in the PDB (IDs 6RHL and 6RHM respectively). Refinement statistics for the final room-temperature structures are given in Table S1 (ESI†). B-Factors for the two sets of structures are compared in Fig. 2.
Next, the two sets of structures were modified, so that both pairs of complexes have alternative conformations for exactly the same residues. This was accomplished by generating extra alternative conformations of sidechains manually in Coot39 after visual inspection. A list of all residues that have alternative conformations is given in Table S2 in the ESI.† These structures were then fully re-refined for seven cycles, refining occupancies for all residues in alternative conformations and anisotropic B-factors for all non-water and non-hydrogen atoms. The B-factors after this refinement were used for entropy calculation. All refinements were done with Phenix v1.11.40 An additional refinement was also run for each protein–ligand complex with TLS parametrization, defining the whole protein and the ligand atoms as one TLS group. Selected refinement statistics for the final structures used for entropy calculation are shown in Table 1.
| R | S | |||||||
|---|---|---|---|---|---|---|---|---|
| Original | Iso | Aniso | TLS | Original | Iso | Aniso | TLS | |
| 100 K | ||||||||
| R | 0.124 | 0.140 | 0.130 | 0.132 | 0.126 | 0.142 | 0.130 | 0.138 |
| R free | 0.158 | 0.176 | 0.163 | 0.166 | 0.156 | 0.177 | 0.162 | 0.171 |
| 〈B〉 | 12.1 | 13.2 | 12.7 | 12.6 (3.5) | 14.8 | 14.7 | 14.0 | 13.9 (3.4) |
| 300 K | ||||||||
| R | 0.137 | 0.160 | 0.139 | 0.153 | 0.179 | 0.171 | 0.171 | |
| R free | 0.165 | 0.185 | 0.169 | 0.173 | 0.190 | 0.190 | 0.191 | |
| 〈B〉 | 26.8 | 27.6 | 28.0 | 29.3 (24.1) | 26.7 | 29.5 | 29.9 (25.2) | |
All the galectin-3C simulations were started from the X-ray crystal structures of R and S-galectin-3C determined at 100 K. For the normal MD simulations, each galectin-3C complex was solvated in an octahedral box of water molecules extending at least 10 Å from the protein using the tleap module, so that 4965–5593 water molecules were included in the simulations. The simulations were set up in the same way as in our previous studies of galectin-3C.22,37,38,46 All Glu and Asp residues were assumed to be negatively charged and all Lys and Arg residues positively charged, whereas the other residues were assumed to be neutral. The His158 residue was protonated on the ND1 atom, whereas the other three His residues were protonated on the NE2 atom, in accordance with neutron crystal structures, NMR measurements and previous extensive test calculations with MD.47,48 This resulted in a net charge of +4 for the protein. No counter ions were used in the simulations.
The trypsin and lysozyme simulations were started from the crystal structures described above. All the crystal waters were kept and for residues with alternative conformations, the one with the higher occupancy was used (or the first conformation if both had the same occupancy). The proteins were solvated in the same way as for galectin-3C, giving 5580 and 8864 water molecules for trypsin and lysozyme, respectively. All Glu and Asp residues were assumed to be negatively charged and all Lys and Arg residues positively charged. For trypsin, the His residues were protonated as determined in a previous study:47 His40 and His57 were doubly protonated on both the ND1 and NE2 atoms, whereas His91 was protonated only on the NE2 atom. For lysozyme, the protonation of the single His residue was decided by analysing the hydrogen bond network around the residue: His31 was protonated on the ND1 atom.
The proteins were described by the Amber ff14SB force field49 and water molecules with the TIP4P-Ewald model.50 The ligands were treated with the general Amber force field with restrained electrostatic potential charges,51 which have been described before for the galectin-3C ligands,38 whereas those for the trypsin and lysozyme ligands are listed in Table S3 (ESI†). For each complex, the structures were minimised for 10
000 steps, followed by 20 ps constant-volume equilibration and 20 ps constant-pressure equilibration, all performed with heavy non-water atoms restrained towards the starting structure with a force constant of 209 kJ mol−1 Å−2. Finally, the system was equilibrated for 2 ns without any restraints and with constant pressure, followed by 10 ns of production simulation, during which coordinates were saved every 5 or 10 ps. For each protein–ligand complex, 10 independent simulations were run, employing different solvation boxes and starting velocities.52 Consequently, the total simulation time for each complex was 100 ns. Several previous studies have indicated that entropy estimates from MD simulations converge very slowly.53,54 In fact, it has been shown that they change even after 1 ms simulations.22 Therefore, it is unlikely that the results will improve if longer simulations are employed.
All bonds involving hydrogen atoms were constrained to the equilibrium value using the SHAKE algorithm,55 allowing for a time step of 2 fs. The temperature was kept constant at 300 K using Langevin dynamics,56 with a collision frequency of 2 ps−1. The pressure was kept constant at 1 atm using a weak-coupling isotropic algorithm57 with a relaxation time of 1 ps. Long-range electrostatics were handled by particle-mesh Ewald summation58 with a fourth-order B spline interpolation and a tolerance of 10−5. The cut-off radius for Lennard-Jones interactions between atoms of neighbouring boxes was set to 8 Å.
The two galectin-3C MD simulations in crystal unit cells were set up using the Amber XtalUtilities package, with the unit cell size extracted from the CRYST1 record in the PDB files. One unit cell contained four protein monomers, resulting in four and eight protein monomers simulated for the one and two unit cells simulations, respectively. All crystal water molecules were kept in the simulations. 7Na+ and 11Cl− counter ions were added to match the 0.4 M ionic strength used in the crystallographic experiments. Water molecules were added successively to the existing crystallographic water molecules until all empty space in the unit cell was filled. As this is difficult to evaluate visually, multiple starting structures with 350, 400, 450 and 500 added water molecules per unit cell were tested in the equilibration step. The simulation containing 500 water molecules kept the volume of the system closest to the unit cell volume and was used for the production runs. The same protocol as in the normal MD simulation was used, resulting in 100 ns (10 × 10) of simulation time for each galectin-3C–ligand complex.
In the first approach, the entropies were obtained by dihedral angle histogramming (DH).12,46,59,60 Conformational entropies were calculated from the ensemble of configurations of the protein and ligands by analysing the dihedral-angle fluctuations. Cartesian coordinates were first transformed to internal coordinates. The distribution of the dihedral angles was then approximated by a discrete histogram of 72 bins of 5° each (other number of bins have been tested previously46) and the resulting entropy was calculated from:
![]() | (1) |
In the second approach, entropies were obtained from quasi-harmonic analysis (QHA) of the covariance matrix.61,62 Thus, the fluctuations were assumed to follow a multivariate Gaussian distribution and quasi-harmonic frequencies were calculated as the eigenvalues of the mass-weighted variance–covariance matrix of the atomic fluctuations determined from an MD simulation. The entropy was then estimated from these frequencies using the harmonic-oscillator approximation:
![]() | (2) |
Third, entropies were obtained from B-factors using the method suggested by Polyansky et al.24 The B-factors are directly related to atomic root-mean squared fluctuations (RMSF) according to:
![]() | (3) |
In the case of anisotropic B-factors, RMSFs for each coordinate can be obtained from the diagonal elements of the symmetric displacement tensor, for example:
| U11 = RMSFx | (4) |
To estimate the consistency of our calculations, we also estimated isotropic and anisotropic B-factors from MD simulations using cpptraj and used these to estimate entropies from the MD simulations.
Fourth, entropies were obtained using both B-factors and information from MD simulations, i.e. by combining the QHA and BF methods. The BF approach assumes that all atoms move independently of each other. To try to correct the entropies and take into account some correlated motions, we have calculated quasi-harmonic frequencies from a variance–covariance matrix in which the covariance (off-diagonal) terms were taken from an MD simulation and the variance (diagonal) terms were calculated from crystallographic B-factors as described above (eqn (3) and (4)).
Fifth, alternative conformations also provide a measure of conformational entropy in a crystal structure. Per-residue conformational entropy arising from residues existing in several alternative conformations can be estimated from:
![]() | (5) |
Finally, entropies can also be calculated from a normal-mode analysis (NMA) using an ideal-gas harmonic-oscillator approximation and employing vibrational frequencies calculated at the MM level.63 NMA calculations were performed on the full protein, in vacuum. Before calculating the harmonic frequencies from NMA, the systems were first minimized, also in vacuum, to a gradient of less than 0.001 kcal mol−1 Å−1. The calculations were performed with the nmode module of Amber 14, running in double precision.44
In summary, we used five different methods to obtain the atomic fluctuations:
(1) Traditionally-refined crystal structures with three different types of B-factor refinement: isotropic, anisotropic and TLS refinement.
(2) Crystal structures refined (using the same raw data) with ensemble refinement (which involves dihedral-space MD simulations with time-averaged restraints to the crystallographic data).
(3) A normal MD simulation in water solution, using periodic boundary conditions.
(4) A MD simulation of a single unit cell of the protein (without any crystallographic data, except the size of the unit cell).
(5) A MD simulation of two unit cells of the protein (also without any crystallographic data).
For the first two approaches, we employed two sets of crystal structures, viz. obtained at cryogenic temperature (100 K) or at room temperature (300 K). All crystal structures were first re-refined to ensure that the two structures with different ligands were treated in exactly the same way.
Based on these sets of structures, we have calculated entropies using four different methods:
(A) From the distribution of dihedral angles, using histogramming (DH).
(B) From a quasi-harmonic analysis of the fluctuation covariance matrix (QHA).
(C) From B-factors, using the QHA approach suggested by Polyansky et al.24 (BF).
(D) From a covariance matrix constructed from crystallographic B-factors as the diagonal terms and MD simulation atomic fluctuations for the off-diagonal terms (BF + MD).
The B-factors were obtained either from a refined crystal structure or from the root-mean-squared fluctuations of a MD simulation. Throughout the article, all entropies reported are as TΔS at 300 K, in units of kJ mol−1. Moreover, for the sake of simplicity we will write “entropies”, although we consider only conformational entropies. Uncertainties of entropies calculated from the MD simulations are standard errors over the 10 independent simulations (50 for DH).
38 and the difference in the conformational entropy of the protein has been estimated by NMR to 16 ± 14 kJ mol−1 (12 ± 8 kJ mol−1 for the backbone and methyl groups actually measured).38 Entropies estimated from MD simulations also give similar results, 10 ± 5 kJ mol−1
38 or 7–12 ± 6 kJ mol−1 in this study, as is discussed below. Moreover, there is no correlation between the per-residue entropy estimated from the B-factors and that obtained from NMR for the backbone N atoms (R = 0.01). One reason for this is that the B-factor of each atom includes the translational and rotational movement of the entire protein.
| DH | QHA | BF-iso | BF-aniso | TLS | BF-100 K + MD | BF-300 K + MD | ||
|---|---|---|---|---|---|---|---|---|
| a As S-galectin-3C could not be refined with anisotropic B-factors at room temperature, only the diagonal elements of the MD covariance matrix were replaced in the entropy calculation for S at room temperature. | ||||||||
| Crystallographic refinement | 100 K | 469 | 448 | −29 | ||||
| 300 K | 55 | 67 | ||||||
| Standard MD | 10 ± 5 | −3 ± 26 | 2 ± 227 | 33 ± 87 | 892 ± 156 | −271 ± 56a | ||
| Crystal MD | 1 unit cell | 12 ± 6 | −12 ± 8 | 161 ± 119 | 58 ± 42 | 224 ± 110 | −663 ± 467a | |
| 2 unit cells | 7 ± 5 | 33 ± 6 | 288 ± 122 | 130 ± 42 | 415 ± 134 | −751 ± 358a | ||
| Ensemble refinement | 100 K | −100 ± 11 | 228 ± 4 | 157 ± 560 | −134 ± 280 | 495 ± 379 | ||
| 300 K | 3 ± 19 | −484 ± 3 | 736 ± 817 | 601 ± 621 | 887 ± 602 | |||
An alternative approach to obtain B-factors is to use TLS refinement. By modelling the whole protein as a rigid body (one TLS group), TLS refinement removes the translation and rotation of the entire protein within the crystal lattice. This had a pronounced influence on both the B-factors and the corresponding entropies. For the cryo-structures, the B-factors were reduced, on average, by a factor of about four. Likewise, the entropies were reduced by a factor of ∼6. Consequently, ΔSconf was also reduced, to a more proper order of magnitude, although it is still somewhat too large and it also became negative, ΔSconf = −29 kJ mol−1.
Using B-factors from crystal structures obtained from data collected at room temperature improved the entropy estimates, as these B-factors reflect more accurately the real movements that take place in the protein. The entropy difference was reduced by a factor of 8 for isotropic B-factor refinement, ΔSconf = 55 kJ mol−1. In fact, the entropy difference decreased for 113 of the 138 residues in the protein compared to the cryo structure, showing that this is not a random cancellation effect. However, there is still no correlation between the per-residue entropy estimated from the B-factors and that obtained from NMR for the backbone N atoms (R = 0.06). Moreover, the TLS refinement did not reduce the entropy difference for this data, ΔSconf = 67 kJ mol−1 (Table 2). The B-factors after extracting the TLS components (Table 1) show that rigid body motion is not an important component of the overall motion of the protein at room temperature. Anisotropic B-factor refinement was not possible for the S complex at room temperature so no entropy estimates from anisotropic B-factors are presented.
These results suggest that the room temperature B-factors provide a more realistic picture of movements in the protein, although the entropies are still ∼5 times higher than those estimated by NMR or calculated from the MD simulations. Finally, we note that if we use the original galectin-3C cryo-temperature crystal structures without any re-refinements, we obtain entropy differences, calculated from the anisotropic B-factors (with method BF), that are twice as large as after re-refinement, ΔSconf = 1000 kJ mol−1. This shows that B-factors are very sensitive to what residues are modelled in alternative conformations.
QHA on the standard MD simulations gave a result of a similar magnitude to dihedral histogramming, albeit with a larger uncertainty, ΔSconf = −3 ± 26 kJ mol−1. Extracting isotropic B-factors from the atomic RMSF in the MD simulation and calculating the entropy with the BF method gives a difference in entropy of 2 ± 227 kJ mol−1, i.e. with an uncertainty that is ∼100 times higher, making the result essentially useless. Extracting anisotropic B-factors instead lowered the uncertainty by a factor of 2, ΔSconf = 33 ± 87 kJ mol−1, but the uncertainty is still clearly too large for the result to be useful.
On the other hand, entropies obtained from B-factors calculated from the fluctuations during the crystal MD simulation gave ∼5 times larger absolute entropies and a much larger ΔSconf = 161 ± 119 kJ mol−1, with a very large uncertainty, as was also observed for the standard MD simulations. The absolute entropies are of the same magnitude as for the crystal structure, but slightly smaller. Using anisotropic B-factors calculated from the simulation reduced ΔSconf and the uncertainty (58 ± 42 kJ mol−1). It may be argued that the prime problem of the B-factor entropies, calculated in this way, is the poor precision (neither of the two results is significantly different from that of dihedral histogramming). These B-factors do not contain any common translation and rotation of the entire protein, because the structures were overlaid to a common structure before the fluctuations were calculated. However, they also do not contain any information on correlated movements, as no covariance terms are included in the variance–covariance matrix when using isotropic B-factors. A factor of 5, as used in the previous study by Polyansky et al., does not seem enough to correct the entropy differences. In fact, by including only the few off-diagonal terms from the anisotropic displacement tensor, the entropy estimates are drastically improved.
We have also performed calculations based on a MD simulation with two unit cells, in order to check that there are no boundary artefacts when using a single unit cell. The obtained results are comparable to the one-unit-cell simulation: ΔSconf is 7 ± 5 kJ mol−1 for DH, somewhat larger for QHA (33 ± 6 kJ mol−1) and much larger when using calculated B-factors (130 ± 42 kJ mol−1 with anisotropic B-factors). The precision of the latter calculations is still very poor. In fact, the only significant difference in ΔSconf between the two MD simulations is that obtained with QHA.
Ensemble refinement simulations started from the room-temperature structures resulted in slightly better entropy estimates, although still worse than those obtained from MD. The DH entropy was similar to that from MD, but with lower precision, ΔSconf = 3 ± 19 kJ mol−1. On the other hand, the QHA entropies were much larger and ΔSconf was negative, −484 ± 3 kJ mol−1. BF entropies were still too large and had a poor precision, ΔSconf = 736 ± 817 kJ mol−1 with isotropic B-factors and 601 ± 621 with anisotropic B-factors.
In order to check this hypothesis, we selected a random snapshot from the crystal MD simulation in one unit cell and ran two crystallographic refinements, one against the cryo-temperature data and one against the room-temperature data, keeping the B-factors fixed to those calculated from that simulation. The resulting electron density maps show an abundance of positive difference density peaks for both structures (Fig. 3), suggesting that the structures from MD simulations cannot reproduce the experimental electron density. As the coordinates were minimised in the crystallographic refinement, we can assume that the difference density peaks reflect mostly the incompatibility of the calculated B-factors. This is consistent with the findings of Kuzmanic et al.29 However, correlation of the data is also a significant problem: transferring the (tri-)diagonal elements from the QHA covariance matrix from one simulation to another also significantly deteriorated the calculated entropies.
| Trypsin | Lysozyme | |
|---|---|---|
| BF-isotropic | 1287 | 685 |
| BF-anisotropic | 1385 | |
| BF-TLS | 883 | 565 |
| DH | 19 ± 12 | 21 ± 9 |
First, we studied trypsin in complex with benzamidine and benzylamine, which we will refer to by their PDB IDs, 5MNG and 5MNK. The entropy ΔSconf is always reported as ΔSconf = Sconf(5MNG) − Sconf(5MNK). Calculating the entropy from MD fluctuations gave a rather small entropy difference, 18 ± 12 kJ mol−1. This is expected, as the two complexes differ in only one amino group in the ligand (cf.Fig. 1b). In contrast, entropies calculated from the re-refined B-factors were two orders of magnitude higher, 1286 kJ mol−1 when using isotropic B-factors and 883 kJ mol−1 when using TLS refinement. These results are clearly useless and even worse than in the case of galectin-3C.
We also applied the BF method to two T4 lysozyme complexes, with N-phenylglycinonitrile and 3-methylbenzylazide, which will be referred to by their PDB IDs, 2RB2 and 2RBN. The estimated relative entropy is calculated as the difference ΔSconf = Sconf(2RBN) − Sconf(2RB2). Applying DH to the MD simulations gave an entropy difference of 21 ± 9 kJ mol−1, even though the structural differences between the ligands are larger than for the trypsin or galectin-3C complexes. Calculating the relative entropy using the BF method failed also in this case: the results show an almost 30 times higher relative entropy, 685 kJ mol−1 when using isotropic B-factors and 565 kJ mol−1 when using a TLS refinement. These results clearly show that the incompatibility between entropies obtained from crystallographic B-factors or MD simulations is not a problem specific only to galectin-3C.
First, we decided not to study absolute entropies, but rather to restrict the study to the relative conformational entropy of two systems as similar as possible, viz. the same protein bound to two similar ligands. Second, we performed a re-refinement of the original crystal structures, to ensure that they were treated in exactly the same way and that alternative conformations were the same in both structures. Thereby, we minimise differences in contributions to the B-factors that do not come from atomic fluctuations. Third, we have removed contributions from the translation and rotations of the entire protein by using TLS refinement. Fourth, we have solved new crystal structures for galectin-3C at room temperature, because NMR experiments and MD simulations were performed at this temperature. However, conformational entropy estimates from B-factors were still an order of magnitude larger than entropies obtained from NMR experiments or from MD simulations. Moreover, the results may vary by more than 100% when the refinement strategy is changed.
To further understand the failure of the B-factor entropies, we have performed a number of MD simulations under different conditions: in water solution (similar to NMR experiments), in one or two crystal unit cells or using the crystallographic raw data (ensemble refinement). Then, we used a number of different methods to calculate entropies from these MD ensembles: DH, QHA, NMA and by calculating B-factors from the atomic fluctuations and then extract entropies from these B-factors. We showed that simulations in the crystal unit cell gave results similar to those obtained from MD simulations in solution. On the other hand, results obtained from ensemble refinement seemed too unreliable. Most importantly, attempts to calculate entropies from B-factors obtained from the MD simulations were unsuccessful. Including all elements of the anisotropic displacement tensor gave better entropy estimates, but the results were still four times larger than those obtained by QHA or DH. Furthermore, the precision of these calculations was too poor for any quantitative assessment.
It seems that an important reason for the failure of the B-factor entropies is that they lack information about the correlation of the movements. Therefore, we tried to include proper correlation through the off-diagonal terms of the variance–covariance matrix from an MD simulation. Unfortunately, B-factors obtained from crystallographic refinement seem to be incompatible with atomic fluctuations from MD simulations. This was also confirmed for the MD simulations (i.e. that diagonal elements from one simulation cannot be combined with off-diagonal elements from another simulation).
This study also gives some information about the reliability of the various methods to estimate entropies from MD simulations: it seems that DH is the most stable method. It gives the same results 7–12 kJ mol−1 and precision 5–6 kJ mol−1 for all three MD simulations of the two galectin-3 complexes (in solution or in the crystal, using one or two unit cells). QHA gives an appreciably larger variation −12 to 33 kJ mol−1 and also a larger uncertainty, 6–26 kJ mol−1. As discussed above, the two methods measure partly different contributions to the entropy (DH excludes bond and angle vibrations, as well as correlation, whereas QHA treats large variations in dihedrals inappropriately). This difference is strongly enhanced when using ensemble refinement (which samples only dihedral dynamics).
Finally, we show that the discrepancy between entropies calculated via B-factors or via MD simulations is not system-specific. Comparing the entropy estimates from crystallographic B-factors and dihedral histogramming for two other pairs of protein–ligand complexes gave also poor results, with B-factor entropies being one or two orders of magnitude higher than those obtained from MD simulations. Consequently, we have to conclude that it currently is not possible to extract useful entropies from crystallographic B-factors, even if the crystal structures are re-refined or if room-temperature data is employed. Therefore, currently it is only possible to obtain total entropies from isothermal titration calorimetry or contributions from selected groups from NMR experiments, whereas a detailed atomistic interpretation of the entropies can only be gained from molecular simulations.
Footnote |
| † Electronic supplementary information (ESI) available. See DOI: 10.1039/c9cp02504a |
| This journal is © the Owner Societies 2019 |