Elucidating the fundamental forces in protein crystal formation: the case of crambin

This study demonstrates the feasibility of periodic all-electron hybrid density functional theory calculations in the description of protein crystals, using crambin as a test case.


Computation of crystal formation energies
The energetics of the protein crystal and of the protein-protein and protein-water interactions was decomposed in various contributions as in Ref. 13 and detailed below.
For a general crystal with proteins and water molecules in the unit cell, the formation energy can be defined as: ∆ where , and are the energies of the optimized crystal, free protein ( / ) ( / ) ( / ) and free water, respectively (the symbol after the / refers to the geometry at which the energy was evaluated). is determined by the balance between the energy gain due to the new interactions ∆ present in the crystal with respect to the isolated compounds and the energy loss due to the structural deformations occurring for and from the gas to the condensed phase. For a crystal to be thermochemically stable . A deformation-free formation energy is defined as: where is the energy of the -th isolated protein taken at the geometry of the optimized ( / ) crystal, and is the energy of the -th water molecule taken in the geometry of the ( / ) optimized crystal. Eq. 2 can be easily reformulated to include the Basis Set Superposition Error (BSSE) correction, using the same counterpoise method adopted for intermolecular complexes: 14 where " " means that the energy is computed with the addition of the ghost functions of the + ℎ surrounding molecules. In periodic systems, this is usually accomplished by including ghost functions within a chosen cutoff distance (here, 40 Å) around the molecule. The can then be defined as: = ∆ * -∆ * (4) and the final, BSSE corrected, formation energy is defined as: ∆ Eqs. 1-5 were then adapted to the different systems considered in the present work. For proteinprotein and protein-water interactions of Table 4, the energetics was studied deformation-free, that is with the structure of the components frozen at the geometry of the interacting system. For some processes of Table 3, a water box representing liquid water was taken as a reference. To study such processes, Eqs. 1-3 must be reformulated as follows: ( + ℎ/ ) -( where is the energy of an optimized liquid water box containing molecules, while and are the energies of the 3D framework of water molecules as ( found in the crystal with and without the protein ghost functions, respectively. Eqs. 4-5 remain valid. To evaluate , a water box of 84 molecules (side: 13.4 Å) was generated through the ( / ) PACKMOL utility, 15 thermalized through a 10 ps by PBE-D2/TZVP ab initio molecular dynamics simulation at 300 K with the CP2K 16 software and then B3LYP-D*/6-31G(d,p) fully optimized with CRYSTAL14. For the 172W case was scaled by a factor 172/84. ( 84 / 84 )

Computation of BSSE-corrected charge transfers
Consider the interaction between two components, and , so that: These two systems have, when isolated, total electronic charges and , corresponding to their total number of electrons. These electrons are distributed among the different atoms composing the systems: where is the total number of atoms in and is the electronic charge on the atom, that may ℎ be estimated through a Mulliken population analysis.
The interaction complex has a total electronic charge : where and are the electronic charges on and when involved in the interaction. Of * * course, due to the charge transfer associated to this interaction, and . * ≠ * ≠ This charge transfer ( ) can then be defined as: If , a net flux of electronic charge from to has occurred, while has obtained charge → > 0 if . Of course, since the total electronic charge must be conserved: The Basis Set Superposition Error (BSSE) is known to affect any calculation based on atom-centered basis functions. This error is due to the superposition, in the complex, of the functions centered on both and atoms, so that is described by a richer basis set than the and components alone. This discrepancy causes inconsistencies when comparing total energies, such as when computing interaction energies. The same problem is present when dealing with equation (4), since and come from calculations with a different number of basis functions, i.e. misses * functions centered on . When these functions are present (when computing , the effect of the * ) BSSE is a net flux of electronic charge in order to partially occupy them.
This un-physical BSSE-caused charge transfer ( ) can be evaluated by doing a calculation on and when the ghost functions of the other component are present: where the square brackets mean that only the ghost functions of the component are included.
is always a negative quantity.
The BSSE-corrected charge transfer due to the interaction ( ) is then computed by removing the spurious charge fluxes of equations (6) from the uncorrected charge transfer of equation (4): where signs are conveniently put to account for charge fluxes directions.

Bibliography
( Supporting tables Table S1. Energetics of the crambin crystal formation (HSE06-D*/6-31G(d,p)//B3LYP-D*/6-31G(d,p)). a Reaction energies as in Table 3, without including dispersion, but at the HSE06/6-31G(d,p)//B3LYP-D*/6-31G(d,p) level of theory (in the x//y notation x is the level at which the energy is computed, y is the level at which the geometry has been optimized). b As a , but obtained with dispersion (HSE06-D*/6-31G(d,p)//B3LYP-D*/6-31G(d,p)). In square brackets, the atom contribution to some formation energies. In parentheses, the contribution for each individual water molecule (when applicable) is reported. c Dispersion contribution evaluated as: E CD -E C . P: crambin molecule in gas phase. W: water molecule in gas phase. CRY: crambin crystal in the two 0W, 84W cases. All values are in kJmol -1 .   Table 3, without including dispersion, but at the B3LYP/6-31G//B3LYP-D*/6-31G(d,p) level of theory (in the x//y notation x is the level at which the energy is computed, y is the level at which the geometry has been optimized). b As a , but obtained with dispersion (B3LYP-D*/6-31G//B3LYP-D*/6-31G(d,p)). In square brackets, the atom contribution to some formation energies. In parentheses, the contribution for each individual water molecule (when applicable) is reported. c Dispersion contribution evaluated as: E CD -E C . P: crambin molecule in gas phase. W: water molecule in gas phase. CRY: crambin crystal in the two 0W, 84W cases. All values are in kJmol -1.  (-20) a Reaction energies as in Table 3, without including dispersion, but at the B3LYP/6-31G//B3LYP-D*/6-31G(d,p) level of theory (in the x//y notation x is the level at which the energy is computed, y is the level at which the geometry has been optimized). b As a , but obtained with dispersion (B3LYP-D3/6-31G//B3LYP-D*/6-31G(d,p)). In square brackets, the atom contribution to some formation energies. In parentheses, the contribution for each individual water molecule (when applicable) is reported. c Dispersion contribution evaluated as: E CD -E C . P: crambin molecule in gas phase. W: water molecule in gas phase. CRY: crambin crystal in the two 0W, 84W cases. All values are in kJmol -1 .