Calculating the absolute binding free energy of the insulin dimer in an explicit solvent

Insulin is a significant hormone in the regulation of glucose level in the blood. Its monomers bind to each other to form dimers or hexamers through a complex process. To study the binding of the insulin dimer, we first calculate its absolute binding free energy by the steered molecular dynamics method and the confinement method based on a fictitious thermodynamic cycle. After considering some special correction terms, the final calculated binding free energy at 298 K is −8.97 ± 1.41 kcal mol−1, which is close to the experimental value of −7.2 ± 0.8 kcal mol−1. Furthermore, we discuss the important residue–residue interactions between the insulin monomers, including hydrophobic interactions, π–π interactions and hydrogen bond interactions. The analysis reveals five key residues, VlaB12, TyrB16, PheB24, PheB25, and TyrB26, for the dimerization of the insulin. We also perform MM-PBSA calculations for the wild-type dimer and some mutants and study the roles of the key residues by the change of the binding energy of the insulin dimer.


Introduction
Insulin is a crucial hormone which regulates glucose level in blood. Usually it is stored in the pancreas as a hexamer, and it only shows the biological activity in a monomeric form. 1,2 The whole process follows three steps in the physiological conditions. First, an insulin hexamer dissociates into dimers and the zinc ions. And then each dimer further dissociates into the monomers. At last the monomer interacts with its receptor. 1 Nowadays, there remain many challenges in the study of insulin. 2,3 For example, how to gure out the process by which one insulin monomer binds to another and how to reduce the affinity between them. For the rst problem, despite many studies having been done in this eld, 4-8 the real binding mechanism of the insulin monomers is still unclear. 9 In order to solve the second problem, some fast-acting insulin analogues with low affinities have been exploited, such as lispro insulin (Lys B28 , Pro B29 ), 10 and insulin glulisine (Lys B3 , Glu B29 ). 11 Determination of the binding affinity of a complex in an experiment is difficult because of time and money. Compared to the experiment, molecular simulation on the computer has an advantage of low cost. It becomes an important assistant tool at present. Based on the molecular simulations, some methods have been developed to predict the affinities of the complexes, such as free energy perturbation molecular dynamics (FEP/ MD), 12 Poisson-Boltzmann surface area (MM-PBSA), 13 linear interaction energy (LIE), 14 adaptively biased molecular dynamics (ABMD), 15 steered molecule dynamics (SMD) 16 and so on. Among them, SMD may be a promising tool in drug design, 17 because it is able to produce lots of short and nonequilibrium trajectories. Now there are two ways to compare the binding affinities of a complex by the SMD method. The rst way uses the maximum force (rupture force) on the pulling collective variable to compare the affinities. [18][19][20][21][22] For example, Mai et al. study two ligands binding to inuenza virus and nd that there is a strong relationship between the maximum force and the experimental free energy. The correlation coefficient is about 0.97. 19 However, a recent study also indicates that there might be at least two different relationships (correlation coefficients) between the maximum force and the experimental value. 23 For a specic complex, the real correlation depends on the predominant interaction type between the receptor and the ligand, like the electrostatic interaction, hydrophobic interaction or H-bonds. It makes the comparison work much complicated.
The second way of comparison is calculating the free energy prole. 16 It is based on an equality proposed by Jarzynski that connects the external work and the free energy differences. 24 With the equality, one can obtain the free energy prole and the related binding free energy of a complex by lots of nonequilibrium trajectories. The whole calculation is quite suitable for parallelization. Up to now, there are numerous successful applications, like two ligands binding to FKBP protein 25 and numerous ligands binding to protein FKBP, trypsin and cyclin-dependent kinase 2. 26 These studies present that the structural exibility and the steered speed have large impact on the calculations of the free energies. To obtain accurate free energy, the low speed should be used in SMD simulation. Moreover, to improve the calculation accuracy, a lot of practical methods have been proposed, including the uctuation-dissipation theorem of Brownian dynamics (BD-FDT), 27 adaptive SMD, 28 hybrid SMD, 29,30 and the nonequilibrium friction correction method. 31 A few years ago, Tyka and co-workers proposed a connement method to calculate the free energy differences between two different stable states of a single molecule. 32 Suppose these two stable states are P and Q, respectively. The free energy difference between the two states is DF PQ . Based on a thermodynamic cycle, the calculation of DF PQ is divided into three independent stages. Two of them are the calculations of the free energy differences, DF PP* and DF QQ* . They correspond to the transitions from the exible state, P and Q, to the restrained state, P* and Q*, respectively (method: thermodynamic integration (TI) 33 ). The last stage is the calculation of the free energy difference between the two restrained states P* and Q* (method: normal mode analysis (NMA) 34 ). The connement method has been shown to be practical for an isolated molecule. Recently, Perthold and Oostenbrink also construct a thermodynamic cycle to calculate the binding free energies of some protein-protein complexes with the perturbed distance restraints. 35 Their free energy results are close to the experimental values.
In this work, we borrow the ideas of the above papers into the calculation of the absolute binding free energy of an insulin dimer system. Since TI and NMA are very time-consuming for large systems, we replace them by SMD. In our work, one SMD simulation is performed in a virtual binding process of the two insulin monomers. To remove the effect of the restraints from the results, we also perform two additional SMD simulations to produce the free energy differences between the restrained and unrestrained states. Just as the work in ref. 35, the three processes constitute a ctitious but complete thermodynamic cycle from the unbound state to the bound state for the insulin monomers. The nal result in our work is À8.97 AE 1.41 kcal mol À1 , which is close to the experimental value À7.2 AE 0.8 kcal mol À1 . 36 Besides the calculation of the absolute binding energy by SMD, we also investigate the binding mechanisms of the insulin dimer. A set of simulations are performed for the dimer at different distances. The results show the importance of the p-p interaction, hydrophobic interaction and the hydrogen bond interaction in the dimerization process. Finally, we perform the MM-PBSA calculations 13 to study how the key residues affect the binding energies of the dimer.

Structural information
The conformation of the insulin dimer is shown in Fig. 1(b). Its monomeric structure consists of two chains ( Fig. 1(a)). Chain A in red color has 21 residues and chain B in green color has 30 residues. They are associated with each other by two disulde bonds (Cys A7 -Cys B7 , Cys A20 -Cys B19 ). Two monomers bind together to form a dimer ( Fig. 1(b)). On the interaction surface between the monomers, there is a hydrophobic core including two antiparallel b-sheets (B24-B26) and two a-helixes (B9-B18). And the b-sheets are stabilized by four hydrogen bonds ( Fig. 1(b)). Many experiments have been done to investigate the structure and dynamics of insulin, such as atomic force microscopy (AFM) experiment, 37 alanine scanning experiment, 38 amide I two-dimensional infrared spectroscopy (2D IR) experiment, 39 laser induced temperature jump (T-jump) experiment, 40 time-resolved X-ray scattering experiment 41 and so on. Moreover, at 298 K, concentration difference spectroscopy experiment presents that the insulin dimer's binding free energy is À7.2 AE 0.8 kcal mol À1 (ref. 36) and the isothermal titration microcalorimetry (ITC) dilution experiment gives its dissociation free energy 6.88 AE 0.03 kcal mol À1 . 42 These studies suggest that both of the insulin dimerization and dissociation are enthalpy control. As to the simulations, the dynamic properties of the insulin monomer, 43,44 dimer 45-50 and hexamer 51,52 in water are also investigated by some groups. For instance, two distinct surfaces (the dimer forming surface and the hexamer forming surface) on an insulin monomer are characterized in ref. 44. The role of the conned water molecules to the stability of the insulin hexamer is studied in ref. 52.

Thermodynamic cycle
Since insulin tends to exist as dimers or hexamers in aqueous solvent, the monomeric structure of the insulin is still unknown in experiment. In an early NMR experimental study, the monomeric structure of an insulin variant des-[Phe B25 ] (insulin without the Phe B25 residue) was determined. 56 It does not show a large structural difference to that in the dimer of the wild insulin. To verify the structural stability, we extract the monomer from the crystal structure of the insulin dimer and perform four independent 100 ns equilibrium simulations. The uctuations of the RMSD of C a atoms are about 2.0 A (the rst two and the last two residues are not included). It proves again that the monomeric structure from the dimer is relatively stable in aqueous solvent.
In 1985, concentration difference spectroscopy experiment obtained the thermodynamic quantities of porcine insulin, including the changes of enthalpy, entropy and free energy in the binding process. 36 The experimental values can be used as reference but the process is too complicated to be simulated on computer, so does the calculation of the absolute binding free energy. In current work, inspired by the connement method 32 and the work by Perthold et al., 35 we also use a thermodynamic cycle to circumvent the issue. The schematic diagram of the cycle is illustrated in Fig. 2.
In the gure, the whole thermodynamic cycle consists of four independent processes. Process D at bottom is corresponding to the real binding process from the unbound exible monomers to the dimer. Process A, B and C are the virtual processes to close the thermodynamic cycle. Among them, process A at le is a virtual relaxing process from the restrained unbound state to the exible unbound state. In the restrained unbound state, eleven geometric restraints are applied to the system (see the details below). And in the exible unbound state, all the restraints are all removed from the system. Process C at right is similar to process A except that the SMD simulation is only performed for the bound state. As to process B at top, it is a virtual binding process from the unbound state to the bound state with a xed force constant in all the restraint potentials. It must be noted that the simulation of the virtual binding process B is much easier than the real binding process D. This is due to the decrease of the degrees of freedom in the system. Based on the thermodynamic cycle, we divide the calculation of the absolute binding free energy of insulin dimer DF D into three parts, DF A , DF B and DF C . Each part represents a free energy difference in the corresponding process. All the simulations are performed by SMD. 16 Finally, the absolute binding free energy of insulin dimer is obtained by the following formula:

Standard-state correction and rotational restriction correction
In this work, we also involve the standard-state correction DF V and the rotation restriction correction DF R . DF V gives the freeenergy difference when one molecule transfers from the available volume in the unbound state V u to the standard-state volume V 0 (1661 A 3 ). 35,57 Here Dr x,u , Dr y,u and Dr z,u are the differences between the minimum and maximum monomer-monomer distances in the x, y and z directions at the unbound state with restraints, respectively. The rotational restriction correction term DF R is introduced to account for the effect of the restraints in the bound state, which is given in the following formula: 35 Here Dr y,b and Dr z,b are the maximum deviations of the distances in the y and z directions at the bound state with restraints, and r b is the average radial monomer-monomer distance in the bound state without restraints. As a consequence, the absolute binding free energy of the insulin dimer becomes We want to note that the reaction coordinate in the SMD simulation for the virtual binding process B is the distance between the monomers along a xed X-axis. The free energy does not need the Jacobian correction. 58

Free energy calculation
Here we give a brief introduction of the existing SMD method. Thermodynamics theory says that when a system changes innitely slow along a path from its state X to state Y, the Helmholtz free energy difference between the states equals the external work W performed on this system: where F X , F Y is the free energy of state X and state Y, respectively. This equation only works for an extremely slow process. As to a transition process with a nite rate, Jarzynski proposes a new equality: 24 Here b is the inverse temperature 1/k B T, and the bracket indicates an ensemble average. This equality allows us to obtain the free energy difference from lots of non-equilibrium short trajectories. To force the system to move along a fast transition path, some kind of restraint potential should be applied. The potential may have many different forms, 59 such as that with the constant forces or torques. But the most widely used one is the spring potential. It has the following form: where r is the positions of the atoms in a 3N dimensional space.
x(r) is the collective variable to be restrained on the path. In this work, it is set as the distance of center-of-mass of the insulin monomers at X-axis for process A, B and C. k is the force constant of the restraint potential. The center of the spring potential, l, moves according to following formula: l 0 is the initial position of the system on the path, t is the simulation time and v is the constant moving velocity. With such spring potential, the new Hamiltonian of the restrained system is Here H 0 is the Hamiltonian of the original system and p is the momenta of the atoms. According to the Jarzynski's equality, the free energy on the path is given as follows: W 0/t is the external work done on the system when it moves from l 0 to l t on the path. The work can be calculated by an integral on time from 0 to t: Substitute the restrained Hamiltonian into the formula, the external work becomes:

Denition of the restraint potentials
Besides the distance of center-of-mass, there are ten more restraints on the two insulin monomers. These restraints are used to ensure the monomers to form a correct complex conformation in the SMD simulation. The rst two restraints are the heavy atoms RMSD restraints. They restrain the structures of the individual monomers in the virtual binding process. The third and fourth restraints are the center-of-mass (C a atoms) restraint for each monomer, which is used to ensure the monomers move along the X-axis direction (binding direction).
The restraint potential has the form Here Dy and Dz are the deviations of the center-of-mass of the monomers to the X-axis in the y or z direction, respectively. The form of the restraint potential does not change with time.
And it does not have the steered coordinate in the SMD calculation (center-of-mass distance between the monomers on the X-axis).
The rest are six restraints on six different atoms of the monomers. They are used to avoid the rotation of the monomers. As shown in Fig. 3, two red atoms (OE1 of Gln A15 in monomer I and CD1 of Leu A16 in monomer II) are xed on the red horizontal dotted line. Two purple atoms (ND1 of Hid B5 in monomer I and CG1 of Val B2 in monomer II) are xed on the purple vertical X-Y plane with a xed z coordinate. Two blue atoms (N of Lys B29 in monomer I and CA of Thr A8 in monomer II) are xed on the blue horizontal X-Z plane with a xed y coordinate. All these restraints are the harmonic potentials with a force constant of 50 kcal mol À1ÅÀ2 .

Simulation details
The initial structure of insulin dimer comes from the Protein Data Bank 53 (PDB ID: 4INS 54 ). First, the complex is rotated to be aligned with X-axis (red solid arrow in Fig. 3). Then it is solvated in a periodic box of 88.6 A Â 55.5 A Â 55.3 A with 6498 TIP3P water molecules. 60 Na + and Cl À ion concentration is 12 mM, which is similar to experiments. 36 Second, the solvated system is minimized for 10 000 steps by the steepest descent method and 10 000 steps by the conjugate gradient method. Third, a 20 ns equilibrium simulation is performed to relax the system. In the simulation, the center-of-mass of each monomer is restrained to X-axis. Finally, the distance between the monomers is Fig. 3 Six single-atom restraints in the SMD simulation for the insulin dimer. The red, purple and blue solid arrows represent the X, Y and Z axis, respectively. Two red atoms are fixed at the horizontal red dotted line, two purple atoms are fixed on the vertical X-Y plane with a fixed z coordinate and two blue atoms are fixed on the horizontal X-Z plane with a fixed y coordinate. The figure is produced by VMD. 55 This journal is © The Royal Society of Chemistry 2020 RSC Adv., 2020, 10, 790-800 | 793 gradually increased to 36.5 A with the restraints mentioned above. The nal unbound conformation of the system is shown in Fig. 3.
In this work, the force eld is ff14SB AMBER force eld 61 and the time step is 2.0 fs. SHAKE algorithm 62 is used to constrain the bonds involving the hydrogens. The non-bonded interaction is calculated with a cut-off of 10.0 A. All the SMD simulations are performed by AMBER 18 (ref. 63) in the NVT ensemble at 298 K.
In the virtual binding process B in Fig. 2, the center-of-mass distance between the insulin monomers decreases from 36.5 A to 17.5 A with a low speed of 0.2 A ns À1 . All the restraints are kept in the simulation. The simulation time of the process lasts for 95 ns. Process A and C are also simulated by SMD but with a little difference. In these two processes, the driven coordinate is the force constant k in the restraint potential that changes from a restrained state (k ¼ 50 kcal mol À1ÅÀ2 ) to a exible state (k ¼ 0 kcal mol À1ÅÀ2 ). As a comparison, a standard SMD uses the distance as the driven coordinate. The simulation time of process B and C is 20 ns. To obtain the average free energy difference, we perform 40 independent simulations for every process.
To study the residue-residue interactions in the dimerization of the insulin (hydrophobic interaction, p-p interaction and hydrogen bond interaction), we run ve simulations at a center-of-mass distance of 26 A, 24 A, 22 A, 20 A and 18 A (native state). Each simulation consists of eight independent 100 ns trajectories, whose initial structure is extracted from process B and further equilibrated for 20 ns. To x the distance between the monomers, an umbrella restraint potential U is applied to the system in the simulation.
Here R is the center-of-mass distance between the monomers. k is 50 kcal mol À1ÅÀ2 . r 1 , r 2 , r 3 and r 4 determine the region. For instance, in the simulation at a distance of 26.0 A, r 1 , r 2 , r 3 and r 4 are set to 25.5 A, 25.6 A, 26.4 A and 26.5 A, respectively. We also perform MM-PBSA calculations to get the change of the binding energy between the wild-type dimer and the mutants. To get the mutants, we remove all the sidechain atoms of the selected residues except the C b atom. And then change their names to 'ALA' in the pdb le. The missing hydrogen atoms are added automatically by the tleap tool in AMBER 18. 63 In all the MM-PBSA calculations, we use the three-trajectory mode and run 27 trajectories for each mutant. For each trajectory, the simulation time is 100 ns and 25 snapshots are extracted evenly from the last 50 ns. So, in total we have 225 snapshots in the MM-PBSA calculation. The calculation is performed by the MMPBSA.py script in AMBER 18. 63 The dielectric constants of the solute and the solvent are 1.0 and 80.0, respectively. The nonpolar part of the solvation energy is estimated by:

Results and discussions
As shown in the thermodynamics cycle in Fig. 2, the calculation of the binding free energy of the insulin dimer is completed by three independent processes. The main process is the virtual binding process from the restrained unbound state to the restrained bound state (process B). And the other two are the virtual relaxing processes at the unbound and bound state, respectively (process A and C). For each process, we perform a repeated SMD simulation 16 to generate 40 trajectories. These trajectories are used for the calculation of the free energies (eqn (10)) and the corresponding errors. The free energy proles of the three processes are shown in Fig. 4. Panel (b) is the data of the virtual binding process B. In the process, the center-of-mass distance between the monomers decreases from 36.5 A to 17.5 A. And the least free energy DF B is À25.02 AE 0.95 kcal mol À1 at distance 18.2 A. Panel (a) and (c) are the data of the two virtual relaxing processes A and C at the unbound state (distance 36.5 A) and bound state (distance 18.2 A), respectively. In these two processes, the force constants in all the restraint potentials decrease from 50.0 kcal mol À1 to 0 kcal mol À1 gradually. So, at the end of the processes, the molecule turns into a exible structure completely. The end-toend free energy differences of the two processes are À77.88 AE 0.89 kcal mol À1 and À61.66 AE 0.53 kcal mol À1 , respectively. As expecting, all the free energy errors increase when the molecule moves along the path.
In order to have the absolute binding free energy, the standard-state correction and the rotational restriction correction are both considered in our work. The standard-state correction (eqn (2)) is 5.45 kcal mol À1 and the rotational restriction correction (eqn (3)) is À5.62 kcal mol À1 . Combining all these data together, the nal absolute binding free energy of the insulin dimer is À8.97 AE 1.41 kcal mol À1 (eqn (4)). In experimental aspect, the pig insulin dimer has been studied by the concentration difference spectroscopy. Strazza et al. measure the equilibrium constant K a at ve uniformly-spaced temperatures from 394 K to 311 K. 36 At 298 K, the corresponding binding free energy is À7.2 AE 0.8 kcal mol À1 . 36 It shows that our calculation result is close to the experimental data.
In a previous study, Zoete et al. calculate the absolute binding free energy of the insulin dimer by the MM-GBSA method. 46 In this method, the binding free energy is constituted of an enthalpy term and an entropy term. The former includes the difference of the gas phase energy and the solvation free energy between the unbound and the bound state. And the latter contains the difference of the transitional, rotational and vibrational entropies. The MM-GBSA result is À11.91 AE 6.7 kcal mol À1 , 46 which is comparable to the experimental value. 36 But, its enthalpy term and the entropy term are À38.65 AE 5.8 kcal mol À1 and À26.74 AE 3.6 kcal mol À1 , respectively. 46 The experimental enthalpy and entropy (TDS) is À10.0 kcal mol À1 and À3.3 kcal mol À1 respectively, which are obtained approximately by van't Hoff relation. These two large quantities are greatly different to that in experiment. This may be because the MM-GBSA method includes many parameters and approximations. 13,34,64,65 For the enthalpy term, the error might come from the implicit solvent model, 13,66 and for the entropy term, the error might come from the harmonic approximation. 34 And there are some improved methods aim to decrease the errors in the MM-GBSA or MM-PBSA calculation. [67][68][69][70] Compared to MM-GBSA, SMD is quite simple. Its calculation is based on work W in eqn (6), instead of the separated enthalpy and entropy terms. But SMD do have its own parameters, such as the steered speed and the number of the trajectories.
Previous studies have shown that both slowing down the steered speed and increasing the number of the trajectories can improve the accuracies of the results. [25][26][27] However, preparation of a proper restraint is still difficult in SMD.
One more thing we want to note is that SMD completes the calculation in the explicit solvent. And MM-GBSA or MM-PBSA are not. In MM-GBSA or MM-PBSA, the snapshots are extracted from the explicit solvent, but the free energy calculation is done in implicit solvent. This may bring the underlying problem to the calculation of the absolute binding free energy. For example, determination of the reference solute dielectric constant value is not easy. It can be 1.0, 2.0 or 4.0 in different MM-PBSA calculations. [71][72][73][74] Insulin is an essential protein in treating the disease. Understanding the binding mechanism of insulin dimer will be useful to design drug. Fig. 5 shows the change of the average solvent-accessible-surface area (SASA) in the binding process of the insulin monomers. The data come from ve independent MD simulations at the center-of-mass (COM) distance of 26 A, 24 A, 22 A, 20 A and 18 A (native state), respectively.
The change of the total SASA in the binding process (COM distance from 26 A to 18 A) mainly comes from 22 residues on the binding interface (B9-B30). We present eight residues that have the largest SASA differences in Table 1. Their SASA changes are plotted in Fig. 6. Among these residues, three are hydrophobic residues (Val B12 , Phe B24 , Phe B25 ), four are polar residues (Ser B9 , Tyr B16 , Tyr B26 , Pro B28 ) and one is the charged residues (Glu B13 ). According to the denition of the hydrophobic interaction, the SASA reductions of the three hydrophobic residues (Val B12 , Phe B24 , Phe B25 ) should favor the binding of the monomers. These residues are important to the insulin dimerization.  Moreover, Phe B24 and Phe B25 are on the exible b-strand, their hydrophobic interactions also favor the formation of the b-sheet between the monomers. p-p interaction between the aromatic amino acids is a typical non-covalent intermolecular force. Recently, Zhao et al. investigate lots of pairs of aromatic amino acids and nd some statistical laws. 75 In their denition, p-p interaction satises some geometry conditions. First, the center-of-mass distance between two aromatic rings should be less than 7.2 A. Second, the normal-normal acute angle should be larger than 50 (Tshaped conformation) or less than 30 (parallel conformation). In this work, we use the denition to study the p-p interactions between insulin monomers. Fig. 6 The changes of the SASA of the eight residues in Table 1 from the isolated monomers (distance 26 A) to the complex (distance 18 A).  There are four aromatic amino acids (Tyr B16 , Phe B24 , Phe B25 and Tyr B26 ) for each monomer on the binding interface. In the binding process, the percent of the p-p interactions between four strongest aromatic pairs are shown in Fig. 7(a) (Tyr B16 _-Tyr B26 , Phe B24 _Phe B24 , Phe B24 _Tyr B26 and Phe B25 _Phe B25 ). From the gure, we nd that the rst three p-p interactions increase quickly in the binding process. Besides the hydrophobic interaction, the inter-strand hydrogen bond interaction is also important to the stabilization of the complex. Fig. 7(b) gives the change of the percent of four inter-strand hydrogen bonds at different distances between the monomers. The locations of these bonds in the complex are shown in Fig. 1. It presents that all the four inter-strand hydrogen bonds become stable at 20Å (percent larger than 50%). They make the two exible b-strands form into one global b-sheet.
In a word, the dimerization of the insulin is favored by the hydrophobic interaction, p-p interaction and hydrogen bond interaction. Among the 22 residues on the binding interface, ve key residues Val B12 , Tyr B16 , Phe B24 , Phe B25 , and Tyr B26 are picked out according their involvement in the interactions. The locations of the key residues in the dimer are shown in Fig. 8. These ndings are close to previous alanine scanning experiment. 38 The experiment studies some mutants of the insulin dimer (Val B12 Ala, Tyr B16 Ala, Phe B24 Ala, Tyr B26 Ala, Asn A21 Ala, Phe B25 Ala, Thr B27 Ala) and nds that the rst four mutants exist as a monomeric insulin aer the mutation. Why these residues (Val B12 , Tyr B16 , Phe B24 , Tyr B26 ) are important to the dimerization of the insulin can be well explained by the analysis of the residue-residue interactions here. Of course, the hydrophobic residue Phe B25 is an exception. In the mutation experiment, Phe B25 does not present to be critical to the insulin dimerization apparently. In the simulation, its SASA reduction is smaller than the other hydrophobic residues (Table 1). And its p-p interaction is weaker than the other aromatic pairs (Fig. 7(a), magenta line). So, compared to the other key residues, Phe B25 may have less contribution to the binding of the monomers. MM-GBSA or MM-PBSA are the popular approaches to study the relative binding affinity between similar complexes. 76,77 Because in this case, the entropy contribution can be safely ignored. For instance, Huiyong et al. study lots of proteinligand systems and nd that there is a good correlation between the enthalpy value and the binding affinity in experiment. 76 To verify the idea, we further investigate the variation of the enthalpy of the insulin dimer and its mutants by the MM-PBSA method. Seven residues Asn A21 , Val B12 , Tyr B16 , Phe B24 , Phe B25 , Tyr B26 and Thr B27 are mutated to Ala one by one. The entropy term is not included in the calculation. Table 2 gives the enthalpy of the wild-type dimer and the mutants. The changes of the structural RMSD in the simulations are also presented. Compared to the wild-type dimer, the three mutants, Val B12 , Tyr B16 Ala, Tyr B26 Ala, have a much higher enthalpy. Mutations on these residues decrease the binding affinities of the dimer remarkably. The results are close to the ndings in previous alanine scanning experiment 38 except the mutant Phe B24 Ala. It presents that MM-PBSA is successful in the prediction of the relative binding affinity of the complex. As to the average structural C a RMSD, all the mutants show larger RMSD values than the wild-type dimer. The mutations destabilize the structure of the native state.
To describe the effect of the mutation, we also calculate the secondary structures of the residues on the binding interface (B9-B30) for the wild-type dimer and all the mutants by DSSP. 78 Among them, the mutant Phe B24 Ala shows the most difference than the wild-type dimer. The changes of the secondary structures of the wild-type dimer and the mutant Phe B24 Ala are shown in Fig. 9. In the gure, red color represents the a-helix and purple is the b-sheet. We nd that both of the a-helix (B9-B18) and the b-sheet (B24-B26) are quite stable in the wild-type dimer. The dimerization is helpful to the stabilization of the secondary structures. But for the mutant Phe B24 Ala, the a-helix changes into the "turn" conformation frequently in the whole simulation time. The average percent of the a-helix (B9-B18) decrease from 83.0% (wild-type) to 72.8% (mutant). As to the individual residues in the helix, in monomer I, the percent of Ala B14 and Val B18 decrease 26.0% and 21.0%, respectively. The other residues, Val B12 , Glu B13 , Leu B15 , Tyr B16 and Leu B17 , decrease about 15%. And in monomer II, the residue Ala B14 decreases 19.0%, and Val B12 , Glu B13 and Val B18 decrease about 5%.

Conclusion
Protein-protein association is an important process in cell biology. Many typical systems have been successfully investigated in the simulations, such as the melittin aggregation 79,80 and the britin foldon domain assembly. 81 In this work, we study the dimerization of insulin. Combing the SMD method 16 with the connement method, 32 we calculate its absolute binding free energy based on a ctitious thermodynamic cycle. The calculation result is close to the experimental value, which proves its feasibility to small protein-protein systems. Moreover, in order to understand the association mechanism of the insulin dimer, we analyze the variation of SASA, p-p interaction and hydrogen bond interaction during the binding process. And the residues Vla B12 , Tyr B16 , Phe B24 , Phe B25 , and Tyr B26 have the most favorable contributions to the dimerization. The current work may have implications for the development of the insulin analogues.

Conflicts of interest
There are no conicts to declare.