Energetics of a protein disorder–order transition in small molecule recognition

Many proteins recognise other proteins via mechanisms that involve the folding of intrinsically disordered regions upon complex formation. Here we investigate how the selectivity of a drug-like small molecule arises from its modulation of a protein disorder-to-order transition. Binding of the compound AM-7209 has been reported to confer order upon an intrinsically disordered ‘lid’ region of the oncoprotein MDM2. Calorimetric measurements revealed that truncation of the lid region of MDM2 increases the apparent dissociation constant of AM-7209 250-fold. By contrast, lid truncation has little effect on the binding of the ligand Nutlin-3a. Insights into these differential binding energetics were obtained via a complete thermodynamic analysis that featured adaptive absolute alchemical free energy of binding calculations with enhanced-sampling molecular dynamics simulations. The simulations reveal that in apo MDM2 the ordered lid state is energetically disfavoured. AM-7209, but not Nutlin-3a, shows a significant energetic preference for ordered lid conformations, thus shifting the balance towards ordering of the lid in the AM-7209/MDM2 complex. The methodology reported herein should facilitate broader targeting of intrinsically disordered regions in medicinal chemistry.


Protein samples preparation.
Plasmid constructs were ordered from commercial GeneArt gene synthesis services. The specific gene sequence of human MDM2-Lid (6-125 aa.) and human MDM2-Lid/short (17- Figure S1 and Figure S2), which showed that the MDM2 residues sequence in the purified proteins was 7-125 for MDM2/Lid and 17-125 for MDM2-Lid/short.

Calorimetric measurements
ITC was used to measure the binding affinity (Kd) of the MDM2 ligands Nutlin-3a and AM-7209.  Figure S3 and Figure S4. Values for the thermodynamic measurements reported in Figure 2 of the manuscript are shown in Michelsen also contained four extra amino acids after cleavage of the N-terminal fusion tag, whereas our constructs contained a hexahistidine tag at the C terminus of MDM2 to minimise impact on the conformational preference of the lid IDR.

Models preparation
Initial exploration of the conformational landscape of MDM2 lid was performed by means of accelerated molecular dynamics (aMD) simulations. 3 As starting points, we used three different MDM2 (residues 6-125) structures that differ in the conformation of the lid: semi-open state (obtained from model 2 of the NMR ensemble PDB id 1Z1M 4 ), closed state and ordered state reported by Bueren-Calabuig et al. 5 Missing residues of the C-terminal domain were modelled based on model 2 of the NMR ensemble PDB id 2LZG (residues 1-125). 6 The protein and ligand complexes were solvated in a triclinic TIP3P water box with edges extending 18 Å away from the edges of the protein, 7 and Clanions were added to neutralize the system. The ff99SBildnnmr force field was used to model the protein, 8 and Joung and Cheatham parameters were used for the anions. 9 Nutlin-3a and AM-7209 ligands were parameterized using GAFF atom types, 10,11 and atomic charges derived with the AM1-BCC method, 12 as implemented in the antechamber tool of the AMBER16 suite. 13 Each system was energy minimized with 4500 steps of steepest descent followed by 4500 steps of conjugate gradient and subsequently smoothly heated from 100 K to 298 K over a 1 ns-long simulation at constant volume, with positional restraints on protein and ligand atoms. Then each S6 system was subjected to a 4-ns equilibration simulation, with restraints being gradually reduced for protein and ligand. Finally, the density of the system was equilibrated by means of a 1-ns long simulation at 298 K in the NPT ensemble. Throughout the heating and equilibration stages, SHAKE was used for all bonds involving hydrogen atoms, 14 and a time step of 2 fs was employed.
Prior to the aMD simulations, a conventional 100-ns long simulation in the NPT ensemble was performed using the GPU accelerated version of PMEMD from the AMBER16 software package.

Accelerated Molecular Dynamics simulation parameters
aMD adds a positive energy boost to the potential energy function allowing the simulated systems to efficiently explore different regions of the potential energy surface (PES) through reduced energy barriers. A dual boost approach was used, with a potential energy boost applied to all the heavy atoms and an additional energy boost applied to all the torsion angles in the system.
To prevent an intense energy boost from pushing the system towards high energy regions of the PES that may lead to unrealistic behaviour, such as protein unfolding, residues in the MDM2 core region (25-125) were restrained using harmonic restraints with a force constant of 20 kcal mol -1 A -2 . For each system, 300-ns long aMD trajectories were obtained.
Initial EP, ED, ! and " parameters for the aMD runs were selected following the guidelines of Pierce and co-workers . 3 Trial runs were carried out to tune the parameters to achieve broad sampling of lid conformations. The values of EP, ED, ! and " described in Table S2 were adopted.

S7
Two suitable collective variables (CV) were selected to describe the conformational preferences of the MDM2 lid as observed in the aMD simulations. CV1 defines the extension of the lid as the distance between the Cα atoms of residues M6 and E23. CV2 is the dihedral angle defined by the Cα atoms of residues D11, M50, M62 and V41 (lid-core dihedral angle). Figure 3A in the manuscript depicts both CV.

Umbrella sampling protocol
An US protocol was devised to compute the equilibrium distribution of the lid conformational ensembles. The CV space was sampled within the CV1 values 5 -45 Å with an interval of 2 Å and CV2 values 36 -268 º with an interval of 8 º accounting for a total of 524 bins. The initial coordinates were selected by choosing the closest snapshot to the average structure sampled for that bin during the preceding aMD runs. The selected protein structures were extracted and used as the initial starting conformations. The protein was solvated again following the protocol described previously. Prior to the US runs, the starting conformations were re-equilibrated using the same energy minimisation protocol and a brief equilibration protocol (500 ps of thermalisation and 500 ps equilibration in the NPT ensemble). Then, 4-ns long production runs were performed using the same conditions described in the system setup section. Monitoring of the convergence profile of the FES along each CV suggested this was sufficient to obtain reasonably converged FES ( Figure S5). Harmonic potential restraints of 1 kcal mol -1 Å -2 and 0.12 kcal mol -1 deg -2 were S8 applied to the target CV1 and CV2 values respectively. These small restraining force constants were chosen to avoid excessive energetic penalties for conformations slightly deviating from the target CV values. Values of the reaction coordinates were stored every 10 fs for post-processing.

2D variational free-energy profile (vFEP) analysis
The 2D variational vFEP method was used to obtain unbiased free energy profiles along the defined CV space. 15 vFEP is a maximum likelihood parametric approach to reweight biased simulation data. For smoothly varying free-energy surfaces, the method has been shown to yield converged FES with fewer windows and with a fraction of the statistics required with the Weighted Histogram Analysis Method. To estimate uncertainties in free energies, all US trajectories were sub-divided into two parts of equal duration (2 ns) and analysed separately (Table S3).

Preparation of MDM2/ligands input files for free energy calculations
The protein-ligand structures for the free-energy calculations were obtained from the X-ray crystal structures with PDB IDs: 4HG7 for Nutlin-3a/MDM2, 16 and 4WT2 for AM-7209/MDM2. 17 All water molecules were removed from the structures and all proteins were capped at the C terminus and N terminus with N-methyl and acetyl groups respectively. The coordinates of the structured lid conformation for the three protein-ligand complexes were taken from the Pip-2/MDM2 crystal structure, 6 while the coordinates of the "closed" lid conformation were extracted from the calculated FES.

S9
Input files for the free-energy simulations were created using utilities from the AMBER16 software suite, 13 and the software FESetup. 18 Protein parameterisation was performed using the ff14SB Amber force-field, 19 while ligands were parameterised using the GAFF2 forcefield, 10,11 and AM1-BCC partial charges. 12 The system was solvated in a cubic box with TIP3P water molecules, 7 with a minimum distance between the protein and the edges of the box of 12 Å.
Counter ions were added to neutralize the total net charge. The same approach was followed for parameterising the ligand in the free phase.
Next an equilibration protocol was applied to relax the box size. Initially, energy minimization of the entire system was implemented with 1000 steps of steepest descent gradients, using the software sander. Then, an NVT protocol was followed for 200 ps at 298 K, followed by an NPT equilibration for a further 200 ps at 1 atm. Finally, a 2-ns NPT MD simulation was run with the software SOMD to reach a final density of about 1 g cm -3 . 20,21 The final coordinate files were retrieved using the software cpptraj.

Adaptive sampling protocol
Alchemical free energy simulations were performed following a double decoupling protocol. 22,23 In this method, a ligand L is mutated into a "non-interacting" molecule both in the solvated and the bound phase using a two-step process. An example of the thermodynamic cycle used in these calculations is illustrated in Figure S6.
During the protein-bound simulations the ligand is restrained to prevent drifting out of the binding site and to accelerate convergence of the free energy changes. A standard state correction term Δ #$%& ' is applied post-simulation to work out the free-energy change upon release of the ligand restraints to standard state condition. Details of the restraint protocol used for these simulations are given below.

S10
The overall cycle gives a standard free-energy of binding as shown by eq 1: Free-energy changes were estimated with the multistate Bennet acceptance ratio method as implemented in the Sire utility analyse_freenrg. 26 To achieve a more robust estimation of free energies, each simulation was repeated five times, using different initial velocities drawn from the Maxwell-Boltzmann distribution and statistical uncertainties are reported as 95% of the standard error of the mean.
For the adaptive sampling protocol, the simulations were carried out initially for a duration of 5 ns. Following this, we identify λ windows that contribute the most to the overall uncertainty of the absolute binding free-energy by calculating the standard deviation of the free-energy estimates between neighbouring MBAR windows. Only simulations whose uncertainty exceeds a threshold S11 t are carried further into the next iteration of the protocol. Figure S7 shows that the optimal choice of t is around 0.1 kcal.mol -1 for the benchmarked system because lower values yield similar results with additional computing costs. Higher values yield more time savings, but the free energy estimate deviates from the reference results.

Restraints protocol
The  Table S3.
The convergence profile of the free energies of binding of AM-7209 and Nutlin-3a bound to different lid states is shown in Figure S9. Additional epochs were run until the mean binding free energy estimates were deemed stable. In comparison with Pip2/MDM2-short the results are noisier S12 for simulations including the long lid versions, but the binding selectivity trend in binding energies of AM-7209 between MDM2/Lid-closed and MDM2/Lid-ordered state appears robust.

Control docking study.
The equilibrated conformations of the protein ligand complexes used at the start of the ABFE calculations were loaded in the software Flare 5. 28 The ''scoring in place with optimisation'' mode from the Docking module of Flare was used to estimate a DG score. The results are shown in      Figure S5. Convergence plots for apo MDM2 FES. Populations along each CV value for cumulative sampling time per window of 0-1 ns, 0-2 ns, 0-3 ns, 0-4 ns respectively.    Figure 4D of the main manuscript.
S24 Table S4. Standard state correction terms and standard binding free energies for the different ABFE simulations reported in this study. All energies in kcal.mol -1 . Uncertainty given as ±1s (n=5).