Markov state models and NMR uncover an overlooked allosteric loop in p53

The tumor suppressor p53 is the most frequently mutated gene in human cancer, and thus reactivation of mutated p53 is a promising avenue for cancer therapy. Analysis of wildtype p53 and the Y220C cancer mutant long-timescale molecular dynamics simulations with Markov state models and validation by NMR relaxation studies has uncovered the involvement of loop L6 in the slowest motions of the protein. Due to its distant location from the DNA-binding surface, the conformational dynamics of this loop has so far remained largely unexplored. We observe mutation-induced stabilization of alternate L6 conformations, distinct from all experimentally-determined structures, in which the loop is both extended and located further away from the DNA-interacting surface. Additionally, the effect of the L6-adjacent Y220C mutation on the conformational landscape of the functionally-important loop L1 suggests an allosteric role to this dynamic loop and the inactivation mechanism of the mutation. Finally, the simulations reveal a novel Y220C cryptic pocket that can be targeted for p53 rescue efforts. Our approach exemplifies the power of the MSM methodology for uncovering intrinsic dynamic and kinetic differences among distinct protein ensembles, such as for the investigation of mutation effects on protein function.


System set up
The DNA binding domain initial coordinates were taken from chain B of PDB 1TSR, which include p53 amino acids 96 -289. For the mutant simulations, the tyrosine in position 220 (125 in the clipped domain) was mutated to a cysteine using tleap module in Amber14. 1 The crystallographic water molecules were retained and each system was solvated in an 8Å TIP3P water box. 2 The zinc ion and its coordinating residues were modeled using the cationic dummy atom model. 3 Each system was brought to 0.12 M salt concentration by adding K + and Clatoms. The structure file of each system consisted of about 27,220 atoms, which were prepared using Amber FF14SB force field. 1,4

Molecular dynamics simulations
The solvated proteins were minimized and equilibrated as described in Malmstrom et al, 5 using the GPU version of Amber 14. To increase the conformational sampling, a round of accelerated MD simulations (aMD) 6 was performed from the equilibrated structure using Amber14. Each system was simulated for 100 ns and 10 structures were selected for each system by clustering the conformations based on RMSD of the center of mass of each residue using a k-means algorithm in MSMBuilder2 7 and using the cluster centroids. These 10 structures were used as seeds for short unbiased MD simulations, each performed in triplicate with new starting velocities. After each round of simulation, the joint trajectories were processed for MSM model construction, and new starting coordinates were selected, prioritizing the exploration of new areas in the conformational space, until converged models were obtained based on MSM validation metrics (see below). Individual simulations ranged from 10 to 300 ns in length. In total, the wildtype system was simulated for 89 s, while Y220C required 63 s for appropriate model construction.

Markov state model construction
Simulation data was processed and models were built using PyEMMA 8 , version 3.5.6. Features consisted of pairwise distances, with pairs being selected after a tICA-based iterative process that eliminated redundant pairs located consistently close (< 3Å) or far (>10Å) in all frames of the simulations, as well as pairs involving residues located close to the clipped termini, with low variance (<0.05 Å) and those that accounted for low correlation with the first tICs (Supplementary Table 1). The final feature set consisted of 24 pairs (Supplementary Table 2). Time-independent component analysis (tICA) 9 was used to process the joint wildtype and Y220C featurized data.
Distinct loop-centered Markov state models were constructed using the 17 features that are centered in L6 and the 7 for L1. Discretization was performed with k-means clustering, k = 200, for each system (wildtype and Y220C) separately, and accuracy of the models verified by implied timescale (ITS) plots and Chapman-Kolmogorov tests ( Supplementary Figures 7 and 8). The L6 and L1-focused models were constructed with MSM lag time of 10 ns each. The MSMs were then coarse-grained using hidden Markov state models (HMMs) with a lag time of 3 ns (L1) or 2.5 ns (L6), again validated by ITS plots and Chapman-Kolmogorov tests ( Supplementary Figures 9 and   10). The choice of macrostates for coarse graining of the MSMs was guided by the relative separation between neighboring timescales, favoring the number of macrostates that resulted in a timescale ratio above 1.5 with the next timescale. In cases where the consecutive timescales separation were not obvious, we verified the cluster assignments according to the different number of macrostates considered and the Chapman-Kolmogorov tests to decide on the final number of macrostates to proceed with. Standard deviations were calculated using Bayesian hidden Markov state models corresponding to the respective HMMs.

Pocket characterization
Pocket volume measurements were performed with POVME, version 2.0, 10 and druggability assessments were based on computational solvent mapping of randomly selected conformations from the MSM metastable states using FTMap. 11 Existence of hydrogen bonds across the simulations was probed using MDTraj, 12 with a hydrogen bond defined as established if donoracceptor distance < 2.5 Å and angle > 120.

MD ensemble comparison with experimental structures
For comparison of the conformations sampled by the simulations with experimentally-resolved p53 structures, we transformed the coordinates of wildtype and Y220C structures solved by X-ray crystallography and NMR spectroscopy applying the same feature and tICA transformation steps used for the L1 and L6 MSMs. A total of 58 wildtype and 54 Y220C chains were selected from 16 and 29 PDB entries, respectively, and are listed in Supplementary Tables 3 and 4. These structures were selected out of all wildtype and Y220C deposited structures as they contained the same number of C atoms to the simulation structures and thus could be directly compared in tICA space.
The MD ensemble was also compared to the 2FEJ solution NMR structure's J-coupling and NOEderived restraints. Distances corresponding to the 2,201 NOE pairs were calculated for the accrued wildtype simulations, and the fraction of frames that exceed the upper distance restraints were computed. Dihedral angles were calculated for the dihedrals identified by the J-coupling restraints, and violations computed when the average dihedral angle was outside the interval defined by the experimental value +-experimental error.

NMR data collection and sample preparation
WT p53 construct was provided by Rainer Brachman. 15 N-labeled p53 DBD was prepared similarly to Wong et al. 13 The DBD core domain of human p53 (94-312) was transformed into BL21 E. Coli cells. Bacteria were grown at 30 °C in 15 N-enriched Neidharts minimal media to a density of 0.8-1.0 OD (550 nm) . The temperature was lowered to 18 °C and both IPTG and ZnCl2 were added to a final concentration of 1 mM. Cells were allowed to grow for an additional 6-8 hours and then harvested by centrifugation. The frozen cell pellet was resuspended in 20mM sodium phosphate, pH7.2, 10 mM BME, 0.5 mM PMSF, and lysed using sonication. Cell debris were removed by centrifugation at 4 °C for 30 minutes. The supernatant was applied to a SP-sepharose column and the protein eluted with a gradient of 100 mM-600mM NaCl. Samples were dialyzed into 15 mM potassium chloride, 25 mM sodium phosphate, pH 7.1, 10 mM BME and concentrated to a protein concentration of 400 μM.
All NMR experiments were performed on Varian Inova 800 MHz at 20 o C. NMR data were processed using nmrPipe. 14 Residues assignments and rate measurements (using peak volumes) in all 2D HSQC spectra were accomplished using CcpNmr Analysis. 15 3D 15 N-TOCSY-HSQC (t m = 75 ms) and NOESY-HSQC (t m = 100 ms) were analyzed for assignment of backbone amide resonances. Published WT assignments 13 were confirmed in spectra of our WT sample.