Deciphering the selectivity of inhibitor MKC9989 towards residue K907 in IRE1α; a multiscale in silico approach

The selectivity of the ligand MKC9989, as inhibitor of the Inositol-Requiring Enzyme 1α (IRE1α) transmembrane kinase/ribonuclease protein, towards the residue K907 in the context of Schiff base formation, has been investigated by employing an array of in silico techniques including Multi-Conformation Continuum Electrostatics (MCCE) simulations, Quantum Mechanics/Molecular Mechanics (QM/MM) calculations, covalent docking, and Molecular Dynamics (MD) simulations. According to the MCCE results, K907 displays the lowest pKa value among all 23 lysine residues in IRE1α. The MMCE simulations also indicate a critical interaction between K907 and D885 within the hydrophobic pocket which increases significantly at low protein dielectric constants. The QM/MM calculations reveal a spontaneous proton transfer from K907 to D885, consistent with the low pKa value of K907. A Potential Energy Surface (PES) scan confirms the lack of energy barrier and transition state associated with this proton transfer reaction. Covalent docking and MD simulations verify that the protein pocket containing K907 can effectively stabilize the inhibitor by strong π–π and hydrogen bonding interactions. In addition, Radial Distribution Function (RDF) analysis shows that the imine group formed in the chemical reaction between MKC9989 and K907 is inaccessible to water molecules and thus the probability of imine hydrolysis is almost zero. The results of the current study explain the high selectivity of the MKC9989 inhibitor towards the K907 residue of IRE1α.


Introduction
Inositol-Requiring Enzyme 1a (IRE1a, hereaer IRE1) is a transmembrane endoplasmic reticulum (ER) bound bifunctional kinase and endoribonuclease protein crucial for the unfolded protein response (UPR) signaling. Upon ER stress, IRE1 homodimerizes, trans-autophosphorylates and oligomerizes, resulting in endoribonuclease activity with the excision of a 26 nucleotide intron from the X-box binding protein 1 (XBP1) mRNA. 1,2 The mature form of XBP1s initiates the gene response to ER stress. [3][4][5] UPR signaling is a promising target for the treatment of numerous diseases 6 and targeting IRE1 activity, the most conserved sensor of the UPR, could elucidate the molecular mechanism of these complex signaling pathways. An inhibitor called MKC9989, able to target the RNase domain, has been developed by using a series of in silico, in vitro and in vivo data. [7][8][9] From a functional point of the view, the MKC9989 inhibitor ( Fig. 1) belongs to a series of IRE1 RNase inhibitors termed hydroxy aryl aldehydes (HAA) where the aldehyde moiety reacts chemically with the amine side chain of K907 through a Schiff base reaction. 7,9 The molecular mechanism that can rationalize the high selectivity specically towards K907 out of the total 23 lysine residues is not yet well-understood. In the current work, complementing our previous study, 10 we have conducted an indepth investigation of this phenomenon.
To properly address the high selectivity of MKC9989 towards residue K907, we have considered different factors before and aer the Schiff base formation, such as the capability of the lysine residues to be engaged in Schiff base reaction and the stability of the newly formed Schiff base. The most critical prerequisites for Schiff base formation is that the side chain amine group of lysine should be in its uncharged (deprotonated or neutral) state 11 to be able to nucleophilically attack the carbaldehyde carbonyl group of MKC9989. 7 Hence, the pK a value of the lysine plays a crucial role for its reactivity toward Schiff base formation. 12 Another important factor is the capability of the receptor pocket environment in stabilizing the newly bound inhibitor. 13 In this context, water accessibility to the imine group is an important parameter, since these can be easily hydrolyzed in the presence of water through a reverse Schiff base reaction. 14 pK a is a quantity that describes the amount of free energy change of a residue during the protonation process at a given pH. 15 At neutral pH, the pK a of an isolated lysine in water is 10.4, 16 thus the side chain amine group would be protonated. However, in a protein, the microenvironment properties around each residue can signicantly alter its pH dependence towards protonation, mainly by electrostatic interactions, which results in signicant shis in pK a values. 17 It has been demonstrated that a pK a shi of just one pH unit may have a large impact on the conformational change and biological activity of biomolecules. 16 Antosiewicz et al. 18 demonstrated that it is almost impossible to estimate reasonable values of pK a for a fully rigid protein, especially with dielectric constants smaller than 20. The explicit motions of side chains can be considered in MD simulation with a predened protonation state and by averaging the pK a values over each structure throughout the trajectory. 19,20 While this strategy seems efficient, it however ignores the energy difference between congurations in the MD trajectory, which in turn contributes to the free energy change of protonation. 16 Herein, the Multi-Conformation Continuum Electrostatics (MCCE) approach 19 was instead employed to calculate pK a values of all 23 lysine residues in the cytosolic domain of the IRE1 (PDB/Chain ID: 3P23.A). MCCE combines continuum electrostatics and molecular mechanics. 19 The protein side chain motions are simulated explicitly through rotamer sampling while the dielectric effect of the solvent and bulk protein is modeled by continuum electrostatics. The electrostatic interactions were computed by means of the Poisson-Boltzmann equation through multiple DelPhi runs as implemented into MCCE v2.7, and the resulting pK a values were obtained using Metropolis Monte Carlo sampling of all conformers as a function of pH. 19 To conrm the MCCE simulation results, Quantum Mechanics/Molecular Mechanics (QM/MM) calculations 21 were performed to evaluate the explicit interaction between residues around K907 in the IRE1 RNase domain (residues 830-964).
Moreover, potential energy surface (PES) scans and hydrogen bond strength calculations were performed using the same approach.
Except for residue K907, for which the IRE1/MKC9989 covalent complex is available (PDB code: 4PL3), 9 covalent docking 22 was used in the absence of crystallographic structures to provide covalent IRE1/MKC9989 complexes with a total of four representative lysine residues. 10 The stability and nonbonded interactions of the covalently bound MKC9989 inhibitor with two buried (K907 and K599) and two solvent accessible (K656 and K799) lysines ( Fig. 1) were studied using a series of MD simulations. 10 Also, the solvent accessibility of the newly formed imine groups, aer Schiff base reaction, was quantitatively studied by means of radial distribution function (RDF) of water molecules around the imine groups during the MD simulation.
Upon combining the results of MCCE simulation, QM/MM calculations and MD simulations, the data in full explains the observed high selectivity of MKC9989 and related HAA-based inhibitors towards K907. 10

Protein preparation
The IRE1 apo monomer was built by keeping chain A from the crystal structure (PDB ID: 4PL3) 9 and removing all other chains, ligands and cofactors, including the covalently bound MKC9989. The residue involved in Schiff base formation, K907, was restored to its original lysine side chain form. The IRE1 apo monomer was then prepared using the protein preparation wizard implemented in the Schrödinger program package. 23 Hydrogen atoms were incorporated and possible missing side chain atoms and missing loops were added using the Prime program. 11,24 Tautomeric and protonation states of all residues were determined at pH ¼ 7.4. The prepared structure of IRE1 was then rened using the OPLS3e force eld 25 in a restrained minimization procedure.

MCCE simulation
The MCCE technique 19 was employed to calculate pK a values of all residues in the prepared IRE1 structure. All side chain atoms were allowed to move while the protein backbone was considered to be rigid. Also, all protonatable residues were allowed to be in neutral or ionized states. PARSE radii and charges were used for the protein atoms. 26 The MCCE simulations were carried out with three different protein dielectric constant values (3 ¼ 2, 3, and 4), while the solvent dielectric constant was kept constant at 80 with a salt concentration of 150 mM. It has been demonstrated that a protein dielectric constant of 2 is consistent with PARSE charges 27,28 and it is also commonly used for many studies based on Molecular Mechanics Poisson-Boltzmann (MM-PB) 29 or Molecular Mechanics Generalized Born (MM-GB) 30 methods. The value of 3 used in a calculation depends on the level of physical detail represented explicitly in a model. 31 For fully microscopic simulations, 3 ¼ 1 because all contributions to dielectric relaxation in the protein-water system are treated explicitly. 32 However, microscopic models of the system with implicit treatment of induced dipoles (electronic polarization) will be best described by using 3 ¼ 2. 32 On the other hand, when the both induced dipoles and protein relaxation (nuclear relaxation or reorientation of dipoles) are treated implicitly, 3 > 4 should be used. 33 In the MCCE simulation, protein relaxation is simulated explicitly through rotamer sampling while the dielectric effects of the solvent and bulk protein are modeled implicitly by continuum electrostatics. Therefore, according to literature, 3 ¼ 2 can be an appropriate choice for dielectric constants of the protein. 34 The torsion energies and van der Waals parameters were evaluated based on the standard AMBER force eld. 35 Herein, we employed FULL MCCE analysis where the positions of all side chains were rst determined by (a) rotation around all rotatable bonds by 45 increments followed by (b) pruning rotamers with undesirable clashes. Aerwards, two step optimizations were carried out for dihedral angle optimization and for proton addition followed by nal pruning of conformers. A total number of 7936, 7110, and 6467 conformers were considered for DelPhi 36 pairwise interaction calculations and Monte Carlo sampling procedure at 3 ¼ 2, 3, and 4, respectively.

MKC9989 ligand preparation
The structure of the MKC9989 ligand was obtained from the cocrystallized PDB structure (4PL3) and its aldehyde moiety was re-modeled as appropriate (Fig. 1). The pre-reactive MKC9989 ligand was then prepared in the Schrödinger LigPrep program at pH ¼ 7.4 using the OPLS3e force eld. 25 Finally, the most stable conformer of the ligand was determined using the Mac-roModel program implemented in the Schrödinger suite, and used for covalent docking studies. Herein, we used the Mixed Torsional/Low-Mode sampling (MTLMOD) method. 37 The probability of torsion rotation/molecular translation was set to 0.5 with 100 steps per rotated bond. The minimum and maximum distances for low-mode motions were set to 3.0 and 6.0 A, respectively. The energy window for saving the structures and maximum atom deviation cutoff were assigned to be 5.0 kcal mol À1 and 0.5 A, respectively. The OPLS3e force eld was employed with water as solvent. The extended cutoff option was chosen to truncate the non-covalent interactions (i.e. 8.0, 20.0, and 4.0 A for van der Waals, electrostatic, and hydrogen bond interactions, respectively). The Polak-Ribiere conjugate gradient (PRCG) 38 method with maximum iteration of 2500 and convergence threshold of 0.05 kJ mol À1 A À1 was used as minimization algorithm acting on the gradient.

Covalent docking
Besides the co-crystallized complex with MKC9989 covalently bound to K907, the prepared MKC9989 ligand was covalently docked into IRE1 using CovDock in Schrödinger. 39 The additional selected lysine residues (K599, K656 and K799) were dened as active residues and the imine condensation reaction between the amine group of the lysine and the reactive aldehyde of MKC9989 was simulated. Energies were calculated using the MM-GBSA rescoring function. 40 As described in ref. 10, by performing Solvent Accessible Surface Area (SASA) analysis of the twenty-three lysine residues in the IRE1 cytosolic domain, two representative buried lysines (i.e. K907 and Lys599) and two solvent exposed lysines (i.e. K656 and K799) have been chosen. The calculated SASA values are 6.1, 11.5, 104.0, and 200.0 A 2 for K907, K599, K799, and K656, respectively. The general idea was to choose representative lysines in a way that they are at a distance from each other to avoid ligand-ligand interactions and minimize allosteric effects during the IRE1 + 4 MKC9989 MD simulations (Fig. 1).

MD simulation
The highest scoring poses from the covalent docking at each of the three additional lysines were, together with the K907-bound crystallographic structure, subjected to an MD simulation in NPT ensemble using the Desmond MD simulator engine in Schrödinger 41 employing the OPLS3e force eld. 25 Water molecules were modeled using the TIP3P force eld. 42 Periodic boundary conditions were applied in all directions along with a 10 A water buffer around the protein in an orthorhombic simulation box. The net charge of the system was balanced using the proper number of counter ions (i.e. Cl À /Na + ) and the salt concentration was set to 150 mM to represent physiological conditions. Temperature (300 K) and pressure (1 atm) were controlled using the Nose-Hoover thermostat 43 and the Martyna-Tobias-Klein barostat, 44 respectively. The initial minimization and relaxation protocol consisted of (a) NVT Brownian dynamics with restraints on solute heavy atoms at T ¼ 10 K for 100 ps, (b) NVT simulation at T ¼ 10 K with restraints on solute heavy atoms for 12 ps, (c) NPT MD simulation at T ¼ 10 K with restraints on solute heavy atoms for 12 ps, (d) NPT MD simulation at T ¼ 300 K with restraints on solute heavy atoms for 12 ps, and (e) NPT MD simulation at T ¼ 300 K without restraints for 24 ps. Following equilibration, the MD simulation was then run for 100 ns with a trajectory sampling frequency of 100 ps in the production step.

QM/MM calculations
The ONIOM (Our own N-layered Integrated molecular Orbital and molecular Mechanics) technique, 45 as implemented in Gaussian 16,46 was employed to carry out the QM/MM calculations on the RNase domain of IRE1 (residues 830-964). The residues K907, D885, F889, L886 and L914 ( Fig. 2A) were considered as the QM region and treated by the dispersion corrected Minnesota M06-2X functional 47 along with the 6-31+G(d,p) basis set, while the rest of the RNase domain was assigned as the MM region for which the Universal Force Field (UFF) 48 was employed. The structure optimization started with ionized states of K907 and D885. The potential energy surface (PES) scan and hydrogen bond strength evaluation were performed at the same level of theory. The PES scan was performed as a function of OH distance (0.7-2.2 A with step length 0.1 A) and OHN angle (180 AE 75 ). Fig. 3 shows the pK a values of all lysine residues extracted from the MCCE simulations at protein dielectric constants 2, 3, and 4. The lowest pK a values belong to K907 and K559 with values 8.6, 5.1, and 0.7 (K907) and 9.0, 8.4, and 7.6 (K559) at 3 ¼ 4, 3, and 2, respectively. The pK a values of these two buried residues decrease as 3 decreases which indicates their preferences to be deprotonated at lower protein dielectric constants. It is surprising that pK a of K907 at 3 ¼ 2 is totally acidic compared to the other lysine residues (Fig. 3). The pK a values of the remaining lysines are essentially unaffected by the change in dielectric constant, and range between 8 and 12.5. A closer look at the pK a values of the other residues around K907 ( Fig. 2A) indicates that D885 exerts an equally abnormal increase in pK a with decreasing protein dielectric constant. The pK a values of D885 are calculated to be 1.8, 3.5, and 9.6 at 3 ¼ 4, 3, and 2, respectively. This result clearly illustrates an interaction between K907 and D885 inside the hydrophobic cavity, which is signicantly enhanced as the protein dielectric constant is reduced.

Pre-Schiff base reaction analysis
It has been demonstrated that the pK a of a buried ionizable residue is extremely sensitive to the protein conformation and polarity of the surrounding microenvironment. 17 In internal polar hydrophilic microenvironments, the pK a values are close to those of individual amino acids in water. However, in apolar hydrophobic microenvironments, the pK a of ionizable residues will shi making the neutral state more favorable. 17 Isom et al. have studied the effect of the polarity of the microenvironment on the pK a of lysine residues by engineering 25 variants of staphylococcal nuclease with lysine residues at internal This journal is © The Royal Society of Chemistry 2020 RSC Adv., 2020, 10, 19720-19729 | 19723 positions. 17 They found a very large reduction in pK a of some lysine residues, as low as 5.3, inside a hydrophobic pocket in close proximity to carboxylic group containing residues.
To conrm the MCCE results and to investigate the interaction between K907 and the surrounding residues, a series of QM/MM calculation were conducted based on dispersion corrected DFT using the M06-2X functional. We started the structure optimization process with K907 and D885 in their respective ionized states, based on the prepared pdb structure (PDB code: 4PL3). As Fig. 2 shows, a clear proton transfer occurs Fig. 3 The calculated pK a values of all 23 Lys residues of the IRE1 cytosolic domain at three different protein dielectric constants (i.e. 3 ¼ 4, 3, and 2). Intermediate residues colored relative to the lowest (red) and highest (blue) value in each set. from K907 to D885 during the optimization. This implies that K907 prefers to remain deprotonated inside the highly hydrophobic pocket surrounded by F889, L886, and L914. The transferred proton maintains a hydrogen bond between the donor and acceptor atoms in the optimized structure, with OH, NH, and ON distances 0.99, 1.96, and 2.95 A, respectively, and OHN angle 171.8 . Additional calculations revealed that this hydrogen bond is relatively weak, with an energy of 10.1 kcal mol À1 .
Since the proton transfer has occurred during the structure optimization process, it is expected that there is no or very low energy barrier for this reaction mechanism. Indeed, a PES scan, performed as a function of OH distance (0.7-2.2 A with step length 0.1 A) and OHN angle (180 AE 75 ), conrms the hypothesis (Fig. 4). As seen in Fig. 4A, the PES along the proton transfer pathway possesses a double minimum prole feature with a very shallow well around the amine nitrogen atom of K907 residue. Indeed, the energy barrier for proton transfer from K907 to D885 (0.82 kcal mol À1 ) is 35 times lower than the proton transfer in the opposite direction (28.89 kcal mol À1 ). This very small energy barrier could not prohibit the spontaneous proton transfer during the optimization as it is close to the RT value at 298 K ($0.6 kcal mol À1 ); i.e. the proton will easily pass through this minimal barrier just by means of thermal motion. Two main phenomena explain how proton transfer can occur during the minimization: (1) strong electrostatic interaction of positively charged K907 and negatively charged D885 leads to an enhancement in zero-point vibrational energy and amplitude of N-H vibration, which consequently can overcome the small energy barrier. (2) It is well-known that, due to light weight of the proton, quantum effects cannot be neglected for reactions involving proton transfer. 49 Proton tunneling is a quantum mechanical phenomenon that can justify how protons can transfer from one microstate to another one by passing through the energy barrier between the two. 49 The contour map prole of the PES (Fig. 4B) shows that the optimum values for OH distance and OHN angle are 0.998 A and 171.7 which are consistent with the structural parameters of the optimized conguration. The results of the pK a calculations from the MCCE simulations and the spontaneous proton transfer phenomenon from the QM/MM calculations provide a clear picture linking the prerequisites for Schiff base chemical reaction to the high selectivity of MKC9989 towards K907.

Post-Schiff base reaction analysis
To investigate the parameters which may inuence the stability of the newly formed Schiff base between four representative lysine residues 10 and the aldehyde moiety of MKC9989, the covalent complexes between MKC9989 and K599 (kinase pocket), K656 and K799 (solvent exposed), and K907 (RNase site), were explored further (Fig. 5). The docking score values and MM-GBSA free energy of binding (DG b ) ( Table 1) indicate that the pocket containing K907 is the favored site for covalently bound MKC9989 to IRE1. However, given the difficulty of the Fig. 5 The structure of the four covalently bound MKC9989 molecules (space filling model) to the four selected lysine residues of the cytosolic part of IRE1. CovDock program to reproduce the exact crystallographic pose of MKC9989 towards K907, 50 we decided to use the structure from PDB ID 4PL3 at this particular site in the further investigations.
The IRE1/4ÂMKC9989 complex was then subjected to a 100 ns MD simulation for further evaluation of ligand dynamics and solvent accessibility of the formed imine bond. As mentioned, solvent accessibility is an important parameter since the imine group can easily be hydrolyzed in the presence of water molecules in a reverse Schiff base reaction. Fig. 6A shows the RMSD of the covalently attached MKC9989 ligands during the 100 ns molecular dynamics simulation. As this gure indicates, the ligand movement inside the RNase pocket (K907) is smaller compared to the MKC9989 ligands covalently bound to K599, K656 and K799. Fig. 6B shows the RMSF plot of the ligand bound to the four different lysines. Ligand RMSF plots describe the ligand uctuations broken down to individual atoms. The atom numbers are according to the 2D structure of the ligand shown in the inset panel of Fig. 6B. The ligand RMSF will give us better insight on how ligand fragments interact with the protein and their entropic role during the binding event. As the gure indicates, the RMSF values of the ligand bound to K907 are considerably lower than those attached to the other lysines (specically solvent exposed ones). The differences in RMSF values are signicant for atoms around the covalent bonding position (i.e. two fused rings) which conrms the stability of MKC9989 bound to K907 within the RNase domain.
The protein-ligand interaction diagram (Fig. 6C) shows that MKC9989 inside the pocket containing K907 exhibits pp interactions with F889 and H910 during 90% and 83% of the simulation time, respectively. The very low solvent accessibility of the imine group formed within this region (Fig. 7) can be associated to these p-p interactions that prevent water molecules from diffusing into the pocket and make, along with the strong hydrogen bonds to Y892 and K907, the ligand molecule more rigidly bound inside this pocket in comparison with the other sites (Fig. 6B). Common for MKC9989 bound to any of the other three lysines are water-assisted hydrogen bonds to the interacting residues, and a high degree of solvent exposure.
In Fig. 7A we present the radial distribution function, RDF (g(r)), of water molecules around each imine groups as a function of distance between the carbon atom of the imine (C i ) and oxygen atoms of water (O w ) during the MD simulation. The lack of the rst peak in the RDF plot of MKC9989 inside the pocket containing K907 shows that the corresponding imine group is fully inaccessible to water molecules, which prevents imine hydrolysis process from taking place. Fig. 7B shows the integral of RDF during the MD simulation which quantitatively represents the average number of water molecules around each imine groups, as a function of distance. As Fig. 7B indicates, as opposed to the other three lysines, there are no water molecules around the imine group formed during the reaction between MKC9989 and K907, up to 6 A away from the C i atom. As discussed earlier, the strong p-p interactions between MKC9989 and F889 and H910 may block the water molecules from diffusing into the pocket region.

Conclusions
Herein, the high selectivity of the UPR inhibitor MKC9989 towards K907 in the IRE1 RNase pocket is investigated by means of multiscale in silico techniques including MCCE simulations, QM/MM calculations, covalent docking, and MD simulations. The MCCE simulations showed that the lowest pK a values belong to K907 and K559 with values of 8.6 and 9.0, 5.1 and 8.4, and 0.7 and 7.6 at 3 ¼ 4, 3, and 2, respectively. The pK a values of in particular K907 decreases dramatically as 3 decreases, which indicates preferred deprotonation at lower protein dielectric constants. The MCCE simulations also identied a critical interaction between K907 and D885 within the hydrophobic pocket, which was markedly enhanced at lower protein dielectric constants. The QM/MM calculations showed a spontaneous proton transfer from K907 to D885, in agreement with the low pK a value of the K907 residue. The PES scan conrmed the negligible energy barrier (0.82 kcal mol À1 ) associated with this proton transfer reaction. The results from covalent docking and MD simulations demonstrated that the pocket containing K907 can effectively stabilize the inhibitor's structure by strong p-p and hydrogen bonding interactions. RDF analyses furthermore conrmed that the imine group formed in the chemical reaction between MKC9989 and K907 is totally inaccessible for water molecules, and thus the probability of imine hydrolysis is negligible.
These results give new insights into the selectivity of the MKC9989 inhibitor towards K907 out of the 23 lysine residues of IRE1, and provide a suitable starting point for QM/MM or QM/MM-MD studies of the MKC9989-IRE1 reaction. This work also shows how different computational tools can be utilized to describe the local factors leading to Schiff base formation in biological systems.