Identification of new hit scaffolds by INPHARMA-guided virtual screening

Justyna Sikorska, Luca Codutti, Bettina Elshorst, Lars Skjærven, Rebeca Saez-Ameneiro, Andrea Angelini, Peter Monecke, and Teresa Carlomagno
† EMBL, Structural and Computational Biology Unit, Meyerhofstraße 1, D-69117 Heidelberg, Germany
§ Sanofi-Aventis Deutschland GmbH R&D LGCR/Structure, Design & Informatics, Industriepark Höchst, Bldg. G877, D-65926 Frankfurt am Main, Germany


Protein expression and purification.
The BL21-AI bacterial strain was transformed with the pBadM11 vector containing the cDNA for human N-terminally His-tagged Cdk-2 (UniProt P24941) and cultured in LB medium supplemented with 100 µg/ml ampicillin at 37 °C. The overnight starter cultures were diluted 1:50, grown to an OD₆₀₀ of 0.8, and induced with 0.2% arabinose. After 18-20 h of incubation at 17 °C, cells were harvested at 3500 × g for 45 min. Pellets were resuspended in loading buffer (50 mM NaH₂PO₄, 300 mM NaCl, 10 mM imidazole, 5 mM β-mercaptoethanol, pH 8.0) supplemented with DNase (10 µg/ml), lysozyme (100 µg/ml) and protease inhibitor tablets (Roche), and incubated for 60 min. The resuspended cells were sonicated three times with short pulses (1 s on/1 s off) for 1.5 min and centrifuged at 20000 × g for 1 h, and the soluble Cdk-2 fraction was loaded onto a Ni²⁺ column. Bound protein was washed with loading buffer and eluted with elution buffer (50 mM NaH₂PO₄, 300 mM NaCl, 300 mM imidazole, 5 mM β-mercaptoethanol, pH 8.0). After exchange into Tris buffer, pH 8.0, the N-terminal His-tag was cleaved by overnight incubation with TEV protease. The cleaved protein was purified again on the Ni²⁺ affinity column (HisTrap FF, 5 ml); fractions were concentrated to 1 ml using Amicon Ultra (Millipore) centrifugal filter units with a 10 kDa cutoff and loaded onto a HiLoad 26/600 Superdex S75 column equilibrated with PBS buffer, pH 7.4. The final exchange into the NMR buffer (deuterated PBS, pH 7.4) was performed on a HiPrep 26/10 desalting column. The purity and identity of Cdk-2 were confirmed by SDS-PAGE and mass spectrometric analysis. The final concentration was measured spectrophotometrically using an extinction coefficient of 37025 M⁻¹ cm⁻¹ at 280 nm.
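The spectrophotometric concentration measurement follows the Beer-Lambert law, A = ε·c·l. The short Python sketch below illustrates the conversion with the extinction coefficient quoted above; the function name, the 1 cm path length, the dilution factor and the example absorbance are illustrative assumptions and are not taken from the original protocol.

```python
# Beer-Lambert law: A = epsilon * c * l, hence c = A / (epsilon * l)
EPSILON_280 = 37025.0  # M^-1 cm^-1, extinction coefficient quoted in the text


def concentration_molar(a280: float, path_cm: float = 1.0, dilution: float = 1.0) -> float:
    """Return the protein concentration in mol/L from a blank-corrected A280.

    path_cm and dilution are assumptions of this sketch (1 cm cuvette, undiluted).
    """
    if path_cm <= 0 or dilution <= 0:
        raise ValueError("path length and dilution factor must be positive")
    return a280 * dilution / (EPSILON_280 * path_cm)


# Hypothetical reading: A280 = 0.37025 in a 1 cm cuvette, undiluted sample
conc_uM = concentration_molar(0.37025) * 1e6  # concentration in micromolar
print(f"{conc_uM:.1f} uM")
```

With the hypothetical absorbance of 0.37025, the sketch gives 10 µM.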

Description of NMR experiments.
The INPHARMA and STD experiments were acquired at 286 K on Bruker Avance II 800 MHz and 600 MHz spectrometers equipped with cryoprobes. Ligand chemical shift assignments and aggregation tests were performed on Bruker Avance II 600 MHz and Bruker DRX 500 MHz spectrometers. All NMR samples were prepared in 500 µl of deuterated PBS containing 5% DMSO-d₆. For each ligand, the ¹H resonances were unambiguously assigned from COSY, HSQC, HMBC and NOESY (200 ms mixing time) spectra for use in the further analysis. Pairwise aggregation was ruled out for all ligand pairs by acquiring NOESY spectra with an 800 ms mixing time in the absence of protein.
The INPHARMA spectra were acquired at 286 K as interleaved NOESY experiments with mixing times of 300, 500 and 800 ms and a relaxation delay of 3.0 s. Spectral widths of 10417 and 9014 Hz, a data matrix of 4K (F2) × 480 (F1), and 80-96 scans were employed for experiments acquired at 800 MHz and 600 MHz, respectively. All acquired spectra were processed with TopSpin 3.2 (Bruker Biospin) and imported into Felix 2007 for measurement of peak volumes. Of the 15 possible ligand combinations, 11 were included in the analysis; LL6-M77, LL6-RSA and M77-RSA were excluded because of extensive spectral overlap, and ADO-LL4 because it showed too few INPHARMA correlations.
The STD spectra were acquired with ligands at 1 mM and protein at 25 µM concentration. The protein resonances were saturated at the on-resonance frequency δ = 0.2 ppm with a train of 40 Gaussian-shaped selective pulses of 50 ms length; the off-resonance irradiation was applied at δ = 50 ppm. A total of 320 transients with a 2 s recycle delay were collected for each spectrum. For each ligand, STD intensities were normalized to the highest peak.
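The count of 15 combinations corresponds to all unordered pairs of six ligands. The following Python sketch enumerates them and removes the four excluded pairs; the six-ligand list is an assumption inferred from the ligand names that appear in this document (ADO, LL4, LL6, M77, RSA, ZIP).

```python
from itertools import combinations

# Assumed six-ligand set, inferred from the names used in this document
ligands = ["ADO", "LL4", "LL6", "M77", "RSA", "ZIP"]

# Pairs excluded in the text (spectral overlap or too few INPHARMA correlations)
excluded = {
    frozenset(pair)
    for pair in [("LL6", "M77"), ("LL6", "RSA"), ("M77", "RSA"), ("ADO", "LL4")]
}

all_pairs = list(combinations(ligands, 2))  # unordered pairs, 6 choose 2
analysed = [pair for pair in all_pairs if frozenset(pair) not in excluded]

print(len(all_pairs), len(analysed))
```

Under this assumption, the sketch yields 15 pairs in total and 11 pairs retained for analysis, matching the numbers given above.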
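The per-ligand normalization of STD intensities can be sketched in Python as follows. The proton labels and intensity values are hypothetical, and the function name is introduced only for illustration.

```python
from typing import Dict


def normalize_std(intensities: Dict[str, float]) -> Dict[str, float]:
    """Scale the STD intensities of one ligand so that the strongest peak equals 1."""
    top = max(intensities.values())
    if top <= 0:
        raise ValueError("the highest STD intensity must be positive")
    return {proton: value / top for proton, value in intensities.items()}


# Hypothetical raw STD intensities for three protons of one ligand
raw = {"H1": 2.0, "H2": 1.0, "H3": 0.5}
relative = normalize_std(raw)
print(relative)
```

Normalizing each ligand separately makes the relative saturation pattern across its protons comparable, independent of the absolute signal level.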

MD simulations and ensemble docking.
Three crystal structures of Cdk-2 were used for molecular dynamics (MD) simulations and subsequent ensemble docking: PDB-ID 1HCK (Cdk-2 in complex with ATP); 2VTM (Cdk-2 in complex with AT7519); 1H1R (Cdk-2/cyclin A in complex with NU6086). All-atom models of the protein structures were prepared with the Amber12 package suite¹ using the corresponding Amber99SB force field². Ligand parameters were prepared with the generalized Amber force field (GAFF)³. Each protein-ligand complex was solvated with TIP3P water molecules and energy-minimized for 7000 steps. The system was then heated to 298 K using the Langevin temperature equilibration scheme with restraints on the solute; heating to 100 K was carried out in 50 ps at constant volume and was followed by heating to 298 K in 500 ps at constant pressure. In addition, six 100 ps long equilibration steps at constant pressure with gradually reduced restraints on the solute (from 50 to 0.5 kcal/mol) were applied. The equilibrated system was subjected to a production simulation at constant volume and temperature (298 K) using a 1 fs time step, SHAKE for bonds involving hydrogen atoms, and particle mesh Ewald electrostatics (the non-bonded interaction cutoff was set to 11 Å). The resulting MD trajectories were post-processed with PTRAJ to generate five cluster representatives per starting crystal structure.
The resulting fifteen cluster representatives were subsequently used in an ensemble docking protocol with both Glide and Surflex. For each program, we generated ten docking modes for each structure. Each of the resulting docking modes was minimized and subjected to a 5 ps long MD simulation (at 298 K using the generalized Born solvation model with a non-bonded cutoff of 18 Å). From each simulation we collected 5 structures (by clustering), generating a total of 1500 binding modes per ligand. As a final step, the binding free energy of each protein-ligand complex was estimated with the MM-PBSA tool in Amber, and the 50% of structures with the lowest energy were retained for further INPHARMA analysis.
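The total of 1500 binding modes per ligand follows from multiplying the numbers given above, and the final filtering step keeps the lower-energy half of the poses. The Python sketch below reproduces this bookkeeping; the constant and function names and the example energies are hypothetical, and the ranking shown is a generic sort, not the MM-PBSA tool itself.

```python
from typing import List, Sequence

N_CRYSTAL_STRUCTURES = 3      # 1HCK, 2VTM, 1H1R
CLUSTERS_PER_STRUCTURE = 5    # cluster representatives from each MD trajectory
N_DOCKING_PROGRAMS = 2        # Glide and Surflex
MODES_PER_RECEPTOR = 10       # docking modes per program and receptor structure
SNAPSHOTS_PER_MODE = 5        # structures collected from each short MD run

n_receptors = N_CRYSTAL_STRUCTURES * CLUSTERS_PER_STRUCTURE
n_binding_modes = (
    n_receptors * N_DOCKING_PROGRAMS * MODES_PER_RECEPTOR * SNAPSHOTS_PER_MODE
)


def retain_lowest_energy(energies: Sequence[float], fraction: float = 0.5) -> List[int]:
    """Return the indices of the lowest-energy fraction of poses, best first."""
    n_keep = int(len(energies) * fraction)
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    return order[:n_keep]


# Hypothetical estimated binding free energies (kcal/mol) for four poses
kept = retain_lowest_energy([-10.0, -30.0, -20.0, -5.0])
print(n_receptors, n_binding_modes, kept)
```

With these inputs the sketch gives 15 receptor structures, 1500 binding modes, and retains the poses at indices 1 and 2.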

Structure-based pharmacophore and virtual screening.
The binding poses selected by INPHARMA-STRING for ZIP, LL4, LL6, M77 and RSA were used as the training set to generate pharmacophore models. For each ligand, we identified the chemical features common to all docking poses selected by INPHARMA-STRING in the best-ranking 5% of structure pairs; we then overlaid the binding poses at selected amino acids of the binding pocket and merged the pharmacophores using the software LigandScout 3.12 (Fig. S6). Additionally, the pharmacophore query was restrained by excluded volumes representing the Cdk-2 binding pocket (Fig. S7). Finally, two pharmacophore models were created: the common-structure-based pharmacophore (CSB) with six features and the reduced-structure-based pharmacophore (RSB) with five features (excluding HBA2).
Both pharmacophores were validated against a set of active compounds and decoys. To generate the validation set, we collected from the ChEMBL database 706 compounds active against uncomplexed Cdk-2 (ChEMBL ID: ChEMBL301) with IC₅₀ below 500 nM. All compounds were clustered using ChemMineTools⁴ and 115 representatives were chosen as the active dataset. For the decoy dataset we generated 2438 entries with the program DUD-E (http://dude.docking.org/)⁵, starting from 33 compounds of the active pool. The performance of the CSB and RSB pharmacophores was assessed by the area under the ROC curve and by the enrichment factor (EF), calculated as follows:

$$\mathrm{EF}_{x\%} = \frac{TP_{x\%} \times N}{\left(TP_{x\%} + FP_{x\%}\right) \times P}$$

with TP_{x%} (FP_{x%}) being the number of true positive (false positive) compounds identified in the first x% of compounds ranked according to the pharmacophoric score, N the total number of compounds in the database (2553) and P the total number of positive compounds (115)⁶. The pharmacophores were used to screen the 3D ZINC purchasable database updated on 16/07/14 (http://zincpharmer.csb.pitt.edu/pharmer.html). From a total of 22723923 compounds (215407196 conformations), the query returned 1343 and 1206 hits for CSB and RSB, respectively, after applying three filters: maximum hits per conformation, 1; maximum hits per molecule, 1; maximum RMSD, 0.75 and 0.45 Å, respectively. All selected compounds were docked flexibly to two Cdk-2 structures using Glide⁷. These receptor structures were selected to account for two different conformations of the side chain of Lys-33 in the binding pocket: the position of the side-chain nitrogen atom differs by 4.6 Å between the two structures. The docking poses were ranked with both GlideScore and X-score, and compounds that scored among the top 100 in both rankings and fulfilled at least one hydrogen-bond (HB) and two hydrophobic (HY) interactions were considered for further analysis (28 and 16 ligands from the CSB and the RSB pharmacophores, respectively). Finally, 11 compounds were purchased and tested for affinity towards Cdk-2.
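The enrichment factor used for pharmacophore validation is the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole database. The Python sketch below computes it with the database sizes quoted above (N = 2553, P = 115) as defaults; the function name and the example hit counts are hypothetical and do not reproduce results from this work.

```python
def enrichment_factor(tp: int, fp: int, n_total: int = 2553, n_actives: int = 115) -> float:
    """EF = (TP * N) / ((TP + FP) * P) for the top x% of the ranked database.

    tp, fp:    true and false positives found in the top x% of the ranking
    n_total:   total number of compounds in the validation database (N)
    n_actives: total number of active compounds (P)
    """
    selected = tp + fp
    if selected == 0:
        raise ValueError("no compounds selected in the top x%")
    return (tp * n_total) / (selected * n_actives)


# Hypothetical counts: 20 actives and 5 decoys among the top-ranked compounds
ef_example = enrichment_factor(tp=20, fp=5)

# Limiting cases: only actives selected (maximum EF = N / P), whole database selected (EF = 1)
ef_max = enrichment_factor(tp=10, fp=0)
ef_all = enrichment_factor(tp=115, fp=2438)
print(round(ef_example, 2), round(ef_max, 2), ef_all)
```

An EF of 1 corresponds to random selection, and the largest attainable value for this validation set is N/P, about 22.2.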

Kinase phosphorylation assay.
All assays were performed by Reaction Biology Corporation (Malvern, PA)⁸. The following kinases with appropriate substrates were used for activity testing: ALK (GenBank accession number NP_004295.2) with substrate pEY, Cdk-2/cyclin A (GenBank accession number NM_001786/NM_001237) with histone H1, Cdk-2/cyclin E (GenBank accession number NM_001786/M73812) with histone H1, and PKA (GenBank accession number NP_002721.1) with PKA substrate peptide. The reactions were carried out in 20 mM HEPES (pH 7.5), 10 mM MgCl₂, 2 mM DTT, 1 mM EGTA, 0.02% Brij35, 0.02 mg/ml BSA, 0.1 mM Na₃VO₄ and 1% DMSO, after addition of the kinase, its appropriate substrate and the ligand dissolved in DMSO. Reactions were initiated by addition of ³³P-ATP at a final concentration of 10 µM and incubated for 120 min at room temperature. Finally, samples were applied to P81 ion exchange paper and washed extensively with 0.75% phosphoric acid before the filter radioactivity was determined. Assay results are shown in Table 1; single-dose values and IC₅₀ values are averages of duplicate and triplicate runs, respectively.

Fig. S1 (A) Schematic representation of the INPHARMA spectrum with intraligand peaks shown in green and red and INPHARMA peaks represented as bicolored spheres. Dashed lines connecting two structures indicate NOEs between two ligands resulting from protein-mediated magnetization transfer. (B) Graphical representation of the INPHARMA-based lead design process used in this work.

Fig. S8 Selected compounds identified by the structure-based pharmacophores and their fits to the respective pharmacophores: CSB, top row; RSB, bottom row. The H-bond acceptor, H-bond donor and hydrophobic features are indicated by red, green and yellow spheres, respectively.

Table S2. Compounds identified through virtual screening. An asterisk indicates the purchased compounds.