Computationally driven discovery of SARS-CoV-2 Mpro inhibitors: from design to experimental validation

We report a fast-track computationally driven discovery of new SARS-CoV-2 main protease (Mpro) inhibitors whose potency ranges from mM for the initial non-covalent ligands to sub-μM for the final covalent compound (IC50 = 830 ± 50 nM). The project extensively relied on high-resolution all-atom molecular dynamics simulations and absolute binding free energy calculations performed using the polarizable AMOEBA force field. The study is complemented by extensive adaptive sampling simulations that are used to rationalize the different ligand binding poses through the explicit reconstruction of the ligand–protein conformation space. Machine learning predictions are also performed to predict selected compound properties. While simulations extensively use high performance computing to strongly reduce the time-to-solution, they were systematically coupled to nuclear magnetic resonance experiments to drive synthesis and for in vitro characterization of compounds. Such a study highlights the power of in silico strategies that rely on structure-based approaches for drug design and allows the protein conformational multiplicity problem to be addressed. The proposed fluorinated tetrahydroquinolines open routes for further optimization of Mpro inhibitors towards low nM affinities.


Introduction
Since December 2019, the COVID-19 global pandemic has put the entire world on edge. 1,2 The disease is due to a coronavirus (CoV) called SARS-CoV-2 (severe acute respiratory syndrome, SARS) that has triggered the start of an unprecedented research effort. [3][4][5] While the vaccination strategy 6 has been particularly successful with the rise of mRNA techniques, additional programs have been launched to obtain antivirals able to reduce the impact of COVID-19 on ill patients. Despite these efforts, few potential treatments are presently available with the exception of Paxlovid, a nirmatrelvir/ritonavir combo proposed by Pzer. 7 Due to the persistence of the pandemic, it remains essential to propose new antiviral drugs. A possible strategy consists in designing small molecules to interact with one of the main proteins of the SARS-CoV-2 virus, thus blocking its activity. Among the potential targets, the main protease protein, denoted as M pro or 3CL pro , is a primary choice 8 as it has no human homolog and it is well conserved among coronaviruses, 9 especially in terms of the structure of its active site, catalytic dyad, and dimer interface. Furthermore, M pro is required to release viral proteins for particle assembly, and is thus essential to the virus replication cycle.
Developing a new drug targeting the viral M pro is challenging as it requires extensive resources and the success rate is notoriously low. 10 Relying on in silico driven rational design could accelerate the process. In fact, it diminishes the cost by reducing the need for synthetic iterations while also providing an interpretation of the interactions occurring between the target protein and potential inhibitors.
It is important to note that theoretical modeling of M pro is challenging as the protein exhibits high structural exibility [11][12][13] leading to high conformational complexity. M pro is also involved in a variety of complex protein-ligand-solvent interaction networks. 12,13 These challenges can be tackled using a high-resolution modeling approach 12,13 going beyond rigid docking procedures (see ref. 14 for a detailed discussion of the difficulties of docking approaches in predicting the native binding modes of small molecules within M pro ).
Many studies have been devoted to the design of new M pro inhibitors 3,5,15-25 through joint computational and experimental approaches. In particular, a recent study by the Jorgensen group highlighted the usefulness of relative binding free energy (RBFE) computations as part of the drug design process. 26 In this paper, we present a computationally driven discovery and binding mode rationalization of new SARS-CoV-2 M pro inhibitors. In doing so, we build on our previous high-resolution M pro molecular dynamics studies. 12,13 Here, we explore more deeply some specic subpockets of the substrate binding site of the protease using absolute binding free energy (ABFE) calculations and adaptive sampling grounded on extensive molecular dynamics simulations with high-resolution polarizable force elds (PFFs). Using the GPU-accelerated module 27 (GPU ¼ Graphics Processing Unit) of the Tinker-HP molecular dynamics package 28 coupled to the AMOEBA PFF, [29][30][31][32] it has been shown that simulations can reach the required level of accuracy and ms timescales needed to explore the structural rearrangement and interactions prole of this exible protein. 12,13 More precisely, the modeling of M pro necessitates the ability to evaluate at high resolution various types of key interactions including hydrogen bonds, salt bridges, p-p stacking, and specic solvation effects. Long timescales are required to achieve sufficient sampling. This is now possible by using the large number of graphics processing units (GPUs) that are presently available on supercomputers and high-performance cloud computing platforms. In this study, we combine our computationally driven strategy, using absolute binding free energy computations [33][34][35][36][37] and unsupervised adaptive sampling, 12,13 with machine learning-assisted property predictions, while conducting extensive characterization experiments including nuclear magnetic resonance (NMR), mass spectrometry (MS), and FRET-based assays to evaluate the activity of the newly designed compounds.
In the following, we introduce our design strategy, which led to non-covalent and covalent inhibitors of M pro (ESI- Fig. 1 ‡). Then, we describe how an interplay between experiments and molecular simulations allowed the discovery of a nal compound (QUB-00006-Int-07) with a high affinity to the protease (IC 50 ¼ 830 AE 50 nM).

Systems preparation
The protease dimer structure (PDB code: 7L11) was used for all the MD simulations and it was prepared at physiological pH (pH ¼ 7). This structure has a higher resolution (1.80Å) than the PDB structure (PDB code: 6LU7) used in our previous work 12 (resolution of 2.16Å). Both structures are of the holo state in complex with covalent inhibitors, and the rotamers of the key residues at the catalytic site (Cys145, His41, His162, His163, and His172) are virtually identical. The protonation states of His residues were assigned based on previous work, 38 where His41 and His80 are protonated at the delta carbon atom and all other His residues are epsilon-protonated, which is favorable for substrate binding. 38 This is different from our previous work where His64 and His80 are protonated at the delta carbon atom and all other histidines are epsilon-protonated. 12 All water molecules were retained except for those that might collide with the ligands.

Simulation protocols
All-atom simulations were performed using Qubit Pharmaceuticals' Atlas platform which enables the use of any type of High-Performance Computing (HPC) system including cloud supercomputing infrastructures. Among its possibilities, Atlas has the ability to efficiently handle polarizable force eld molecular dynamics simulations using a custom version of the multi-GPU module 27 of the Tinker-HP molecular dynamics package, 28,39 to perform docking runs using either Autodock-Vina 40 or Autodock-GPU, 41 and to enable machine learning predictions of molecular properties.
2.2.1. Molecular dynamics simulations. All Tinker-HP MD simulations (for a total of several ms) were performed in mixed precision to benet from a strong acceleration of simulations using GPUs. 27 The AMOEBA polarizable force eld 29-32 was used to describe the full systems including the protein, ions and water. Several utilities (TinkerTools) from Tinker 8 (ref. 42) were used. Periodic boundary conditions were applied within the framework of smooth particle mesh Ewald summation 43,44 with a grid of dimensions 120 Â 120 Â 120 using a cubic box with side lengths of 97Å. The Ewald cutoff was set to 7Å, and the van der Waals cutoff was 12Å. Langevin molecular dynamics simulations were performed using the recently introduced BAOAB-RESPA1 integrator (10 fs outer timestep), 45 a preconditioned conjugate gradient polarization solver (with a 10 À5 convergence threshold) to solve polarization at each time step, 46 hydrogen-mass repartitioning (HMR) and random initial velocities. Absolute binding free energy simulations following a protocol described in the next section were performed as well as adaptive sampling runs that are also described further in the text. Absolute binding free energy computations were both performed on the HPE Jean Zay Supercomputer (IDRIS, GENCI, France) and on Amazon Web Services (AWS). All adaptive sampling computations were performed using AWS. Simulations on AWS used both p3.2x (NVIDIA V100 GPU cards) and p4d.24xlarge (NVIDIA A100 GPU cards) instances whereas computations on the Jean Zay supercomputer were powered by V100 cards.
2.2.2. Molecular docking protocol. The protonation states of the ligands were calculated at a neutral pH and the hydrogen atoms were added using Chimera. Next, we docked the ligands QUB-00006-Int-01(R) and QUB-00006-Int-01(S) into the M pro dimer structure using Autodock Vina 1.1.2. 40 AutoDock Vina requires the pdbqt format for the input les of the receptor and the ligand. Therefore, using the scripts 'prepare_receptor4.py' (v 1.13) and 'prepare_ligand4.py' (v 1.10) provided by Autodock Tools, 47 we generated pdbqt les corresponding to the receptor and the ligands, respectively. We set the exhaustiveness search to 100 and the num_mode option to 50.
Since molecular docking could suggest reasonable potential binding modes, but does not always rank the most likely binding mode as the best docked pose, 14,48 we visually inspected the generated docked poses and chose an ensemble of binding poses with different binding orientations that we used to run MD and ABFE calculations in order to explore the binding mode of QUB-00006-Int-01, as described in the Results and discussion section.

Equilibration.
A detailed description of the equilibration protocol used for MD simulations can be found in the ESI. ‡ 2.2.4. High-resolution adaptive sampling simulations. Starting from several binding poses as described above we ran adaptive sampling simulations using the AMOEBA force eld [29][30][31][32] in order to explore their stability and more generally to explore the conformational space of the ligands in the pocket of the M pro . Because of the exibility of the pocket and the role it may play in the exploration of the potential binding modes of the ligand, we chose to keep the whole system (ligand + protein) exible during this sampling phase. The restart strategy (similar to the one introduced in ref. 12) was the following: rst, all the previously generated conformations of the protein were loaded and aligned with MDTraj, 49 then PCAs of the conformations of the ligand were computed using Scikitlearn 50 and these frames were projected on the rst four PCAs. Finally, the same scheme as the one described in ref. 12 was used to generate new starting points, favoring points that were less explored during the previous phases. In practice, a rst set of 5 simulations of 10 nanoseconds were performed using different random seeds, and then 4 iterations of 10 times 10 nanoseconds were generated using the adaptive sampling protocol described above, for a total of 450 nanoseconds.
2.2.5. Absolute binding free energy calculations. In order to benet from the high-accuracy evaluation of free energies using the AMOEBA force eld, 33-37 we used the same clustering algorithms as described above to analyze the adaptive molecular dynamics simulations. The largest clusters were used for absolute free energy calculations. The double-decoupling protocol and the Bennett acceptance ratio (BAR) 51 method were used to calculate the standard binding free energy for each binding pose. 33,37 There were 27 or 26 thermodynamic states for the decoupling in the complex phase or the aqueous phase. A distance restraint between two groups of atoms in the ligand and in the protein binding pocket was applied when decoupling the ligand in the complex to accelerate the convergence when the ligand is fully decoupled, and the restraint was removed at an additional step at the full interaction state. A harmonic restraint with a force constant of 15.0 kcal mol À1ÅÀ2 and radius of 2.0Å was used. An analytical correction was added to the binding free energy to account for the standard state at 1.0 mol L À1 in the fully decoupled state. 10 ns simulations were performed for each thermodynamic state for the simulations of M pro in complex with x0195, QUB-00006(S), QUB-00006(R), and QUB-00006-Int-07. For the simulations of M pro in complex with QUB-00006-Int-01(R) and QUB-00006-Int-01(S), we ran each thermodynamic state for 20 ns. We used the BAOAB-RESPA1 integrator with a 10 fs time step and we calculated the electrostatic interactions using Ewald summation with a real space cutoff of 7Å. van der Waals interactions were calculated using a cutoff of 12Å with long-range correction.
2.3. Quantitative structure-property relationship (QSPR) modeling: predicting solubility using machine learning Qubit Pharmaceuticals' Atlas internal machine learning-based QSPR module was used to predict the water solubility (log S, S measured in mol L À1 ) and octanol/water partition coefficient (log P). To build a water solubility QSPR predictor, the AqSolDB dataset 52 was used as a training set. To predict octanol/water partition coefficients (log P), the dataset from EPA's OPERA 53 was used as a training set.
Selected datasets were preprocessed and standardized to some extent by authors of the corresponding publications. However, the need for additional processing was identied when doing exploratory data analysis. We discarded compounds with less than two carbon atoms and kept molecules with molecular weight between 50 and 750 daltons. Additional rules of fragment standardization developed at Qubit Pharmaceuticals were applied.
2.3.1. Similarity analysis. Tanimoto similarity 54 to the x0195 compound was calculated for each molecule using the MAACS ngerprint from the RDKit Open-Source Cheminformatics Soware (https://www.rdkit.org). The Morgan circular ngerprint 55 with radius ¼ 2 and nBits ¼ 2048 from RDKit was also tested and the results (not shown) exhibit a similar ranking of the compounds.

Recombinant expression of SARS-CoV-2 M pro in E. coli
The plasmid pGEX-6P-1 encoding SARS-CoV-2 M pro56 was a generous gi from Prof. Rolf Hilgenfeld, University of Lübeck, Lübeck, Germany. Protein expression and purication were adapted from Zhang et al. 56 The expression plasmid was transformed into E. coli strain BL21 (DE3) and then pre-cultured in YT medium at 37 C (100 mg mL À1 ampicillin) overnight. The pre-culture was used to inoculate fresh YT medium supplied with an antibiotic and the cells were grown at 37 C to an OD600 of 0.6-0.8 before induction of overexpression with 0.5 mM isopropyl-D-thiogalactoside (IPTG). Aer 5 h at 37 C, cells were harvested by centrifugation (5000g, 4 C, 15 min) and frozen. The pellets were resuspended in buffer A (20 mM Tris, 150 mM NaCl, pH 7.8) supplemented with lysozyme, DNase I and PMSF for the lysis. The lysate was claried by centrifugation at 12 000g at 4 C for 1 h and loaded onto a HisTrap HP column (GE Healthcare) equilibrated with 98% buffer A/2% buffer B (20 mM Tris, 150 mM NaCl, 500 mM imidazole, pH 7.8). The column was washed with 95% buffer A/5% buffer B and then His-tagged M pro was eluted with a linear gradient of imidazole ranging from 25 mM to 500 mM. Pooled fractions containing the target protein were subjected to buffer exchange with buffer A using a HiPrep 26/10 desalting column (GE Healthcare). Next, PreScission protease was added to remove the C-terminal His tag (20 mg of PreScission protease per mg of target protein) at 12 C overnight. Protein solution was loaded onto a HisTrap HP column connected to a GSTrap FF column (GE Healthcare) equilibrated in buffer A to remove the GST-tagged PreScission protease, the His-tag, and the uncleaved protein. M pro was nally puried with a Superdex 75 prep-grade 16/60 (GE Healthcare) SEC column equilibrated with buffer C (20 mM Tris, 150 mM NaCl, 1 mM EDTA, 1 mM DTT, pH 7.8). Fractions containing the target protein at high purity were pooled, concentrated at 25 mg mL À1 and ash-frozen in liquid nitrogen for storage in small aliquots at À80 C.

Protein characterization and enzymatic activity
The molecular mass of the recombinant SARS-CoV-2 M pro was determined by direct infusion electrospray ionization mass spectrometry (ESI-MS) on a Xevo G2-XS QTOF mass spectrometer (Waters). Samples were diluted in 50% acetonitrile with 0.1% formic acid to achieve a nal 1 mM concentration of protein. The detected species displayed a mass of 33 796.64 Da, which matches very closely the value of 33 796.81 Da calculated from the theoretical full-length protein sequence (residues 1-306). To characterize the enzymatic activity of our recombinant M pro , we adopted a FRET-based assay using the uorogenic substrate 5-FAM-AVLQ 0 SGFRK(DABCYL)K (ProteoGenix) harbouring the cleavage site of SARS-CoV-2 M pro ( 0 indicates the cleavage site). The uorescence of the intact peptide is very low since the uorophore 5-FAM and the quencher Dabcyl are in close proximity. When the substrate is cleaved by the protease, the uorophore and the quencher are separated, increasing the uorescence signal. Freshly unfrozen recombinant SARS-CoV-2 M pro was used in our assays. The assay was performed by mixing 0.05 mM M pro with different concentrations of substrate (1-128 mM) in the reaction buffer (20 mM Tris-HCl, 100 mM NaCl, 1 mM EDTA and 1 mM DTT, pH 7.3) in the nal volume of 100 mL. Fluorescence intensity (Ex ¼ 485 nm/Em ¼ 535 nm) was monitored at 37 C with a Victor3 microplate reader (Perki-nElmer) for 50 min. A calibration curve was created by measuring multiple concentrations (from 0.001 to 5 mM) of free uorescein in a nal volume of 100 mL reaction buffer. Initial velocities were determined from the linear section of the curve, and the corresponding relative uorescence units per unit of time (DRFU/s) were converted to the amount of the cleaved substrate per unit of time (mM s À1 ) by tting to the calibration curve of free uorescein. Inner-lter effect corrections were applied for the kinetic measurements according to ref. 57. The catalytic efficiency k cat /k m resulted in 4819 AE 399 s À1 M À1 , in line with literature data. 56,58 3.3. Nuclear magnetic resonance All the NMR screening experiments were performed with a Bruker Neo 600 MHz spectrometer, equipped with a nitrogen cooled 5 mm Prodigy CryoProbe at 298 K. The ligand binding was monitored by WaterLOGSY (wLogsy) 59

Screening of potential M pro inhibitors and hits validation
A FRET-based assay employed to test the enzymatic activity of the recombinant SARS-CoV-2 M pro was used to evaluate the ability of the compounds to inhibit its activity in vitro. In fact, inhibition of M pro by the tested compounds results in a decrease of the uorescence signal compared to the M pro activity in the absence of an inhibitor. A preliminary screening was rst performed at a single compound concentration to rapidly identify the ability of the compounds to inhibit M pro activity and to rank them according to their inhibitory activity. The protein was diluted in the reaction buffer (20 mM Tris-HCl, 100 mM NaCl, 1 mM EDTA and 1 mM DTT, pH 7.3) and pipetted into a 96-well plate to a nal protein concentration of 0.02 mM in a nal volume of 100 mL. Each compound at the nal concentration of 100 mM was incubated with M pro for 20 minutes at room temperature. Aer incubation, the peptide substrate (5 mM nal) was added to initiate the reaction which was monitored for 50 min at 37 C. The nal DMSO amount was 3.75%. Two controls were prepared for each experiment: the peptide substrate in the absence of M pro (0% M pro activity, hence minimal uorescence intensity detected) and the reaction mixture in the absence of the compounds (100% M pro activity, therefore maximal uorescence intensity detected). Following the preliminary screening, the most active compounds (hits) were tested at increasing concentrations (0.25, 0.5, 1, 5, 25, 50, 100, and 150 mM) to determine the doseresponse curves and calculate IC 50 values tted using Graph-Pad Prism 5 soware. Each experiment was performed in triplicate and the results were used to calculate an average and a standard deviation.

Binding studies by mass spectrometry
Samples were prepared by mixing appropriate volumes of M pro (10 mM nal) with each compound in the reaction buffer (20 mM Tris-HCl, 100 mM NaCl, 1 mM EDTA and 1 mM DTT, pH 7.3). The nal mixtures had a 1 : 1 or 10 : 1 compound : protein molar ratio. Samples were incubated at room temperature for 20 min before analysis. Control experiments were performed on 10 mM solutions of M pro in the absence of the compounds. Mass spectrometric analyses were carried out in positive ion mode by ESI-MS under denaturing conditions, i.e. water/acetonitrile 50 : 50 with 0.1% formic acid on a Q-Tof Xevo G2S (Waters, Manchester, UK). Data were processed using MassLynx V4.1 soware.

Synthesis
The detailed synthetic protocol used to prepare all molecules can be found in the ESI. ‡

Results and discussion
Several diverse fragments binding the viral M pro have been identied by high-throughput crystallographic screening of this protease. Among the screened fragments, x0195 (PDB ID: 5R81 (ref . 61) -Fig. 1A) shows one of the highest binding affinities 62 and therefore provides a reasonable starting point for fragmentbased design of novel M pro inhibitors.
The crystal structure shows that x0195 is located within the M pro substrate binding pocket, at the interface of the two subpockets S2 and S4 as described by Cannalire et al. 63 S4 is a solvent exposed subpocket that is partially composed of a exible loop delimited by Gln189 and Gln192, while S2 is dened by the side chain residues of Phe140, Asn142, His163, Glu166, and His172, and the backbone atoms of Phe140 and Leu141.
In the co-crystal structure corresponding to M pro in complex with x0195 (see Fig. 1A), the aromatic portion of the molecule is located between the side chains of Gln189 and Met 165, while the unsaturated region of the tetrahydroquinoline scaffold establishes a hydrophobic interaction with the side chains of His41 and Met49. The N-methyl group attached to the tetrahydroquinoline core is solvent exposed, while the sulfonamide moiety is in contact with Pro168 and Glu166. In particular, the aromatic ring of the small molecule is bisecting the SO 2 unit and the polar sulfonamide nitrogen (-NH 2 ) is reaching the boundaries of the hydrophobic part of the binding pocket composed of the alkyl chain of Pro168.
Aer comparing the available X-ray structural information with previously conducted studies on small molecule conformational preferences derived from crystal structure data, 64 we noticed that x0195 was modeled in a high energy conformation and that an unusual high-energy (i.e. repulsive) contact occurs between the sulfonamide oxygen and the carbonyl oxygen of the Glu166 backbone. Additionally, the tetrahydroquinoline scaffold was not fully exploring S2 subpocket boundaries. As reported by Cannalire et al. 63  Therefore, exploiting this knowledge might be key to designing specic inhibitors of CoV M pro .
In order to rene the available X-ray structural model and to gather more structural insights (e.g. protein exibility and binding pocket rearrangements 12,13 ) to guide the design of better binders of the subpocket S2, we ran all-atom molecular dynamics simulations using the AMOEBA polarizable force eld 29-32 on M pro (PDB code: 7L11) in complex with x0195 (PDB code: 5R81).
Our simulations show that the unusual high-energy contacts between the sulfonamide oxygen and the carbonyl oxygen of the Glu166 backbone no longer occurred. Also, regarding the electronic structure, we noticed that the p orbitals of the aromatic carbon C1 bisect (e.g. are parallel to) the SO 2 angle, compared with a 90 value for the same angle as reported in the crystal structure (see Fig. 1). Moreover, the NH 2 of the sulfonamide group is engaged in favorable polar interactions with the Gln189 side chain and the solvent.
Then, we performed absolute binding free energy calculations on the rened protein-ligand structure. Our results show that x0195 binds to the protein with a binding free energy of À2.83 kcal mol À1 at 283 K, which is comparable to the experimental binding energy (À3.59 AE 0.1 kcal mol À1 , see Table 1).
We obtained the experimental binding free energy by converting the experimental K d (1.7 mM AE 0.2) provided in the literature 62 using the Gibbs free energy equation and the experimental temperature used in the binding assays (283 K). The agreement of the computed free energy prediction with the experimental results is reasonable. Further analysis of MD simulations suggests that the tetrahydroquinoline scaffold of x0195 is sub-optimally occupying the binding pocket.
We put in place design strategies to modify the chemical moieties of x0195 and potentially increase its binding affinity. Here, we introduce the design of a new molecule, namely QUB-00006 (Fig. 2), where we added two uorines and a methyl group on the tetrahydroquinoline core of x0195. Also, we substituted the sulfonamide group on the aromatic ring of the molecule by a methanethiol. Fluorination at position 3 of the tetrahydroquinoline core could increase ligand occupancy with no disruption of the water network surrounding the binding pocket, 12,13 while methylation at position 4 seemed an interesting modication to increase the potential interactions of the ligand with binding pocket residues. QUB-00006 was generated based on the structure and position of x0195 in the co-crystal (5R81), and then placed in the receptor structure (M pro dimer with the PDB code: 71LL); next, the M pro -QUB-00006 complex was equilibrated using MD simulations (see ESI Section 1 ‡ for the detailed protocol), followed by free energy calculations. To explore the potential of our computational platform in designing new binders with no or few experimental data such as ligand-M pro co-crystal structures, we leveraged all-atom molecular dynamics simulations on QUB-00006 complexed with M pro . The aim of this approach is to gather insights on the binding conformation of the newly in silico designed ligand, assess pocket tness, and evaluate its binding affinity using ABFE calculations.
The initial molecular conformation is mostly anchored at the binding pocket, with the a,a-diuoro-methyl group attached to the tetrahydroquinoline core fully occupying the buried part of the S2 subpocket, which is composed of the side chains of Met49 and His41, while the sulfonamide moiety extends to S4 (Leu167 and Pro168). We note that methylation at position 4 of the tetrahydroquinoline core introduces a chiral center, however no signicant differences in terms of pocket occupancy between the R and S enantiomers were observed.
The computed absolute binding free energies for QUB-00006(R) and QUB-00006(S) are À2.73 AE 0.34 kcal mol À1 and À2.72 AE 0.22 kcal mol À1 , respectively (Table 1). These results suggest that the designed uorinated fragment is a binder at the M pro S2 subpocket and could represent a starting point for structure-based design of novel M pro inhibitors.
The identied binding mode is dened by several favorable intermolecular interactions occurring between the newly designed ligand and the M pro binding pocket: (i) the sulfur group of QUB-00006(R) interacts with the oxygen of the carbonyl belonging to the backbone of Glu166 with a distance of 3.3Å, (ii) the a,a-diuoro moiety points towards His41, and (iii) the sulfur of Met49 establishes a favorable interaction with one of the two uorines of the substrate (distance 3.3Å). In fact, the sulfur-oxygen contact observed in our simulations is in agreement with the ndings of a study conducted by Iwaoka et al., 68 where they found that a total of 1200 and 626 fragments from the Cambridge Structural Database (CSD) and Protein Data Bank (PDB), respectively, have close intermolecular S-O contacts (with a distance of 3.52Å or less). Another study analyzing the protein structures deposited in the Protein Data Bank reports 1133 interactions between His and halogen atoms found in 3833 PDB entries with one or more halogenated ligands co-crystallized with a protein. 69 Moreover, the strong S-F interaction identied during the simulations is in good agreement with experimentally observed distances for uorinesulfur contacts in crystal structures (2.8-3.4Å). 70 It is worth noting that such interactions involving sulfur and halogen atoms are usually better captured with polarizable models than with their classical counterparts. [71][72][73] QUB-00006 was then synthesized following the path in Fig. 4 in order to validate in vitro the simulation outcomes.
The ligand orientation in the MD simulations and the computed hydration ratio of the different atoms of QUB-00006 during ABFE simulations suggest that proton C is solvent exposed, while the protons of the methyl thioether group (group  The assignment scheme is reported along with the 2D structure of the ligand. The strong negative intensity of the signals of the hydrogens of groups A, D, and E suggests that they are orientated towards the protein, while the hydrogen atom in C is solvent exposed. These experimental findings confirm the hydration ratio calculated during our binding free energy simulations and described in panel B. E) and the methyl group at position 4 of the tetrahydroquinoline core (group D) are buried (Fig. 3B).
Those ndings strongly correlate with the NMR characterization of QUB-00006 obtained via WaterLOGSY experiments. In fact, WaterLOGSY epitope mapping conrms that QUB-00006 binds to the protein binding pocket. We leveraged the experimental approach to better identify the region of the ligand in contact with the protein. In Fig. 3C, the proton signals arising from the two methyl groups (D and E) in the presence of M pro show a change in the sign suggesting that these protons are in close contact with the protein. Similarly, the aromatic protons A and B undergo a sign inversion. In contrast, the aromatic proton C is not signicantly perturbed, which suggests that this position is solvent exposed. The binding mode suggested by NMR is in agreement with the MD-derived hydration ratios conrming the predictive power of our MD-based approach to characterize the binding mode of novel ligands at the experimental level of accuracy ( Fig. 3B and C).
Although we were able to gather structural information about the binding mode of QUB-00006 using a WaterLOGSY assay, we could not measure its experimental binding affinity via STD NMR due to solubility challenges.
Several synthetic steps were performed in order to obtain QUB-00006, as detailed in Fig. 4. Through this synthetic scheme, we obtained different intermediates characterized by a better solubility prole (Table 2). Interestingly, the hydroxyquinolinone QUB-00006-Int-01 displayed the best solubility prole of all the synthetic intermediates, making it a strong candidate for in vitro evaluation.
Before conducting NMR STD experiments to determine the dissociation constant (K d ) of the more polar QUB-00006-Int-01 compound, we decided to predict its binding conformation at the binding pocket and compute the respective absolute binding free energy. Modication of the molecular scaffolds, especially in fragment-like molecules, might affect the binding mode 74 compared to a reference structure (e.g. x0195 as per PDB ID:5R81).
We used a combination of docking, MD and ABFE calculations to explore the putative binding mode of QUB-00006-Int-01. Those calculations identied two dominant binding modes for QUB-00006-Int-01(R) and QUB-00006-Int-01(S) (Fig. 5A) with computed binding free energies of À4.4 and À4.3 kcal mol À1 , respectively. Then, we estimated the binding affinity of QUB-00006-Int-01 towards M pro by a STD NMR titration and we found a dissociation constant in the low millimolar range, with an estimated K d of 1.9 AE 0.6 mM (À3.71 AE 0.2 kcal mol À1 ), which agrees reasonably well with our binding free energy calculations ( Table 1). As shown in Fig. 5A, both enantiomers bind to the S2 and S4 subpockets with the thioether group being fully buried Fig. 4 Synthesis path of 3,3-difluoro-4-methyl-7-(methylsulfanyl)-1,2,3,4-tetrahydroquinoline named QUB-00006. Table 2 Prediction of the properties of compounds using our machine learning workflow. MW represents the molecular weight of the compounds in daltons, log S is the predicted solubility of the different compounds, log P represents the differential solubility, and the Tanimoto coefficient reflects the similarity of the selected compounds relative to x0195 in subpocket S2, which correlates with WaterLOGSY experiments (Fig. 5C). Additionally, QUB-00006-Int-01(R) and QUB-00006-Int-01(S) ll up a binding pocket space that is different from the one occupied by QUB-00006. On the other hand, starting with a QUB-00006-like binding mode, we ran an additional absolute binding free energy calculation on an M pro -QUB-00006-Int-01(R) complex and obtained a binding free energy of À0.9 kcal mol À1 . These results suggest that QUB-00006-Int-01 and QUB-00006 might have different dominant binding conformations (see Fig. 3A and 5A). Since a fragment-like molecule could have multiple binding modes and the ligand conformation is unlikely to be fully 3 kcal mol À1 , respectively; also, they bind to the S2 and S4 subpockets in a similar fashion with the thioether group being fully buried in S2. On the other hand, starting with a QUB-00006 like binding mode, we ran an additional absolute binding free energy calculation on M pro in complex with QUB-00006-Int-01(R) and obtained a second binding mode for QUB-00006-Int-01(R) (in green) with a binding free energy of À0.9 kcal mol À1 . (B) STD titration profile of QUB-00006-Int-01. The ligand concentration ranges from 100 mM to 2 mM against 10 mM of M pro . (C) The WaterLOGSY spectra of QUB-00006-Int-01 with M pro (in blue) and without M pro (in red). The assignment of the signals is reported on the 2D structure of the fragment. The methyl and the aromatic signals of the two protons adjacent to the hydroxyl group undergo a significant change, which suggests that these groups are in close contact with the protein's cavity. In contrast, the aromatic proton adjacent to the lactamic nitrogen undergoes a reduction of its intensity, suggesting that this proton is partially exposed to the solvent. These STD results confirm our computational characterization of the binding mode of QUB-00006-Int-01 (panel A).
sampled during 20 ns of binding free energy simulations, we used unsupervised adaptive sampling (AS) to further explore the conformational space of QUB-00006-Int-01. AS can be used here as an interpretative tool able to gather structural insights on the various potential M pro -ligand interactions (see the ESI ‡ for details). The AS trajectories were clustered using averagelinkage hierarchical clustering algorithms and the top ten largest clusters were chosen for analysis. These clusters have comparable populations (the smallest clusters have 3-4 times smaller populations or 0.3 kcal mol À1 higher free energy than the largest clusters, see Table 3), indicating the coexistence of multiple binding modes.
Overall, our computational ndings on QUB-00006-Int-01 conrm that the structural approach we introduce in this work using a sequence of MD-based techniques (classical MD simulations, adaptive sampling, and absolute binding free energy calculations) is able to capture potential binding orientations of fragment-like compounds in the binding pocket of a protein, and to accurately predict their binding free energies.
Then, we analyzed the clustered QUB-00006-Int-01 binding conformations from the adaptive sampling simulations plotted as a function of the distance between the methyl thioether group in QUB-00006-Int-01 and the beta carbon of Gln189, and the distance between C2 (carbon connected to the hydroxyl group) and the sulfur atom (SG) of the catalytic side chain of the Cys145 residue (Fig. 6). We noticed that the distance of C2-SG in the most populated cluster generated by the AS simulations is around 4Å. To reinforce our analysis, we leveraged another unsupervised reduction of dimension technique: TICA (timelagged independent component analysis), 75 which aims at nding the slow collective variables of the data, and applied it to QUB-00006-Int-01(R). We then used the k-means clustering method on the data projected on this space and built a Hidden Markov State Model (HMSM). 76 Three clusters emerged, whose characteristics also show the coexistence of several binding modes of QUB-00006-Int-01(R), one of which corresponds to a distance between C2 and SG below 4Å. Detailed results can be found in the ESI. ‡ Targeting Cys145 with covalent warheads has been used by several researchers to discover novel potent inhibitors of M pro . 38,63,77 As a matter of fact, a simple chemical modication to QUB-00006-Int-01 would lead to QUB-00006-Int-07 bearing an a,a-diuoro-keto moiety, which is prone to a nucleophilic attack by the vicinal R-SH of Cys145. In order to enable the latter, QUB-00006-Int-07 would need to access the M pro substrate pocket and adopt a stable binding conformation prior to the covalent binding. Thus, we conducted absolute binding free energy simulations on the M pro -QUB-00006-Int-07 complex, which conrmed a favorable binding energy of QUB-00006-Int-07 to the M pro substrate pocket (À5.37 AE 0.23 kcal mol À1 ). As reported in Fig. 7, compound QUB-00006-Int-07 is bound to the S2 and S4 subpockets with the thioether group being fully buried in subpocket S2 and the a,a-diuoro-keto moiety facing Cys145. More precisely, the average distance between SG and the C is 3.65 angstroms (AE0.33) and the average distance between the SG and C2 is 3.61 angstroms (AE0.43) as can be seen in Fig. 7.
Our computational ndings motivated us to test the compound with a FRET-based proteolytic assay. This assay should detect potent functional binders to the viral M pro . Being a uorogenic assay, compounds with uorescence quenching properties can suppress the uorescence signal generated by Table 3 Population of the clusters generated by adaptive sampling performed on M pro in complex with QUB-00 006-Int-01(R) and (S). DDG (kcal mol À1 ) is the relative free energy at 298 K. The relative binding free energies reported for QUB-00 006-Int-01(R) and (S) are calculated using the respective cluster 1 as a reference ligand  6 Conformations of QUB-00006-Int-01 sampled during 20 ns of ABFE calculations and 450 ns of adaptive sampling simulations. The conformation was plotted as a function of two distances: (i) the distance between C2 (carbon of QUB-00006-Int-01 connected to the hydroxyl group) and the sulfur of Cys145, and (ii) the distance between the methyl thioether group in QUB-00006-Int-01 and the beta carbon of Gln189. "0" indicates the starting structure, "1" indicates the largest cluster, and "i" indicates the ith largest cluster. The frames were taken at 10 ps time intervals.
the protease activity. To eliminate false positive results, we conducted a preliminary counter screen and veried that the tested compound possesses negligible uorescence quenching effects. Subsequently, to assess the potential inhibitory activity of the compound against SARS-CoV-2 M pro , increasing concentrations of QUB-00006-Int-07 (0.25-150 mM) were incubated with 20 nM M pro before the addition of 5 mM FRET substrate. As shown in Fig. 8, QUB-00006-Int-07 inhibited M pro with 50% inhibitory concentration (IC 50 value of 830 AE 50 nM), thus resulting in a fairly potent inhibitor of the M pro enzymatic activity. The binding of QUB-00006-Int-07 to M pro was conrmed by electrospray ionization (ESI) mass spectrometry. A preliminary determination of the initial protein showed an experimental mass of 33 796.40 Da, which matches very closely the expected value of 33 796.64 Da calculated from the sequence (Fig. 9A). The sample obtained aer incubation of QUB-00006-Int-07 with M pro (compound : protein ratio ¼ 10 : 1) was analyzed by ESI-MS under denaturing conditions, and a representative spectrum is provided in Fig. 9B. In addition to the signals corresponding to multiple charge states of the initial protein (red dots), we identied the distribution of signals corresponding to the M pro modied by the presence of the compound (green asterisks) which is therefore covalently linked to the protein given the non-native conditions of the experiment. The nature of the adduct and the molecular mechanism of binding are under investigation and will be the subject of further studies.
Finally, in this work, the introduction of multiple modications (e.g. gem-diuoro, thioether, hydroxyl and methyl groups) to the tetrahydroquinoline scaffold of x0195, and the design and synthesis of novel molecular scaffolds, enabled the exploration of binding pocket boundaries and provided additional information related to druggability of the S2 subpocket. Other   "SG" stands for the sulfur atom in Cys145, "C" is the amide carbon in QUB-00006-Int-07, and "C2" is the carbonyl carbon in QUB-00006-Int-07. The average distances for SG-C and SG-C2 are 3.61Å and 3.65 A, respectively. (B) The dominant binding mode of QUB-00006-Int-07 within the M pro binding pocket. QUB-00006-Int-07 is shown in pink and the protein is shown in silver sticks and surfaces. The binding mode is very stable during the simulation, where the hydroxyl group is close to Cys145 and forms a hydrogen-bond with the Glu166 backbone, and the difluoro group interacts with the carbonyl group of Asn142. These binding modes are also comparable to the dominant binding mode of QUB-00006-Int-01(S) identified during ABFE calculations. molecules were produced over the course of this research but, due to their weaker activity, their detailed analysis is not provided here. Their list can be found in the ESI. ‡ These compounds were either designed computationally without leading to improved affinities or were synthesis intermediates. All resulting molecules were submitted to biological testing, but none of them were found to be as potent as QUB-00006-Int-07 nor presented a strongly druggable prole, compared to the previously discussed compounds.

Conclusion and perspectives
We presented a computationally driven discovery of a new set of non-covalent and covalent inhibitors of M pro that have been further characterized experimentally. The best compound, QUB-00006-Int-07, has been found to be a covalent binder that resulted in a potent inhibition of the M pro enzymatic activity (IC 50 ¼ 830 AE 50 nM). The results of the innovative scaffold design described here were obtained within three months via a fast-track project that took place in the summer of 2021. It involved a small consortium of theoreticians, organic chemists and drug designers, and demonstrated the effectiveness of a computation-guided synthetic strategy. Indeed, GPU-accelerated high-performance computing platforms can now provide access to high-resolution molecular dynamics simulations, which are able to predict detailed protein conformational maps and provide accurate absolute binding free energy results. Such computations can be further rationalized by means of adaptive sampling simulations, an approach which is able to decipher multiple binding modes. Coupled to NMR, in vitro experiments and machine learning, such high-resolution predictions yield structural insights regarding the design of new active compounds, while offering an atomic level understanding of binding affinities.
Beyond this preliminary proof of concept study, the next research steps will be devoted to the QM/MM modeling 78,79 of the warhead reaction mechanism 38,77,80 leading to the covalent binding of QUB-00006-Int-07, and to optimization of active compounds with the goal of reaching low nanomolar activity.

Data availability
All data have been provided in the main text and ESI. †