Rational design of thioamide peptides as selective inhibitors of cysteine protease cathepsin L

Aberrant levels of cathepsin L (Cts L), a ubiquitously expressed endosomal cysteine protease, have been implicated in many diseases such as cancer and diabetes. Significantly, Cts L has been identified as a potential target for the treatment of COVID-19 due to its recently unveiled critical role in SARS-CoV-2 entry into host cells. However, there are currently no clinically approved specific inhibitors of Cts L, as it is often challenging to obtain specificity against the many highly homologous cathepsin family cysteine proteases. Peptide-based agents are often promising protease inhibitors as they offer high selectivity and potency, but unfortunately they are subject to degradation in vivo. Thioamide substitution, a single-atom O-to-S modification in the peptide backbone, has been shown to improve the proteolytic stability of peptides, addressing this issue. Utilizing this approach, we demonstrate herein that good peptidyl substrates can be converted into sub-micromolar inhibitors of Cts L by a single thioamide substitution in the peptide backbone. We designed and scanned several thioamide-stabilized peptide scaffolds, among which one peptide, RS1A, was stabilized against proteolysis by all five cathepsins (Cts L, Cts V, Cts K, Cts S, and Cts B) while inhibiting Cts L with >25-fold specificity against the other cathepsins. We further showed that this stabilized RS1A peptide could inhibit Cts L in human liver carcinoma lysates (IC50 = 19 μM). Our study demonstrates that one can rationally design a stabilized, specific peptidyl protease inhibitor by strategic placement of a thioamide and reaffirms the place of this single-atom modification in the toolbox of peptide-based rational drug design.


Introduction
In recent decades, there has been increased interest in the development of peptides as therapeutics and imaging agents. [1][2][3] Peptide-based drugs offer advantages such as high selectivity and potency, low tissue accumulation, relatively predictable metabolism, and safety. Additionally, the ease of obtaining high biological and chemical diversity from standard synthetic procedures makes peptide therapeutics attractive. 1,4 Peptides thus stand out as promising candidates to fill the gap between the two main drug categories: traditional small-molecule drugs (smaller than 500 Da) and biologics (larger than 5000 Da). 4 Most of the peptides that are currently clinically approved or under active development target metabolic diseases and cancer. 2 However, despite the number of known targets for peptide therapeutics and existing peptide libraries, peptides still display certain disadvantages that hinder them from more easily becoming effective drugs. 1,2 Peptides are subject to rapid proteolysis and oxidation, display short half-lives and fast renal clearance in vivo, and have low membrane permeability, thereby exhibiting suboptimal pharmacokinetics. 1,2 To address the metabolic stability issues of peptides, modifications at protease cleavage sites, such as synthetic substitutions of amino acid sidechains or of the peptide backbone, have been developed and utilized to increase resistance to proteolysis. 3,[5][6][7][8] Backbone thioamidation is a promising tool that has been shown to improve the proteolytic stability of both linear and macrocyclic peptides. [9][10][11][12][13][14][15][16][17][18][19][20] Our laboratory previously demonstrated that a thioamide substitution near the scissile bond of glucagon-like peptide-1 (GLP-1) and gastric inhibitory polypeptide (GIP), two therapeutically relevant peptides for diabetes treatment, significantly enhances their proteolytic stability against dipeptidyl peptidase 4. 
21 Thioamidation of GLP-1 and GIP increased their half-lives up to 750-fold without significantly compromising their cellular activity; the thioamide GLP-1 analogue was also biologically active in rats and exhibited improved potency for glycemic control compared to its native, all-amide GLP-1 counterpart. 21 Motivated by these results showing thioamide stabilization effects at the P2 and P1 positions (positions numbered from the scissile bond by convention), our laboratory developed a fluorescence sensor design to systematically study the positional effects of thioamide substitution on proteolysis by different cysteine proteases (papain and cathepsins L, V, K, B, and S) and serine proteases (trypsin, chymotrypsin, and kallikrein). [22][23][24] Intriguingly, we found that thioamide positional effects differ not only between serine proteases and cysteine proteases, but also between members of the same protease family despite their high homology (31-59% sequence identity) and mechanistic similarity. 22,24 We also successfully utilized data from these systematic studies to design a two-site stabilized thioamide peptide specifically targeting neuropeptide Y1 receptor-expressing MCF-7 breast cancer cells. 22 With the experimental data from these systematic studies, we recently developed a Rosetta machine learning model that accurately classifies positional effects of thioamides on proteolysis by these cysteine and serine proteases, which can be used to rationally design stabilized peptides for therapeutic and imaging applications. 23 Given this precedent, in this study we aim to further utilize the strategic incorporation of thioamides to develop stabilized peptides as protease inhibitors, more specifically, inhibitors of the cysteine protease cathepsin L (Cts L). 
Among the 500-600 proteases identied in mouse and human, the cathepsin (Cts) family includes proteases that orchestrate numerous critical physiological processes and are involved in many different diseases such as neurological disorders, cardiovascular diseases, arthritis, obesity, and cancer. 25,26 Cysteine Cts proteases, which comprise 11 members in humans (Cts B, C, F, H, K, L, O, S, V/L2, X, and W), belong to the papain-like cysteine protease family. They have been shown to be upregulated in many cancer types and play critical roles in cancer progression. 27,28 Cts L is an ubiquitously expressed endopeptidase that is uniquely involved in the major histocompatibility complex (MHC) class II processing pathway, 29 prohormone or proneuropeptide processing, 30-32 and autophagy, 33 as well as cardiac homeostasis and signal transduction. [34][35][36][37] Cts L is highly expressed in tumors associated with breast cancer, colorectal cancer, and pancreatic adenocarcinoma. 27,38 Cts L participates in the degradation of epithelial cadherins, transmembrane receptors, and extracellular domains of cell adhesion molecules in cancer cells, thereby disrupting cell adhesion, promoting tumor invasion, and possibly underlying resistance to chemotherapy. 27,39,40 Importantly, Ou et al. recently showed that lysosomal activation of SARS-CoV-2's spike (S) glycoproteins by the host cell's Cts L, but not Cts B, is critical for its cellular entry via endocytosis during infection. 41 These researchers showed that treatment with Cts L inhibitor SID 26681509 decreased SARS-CoV-2 pseudovirus entry into HEK 293/hACE2 cells by more than 76%, highlighting the role of Cts L in lysosomal priming of the virus upon entry. 41 There is also evidence for elevated Cts L circulating level in COVID-19 patients. 42 This is signicant as Cts L inhibitors have now been identied as promising therapeutic agents to inhibit SARS-CoV-2 for potential treatment of COVID-19. 
[41][42][43][44] It has been proposed that a protease inhibitor cocktail composed of a Cts L-specific inhibitor as well as serine protease inhibitors could be a safe and novel treatment for COVID-19 patients. 45 Although it is desirable to develop Cts L inhibitors, there are currently no specific inhibitors of Cts L that have advanced to clinical trials, as it is challenging to obtain selectivity against closely related Cts family members. 46 Many of the known Cts L inhibitors, which are mostly small molecules, resemble its physiological substrates and often have electrophilic "warheads" (e.g. epoxide ring, acyloxymethyl ketone, aziridine, vinylsulfonate, nitrile, or thiosemicarbazone) that are strategically placed to trap the catalytic Cys25 residue of Cts L. 46 This follows a logic common to the development of covalent protease inhibitors, wherein a good protease substrate is converted into an inhibitor by strategic incorporation of such warheads. [46][47][48] Inspired by this principle, we demonstrate herein that good peptidyl substrates of Cts L, designed by combining knowledge of substrate sequence specificity from previous positional scanning with our protease sensor studies, can be converted into good inhibitors of Cts L by a single thioamide substitution in the peptide backbone. Unlike the warhead strategy, thioamide modification only renders the substrate inert to proteolysis and does not result in covalent inhibition. This is an advantage of our strategy, as concerns about the use of covalent enzyme inhibitors linger in spite of several successes with the aforementioned warhead approach. 46,49,50 With our thioamidation approach, we hope to overcome the challenges with selectivity and off-target effects that are commonly encountered with small-molecule Cts inhibitors. 
46 An inhibitor with high specicity for a single Cts is a powerful tool compound for studying its role in health and disease and can serve as a therapeutic lead where such specicity is necessary to avoid undesirable side effects. In this study, we examine the stability of several thioamide peptide scaffolds toward Cts proteolysis and identify one peptide that shows resistance to Cts L, Cts V, Cts K, Cts S, and Cts B while inhibiting only Cts L. We also show that this stabilized thioamide peptide can inhibit Cts L in human hepatocellular liver carcinoma (HepG2) whole cell lysate. To our knowledge, this is the highest affinity thioamide-based protease inhibitor to date. Our studies show the potential of utilizing thioamides to stabilize and convert good peptidyl substrates into specic protease inhibitors.

Results and discussion
Designing and examining thioamide peptide inhibitors of cathepsin using a fluorescence protease sensor system
Previously, our laboratory designed a fluorescent protease sensor system that capitalizes on the fluorophore-quenching property of thioamides to monitor protease activity in real time. 22,24,51,52 In our first-generation sensors, a thioamide and a fluorophore are placed on opposite sides of the scissile bond, so that cleavage turns on fluorescence. 51,52 Building upon this design, we generated a series of peptides with 7-methoxycoumarin-4-yl-alanine (Mcm; m) at both the N-terminus and the C-terminus to systematically study thioamide positional effects on proteolysis of cysteine and serine protease substrates. 22,24 Once the doubly labeled peptide is cleaved, fluorescence turns on because one of the fluorophores is separated from the thioamide, regardless of where the thioamide is placed. 22,24 This allows real-time monitoring of proteolysis kinetics. From our previous systematic studies with cysteine proteases (Cts V, Cts K, Cts S, Cts B, Cts L, and papain), we learned that thioamide substitution at the P1 position significantly slowed proteolysis of the generic mLLKAAAm substrate by Cts V, Cts K, Cts S, and Cts L, but not by Cts B or papain. 23,24 By convention, amino acids N-terminal to the scissile bond are denoted PX positions (e.g. P1, P2, P3; non-primed positions), while those C-terminal to the scissile bond are denoted PX′ positions (e.g. P1′, P2′, P3′; primed positions). Interestingly, we found that the P1 thioamide peptide mLLK^SAAAm (KSP1) not only showed the highest level of protease resistance to Cts L, but also served as a potent inhibitor of Cts L (K_I = 0.87 μM; Fig. S13 and Table S12†). 
Although these preliminary inhibition data were exciting, the KSP1 peptide would not be very stable in vivo, since it could still be efficiently cleaved by other cysteine proteases (papain, Cts B, Cts V, Cts K, and Cts S), 23,24 as the sequence of this peptide was designed to be generic. Motivated by these results, we envisioned advancing this approach to design and scan for a sequence-optimized, thioamide-containing peptide that specifically inhibits Cts L while remaining stable in the presence of other closely related cathepsins without inhibiting them.
The rationale of our peptide design entails two main steps: (1) designing all-amide peptides that are good substrates of Cts L, then (2) turning those substrates into stabilized peptides that inhibit Cts L by strategic placement of a single thioamide. In this study, utilizing the peptide sensor design from our previous studies, our peptide inhibitor candidates contained two coumarins (m residues) at their termini, allowing quick initial identification of peptides that resisted proteolysis by cathepsins in steady-state protease assays (Fig. 1). 22,24 To design the amino acid sequences for initial scanning, the primed positions of these peptides were kept generic and consistent with our previous studies by retaining alanine at the P1′, P2′, and P3′ positions. For the non-primed positions, sequence design was guided by a comprehensive substrate profiling study using a synthetic library of 160 000 fluorogenic tetrapeptides by Choe et al. (Fig. S1†). 53 With knowledge of the different amino acid preferences of different cathepsins, we identified peptide sequences that might be specific to Cts L. At the P1 position, all human cathepsins prefer basic residues, so arginine and lysine were clearly the choices for this position, with arginine being preferred by Cts L. 53 P2 is considered the major determinant of Cts L substrate specificity that differentiates it from Cts K, Cts S, and Cts B, as Cts L has a unique preference for aromatic residues (phenylalanine, tryptophan, tyrosine) at this position. 53,54 A similar preference for aromatic residues at the P2 position is observed only in Cts V, the cathepsin most closely related to Cts L (78% sequence identity). 53 As Cts V favors tryptophan and tyrosine over phenylalanine, phenylalanine seemed to be the best choice for P2. For P4, Cts L shows a preference for histidine, prompting us to choose histidine at this position. 
At P3, however, Cts L has less well dened specicity, but displays some preference for basic residues as well as a few aliphatic amino acids. 53 Since we already included a positively charged residue at the P1 position, we decided to incorporate an aliphatic amino acid at P3 to reduce the potential for multiple Cts L cleavage sites. We chose leucine for the P3 position since the data from Choe et al. suggested that Cts L prefers leucine at P3 among the aliphatic amino acids (Fig. S1 †). 53 Regarding placement of the thioamide, the P1 position was chosen based on our previous systematic studies with the cysteine proteases that showed P1 thioamide peptide K S P1 gave the highest level of protease resistance. 23,24 Using this rationale, the rst series of peptides synthesized via solid-phase peptide synthesis for initial scanning with steady state protease assays were: mHLFRAAAm (R 3A ), mHLFKAAAm (K 3A ), and their P1 thioamide analogs (R S 3A and K S 3A ) (Fig. 1A). The thioamide position is denoted as a superscript "S" in the peptide sequences.
To validate our design, we needed to rst conrm whether the all-amide peptides were good substrates of the proteases before proving that the thioamide substitution could transform them into stabilized peptide inhibitors. For ease of comparison between thioamide positions and protease, raw uorescence measurements were normalized and are presented in Fig. 2 (primary data are shown in ESI, Fig. S2-S6 †). Initial rates of proteolysis were determined for each cleavage reaction (Table  1). High performance liquid chromatography (HPLC) and matrix-assisted laser desorption ionization mass spectrometry (MALDI MS) were used to conrm the cleavage sites in all assays (Tables S6-S11 and Fig. S7-S12; † cleavage sites summarized in Table S5 †). In the absence of protease, no signicant changes in uorescence intensities nor degradation of the peptides in the assay buffers were observed. Both of the all-amide peptides K 3A and R 3A were recognized and efficiently cleaved by all ve proteases, conrming that these were indeed good substrates of Cts L ( Fig. 2 and Table 1). It is worth noting that the all-amide peptides K 3A and R 3A were both cleaved at the P1 position by all of the proteases, consistent with the fact that these ve cysteine cathepsin proteases have high preferences for recognizing and cleaving their substrates at basic residues (Table  S5 †). Interestingly, for the all-amide peptides K 3A and R 3A , cleavage sites other than the expected P1 position was also observed with Cts L, Cts V, and Cts K, while the peptides were cleaved at only the P1 position by Cts S and Cts B (Tables S6, S8 and Fig. S7, S9 †). This likely reects the fact that Cts L, Cts V, and Cts K are most closely related in sequence identity, resulting in similar preferences for substrate specicity. We then performed the assays with the thioamide analogs, where P1 thioamide stabilization was observed with Cts L, Cts V, Cts K, and Cts S ( Fig. 2 and Table 1). 
This supports our choice of thioamide placement at the P1 position and is consistent with our previous findings reported in Liu et al. and Giannakoulias et al., where we observed that P1 thioamides retarded proteolysis by Cts V, Cts K, Cts S, and Cts L, but not Cts B. 23,24 Substituting the thioamide at the P1 position thus not only stabilized the P1 position but also produced multiple-site stabilization effects with Cts L, Cts V, and Cts K (Tables S7, S9 and Fig. S8, S10†). We have previously observed similar multiple-site stabilization with thioamide substrates of serine proteases, which we were able to exploit to stabilize cancer cell imaging peptides at two positions with a single thioamide modification. 22 Overall, a thioamide at the P1 position rendered the KS3A and RS3A peptides completely resistant to proteolysis by Cts V, Cts K, and Cts S, while significantly slowing their proteolysis by Cts L (Table 1). The only exception to this P1 thioamide effect was Cts B, which cleaved the KS3A and RS3A peptides between the last two alanine residues, releasing the C-terminal dipeptide, as indicated by the slashes: mHLFK^SAA/Am and mHLFR^SAA/Am (Tables S7, S9 and Fig. S8, S10†). This pattern of cleavage by Cts B aligns well with the fact that Cts B is known to be both an endopeptidase and a carboxydipeptidase (exopeptidase). 28 The KS3A and RS3A peptides were also slowly cleaved by Cts L at the same position. We therefore postulated that a truncated version of this peptide, mHLFR^SAm (RS1A; Fig. 1B), would eliminate this cleavage site for Cts L and Cts B and stabilize the peptide. As expected, the all-amide version of this shorter peptide (R1A; mHLFRAm) was recognized and cleaved by all five cathepsins, Cts L, Cts V, Cts K, Cts S, and Cts B (Fig. 2, S11 and Tables 1, S10), while the corresponding thiopeptide RS1A was left intact (Fig. 2, S12 and Tables 1, S11). 
Since the preceding literature and our inhibition assays with the K3A and R3A peptides suggested that the arginine substrates have higher affinity for Cts L, 53 we proceeded to investigate the RS1A peptide in depth instead of its lysine analog (mHLFK^SAm), as discussed in the next section.
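The normalization and initial-rate steps described in this section are detailed only in the ESI, so the following is a sketch under stated assumptions: min-max scaling of each fluorescence trace to 0-1, and an ordinary least-squares slope over the early, approximately linear time points. The function names, the default `n_points = 5`, and the example trace are hypothetical, not the authors' parameters or data.

```python
def normalize(trace: list[float]) -> list[float]:
    """Min-max scale a fluorescence trace to 0-1 (assumed normalization)."""
    lo, hi = min(trace), max(trace)
    if hi == lo:
        return [0.0] * len(trace)  # flat trace: no detectable cleavage
    return [(f - lo) / (hi - lo) for f in trace]


def initial_rate(times: list[float], signal: list[float], n_points: int = 5) -> float:
    """Ordinary least-squares slope over the first n_points (early linear phase)."""
    t, y = times[:n_points], signal[:n_points]
    if len(t) < 2:
        raise ValueError("need at least two points to estimate a slope")
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    cov = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    var = sum((ti - t_mean) ** 2 for ti in t)
    return cov / var


# Hypothetical trace (arbitrary units): linear rise, then a plateau.
times = [0, 1, 2, 3, 4, 5, 10, 20]
trace = [100, 120, 140, 160, 180, 195, 230, 240]
print(initial_rate(times, trace))  # slope of the first five points
print(normalize(trace))
```

Converting such slopes from fluorescence units into concentration per unit time would typically also require a calibration against fully cleaved peptide, which this sketch does not attempt.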

Investigating inhibitory effects with cathepsin proteases
Inhibition assays with Cts L were performed with all of the peptides from the first series (K3A and R3A and their P1 thiopeptides) as well as the RS1A peptide (Tables 2 and S13-S18). For these assays, Z-Phe-Arg-AMC (Z-FR-AMC, where Z is benzyloxycarbonyl and AMC is 7-amino-4-methylcoumarin), a commercial fluorogenic substrate of Cts L, was used.
As the Z-FR-AMC substrate was cleaved by Cts L, its turn-on fluorescence was monitored at 460 nm with an excitation wavelength of 380 nm, which differs from the wavelengths used for monitoring potential cleavage of our m-labeled peptides (λex = 325 nm; λem = 390 nm), allowing the two to be monitored separately without interference. The all-amide peptides (K3A and R3A) showed some inhibitory effects, which was expected since these substrates compete for the active site of Cts L (Table 2). The corresponding thioamide peptides, KS3A and RS3A, were very good inhibitors of Cts L, with K_I values of 0.60 ± 0.15 μM and 0.52 ± 0.12 μM, respectively (Table 2). The RS1A peptide (mHLFR^SAm), which resisted proteolysis by all five cathepsins, was also a good inhibitor of Cts L, with a K_I value of 1.11 ± 0.22 μM. Although RS3A and KS3A were slightly more potent inhibitors than the truncated peptide RS1A, they are less suitable because the steady-state protease assays established that they can be cleaved at the C-terminus by Cts L and Cts B. Lastly, the role of the two coumarins was examined with the coumarin-free peptide HLFR^SA (RS1A*). Although this peptide resisted cleavage by Cts L (Fig. S20†), it was a significantly weaker inhibitor of Cts L (K_I = 13.23 ± 6.89 μM), indicating an important role for the coumarins in binding. The finding that the thioamide peptides could serve as potent inhibitors of Cts L was exciting because our previous investigations of thioamide-stabilized protease substrates had found them to be only fairly weak inhibitors, implying that the thioamide primarily acted to disrupt binding to the protease. 21,22,24 Indeed, earlier investigations of thioamide effects on proteolysis had found similar results for di- and tripeptides. 14,15,20,55 Thus, we wished to further investigate the mechanism of inhibition.
From an initial evaluation of the kinetic parameters obtained by fitting the data to a Michaelis-Menten model (details are given in the ESI; Tables S13-S18†), there was generally a decrease in V_max, but either a minor increase or no significant change in K_M, as the inhibitor concentration was increased. This rules out purely competitive inhibition (which leaves V_max unchanged) and uncompetitive inhibition (which lowers both V_max and K_M), suggesting that these peptides are likely mixed-type inhibitors of Cts L under the traditional categorization of inhibitors. This can be easily visualized in Lineweaver-Burk plots (Fig. S14-S19†), which support a mixed-type mechanism of inhibition, as shown in Fig. 3 for the RS1A peptide. To further evaluate the mechanism of inhibition and to obtain the K_I values, the kinetic data were fitted by non-linear regression to the mixed inhibition model in GraphPad Prism 8 software, which yields both K_I and the parameter alpha (α) that reports the mechanism of inhibition. 56 All of the α values fall between 1 and 2, indicating that these peptides are likely mixed-type inhibitors with an element of competitive inhibition, since α > 1 indicates tighter binding to the free enzyme than to the enzyme-substrate complex (Table 2). 57 Interestingly, mixed inhibitors of Cts L have previously been shown to be promising antiviral candidates. A high-throughput screen of 5000 molecules identified a small-molecule inhibitor of Cts L (5705213) with a mixed inhibition mechanism that can inhibit Cts L-mediated cleavage of the viral glycoproteins from four viruses (SARS-CoV, Ebola, Hendra, and Nipah), a process that is essential for entry into host cells. 58 To serve as useful specific inhibitors of Cts L, the thioamide peptides must also be inert to cleavage by other proteases that may be present in vivo while not inhibiting them. 
Since the sequence-optimized RS1A peptide resisted proteolysis by all five proteases, we then assessed the specificity of inhibition by determining whether RS1A could also inhibit Cts V, Cts K, Cts S, and Cts B, using assays analogous to the Z-FR-AMC assay used with Cts L. The RS1A peptide is a weak mixed-type inhibitor of Cts V, with a 26-fold higher K_I of 26.22 ± 8.42 μM (Fig. S21 and Table S19†). No significant differences in k_cat and K_M were observed for proteolysis of Z-Leu-Arg-AMC (Z-LR-AMC) by Cts K or Cts S in the presence of >30 μM RS1A peptide (Fig. S22, S23 and Tables S20, S21†). Similarly, essentially no differences in k_cat and K_M were observed for Cts B proteolysis of the Z-Arg-Arg-AMC (Z-RR-AMC) substrate in the presence of up to 50 μM RS1A peptide (Fig. S24 and Table S22†). In summary, in addition to acting as a potent inhibitor of Cts L (K_I = 1.11 ± 0.22 μM), the RS1A peptide is ~25-fold selective against the other Cts family members tested, as it only weakly inhibits the closely related Cts V (78% sequence identity with Cts L) and shows little to no significant inhibition of Cts K, Cts S, and Cts B (58%, 55%, and 26% sequence identity with Cts L, respectively).
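The mixed-inhibition model used above to extract K_I and α can be written out explicitly. The sketch below uses the common parameterization in which the inhibitor binds free enzyme with dissociation constant K_I and the enzyme-substrate complex with αK_I (to our understanding, the same form as GraphPad Prism's mixed model). The class name is hypothetical, and the V_max, K_M, and α values in the example are illustrative; only K_I = 1.11 μM comes from the RS1A result above.

```python
from dataclasses import dataclass


@dataclass
class MixedInhibition:
    """Mixed (linear) inhibition: I binds E with K_I and ES with alpha * K_I."""
    vmax: float   # V_max without inhibitor
    km: float     # K_M without inhibitor
    ki: float     # dissociation constant of the EI complex
    alpha: float  # alpha > 1: inhibitor binds free E more tightly than ES

    def apparent(self, inhibitor: float) -> tuple[float, float]:
        """Apparent (V_max, K_M) at inhibitor concentration [I]."""
        free_term = 1 + inhibitor / self.ki               # binding to free E
        es_term = 1 + inhibitor / (self.alpha * self.ki)  # binding to ES
        return self.vmax / es_term, self.km * free_term / es_term

    def rate(self, substrate: float, inhibitor: float) -> float:
        """Initial velocity: Michaelis-Menten form with the apparent constants."""
        vmax_app, km_app = self.apparent(inhibitor)
        return vmax_app * substrate / (km_app + substrate)

    def lineweaver_burk(self, inhibitor: float) -> tuple[float, float]:
        """Slope and y-intercept of the double-reciprocal plot (1/v vs 1/[S])."""
        vmax_app, km_app = self.apparent(inhibitor)
        return km_app / vmax_app, 1 / vmax_app


# Illustrative V_max, K_M and alpha; K_I = 1.11 uM is the value reported for RS1A.
model = MixedInhibition(vmax=1.0, km=10.0, ki=1.11, alpha=1.5)
for i in (0.0, 1.11, 5.0):
    vmax_app, km_app = model.apparent(i)
    print(f"[I] = {i:5.2f} uM  V_max,app = {vmax_app:.3f}  K_M,app = {km_app:.2f}")
```

With α between 1 and 2, the apparent V_max falls while the apparent K_M rises only modestly, the pattern reported for these peptides. Very large α approaches purely competitive inhibition, and α = 1 corresponds to classical noncompetitive inhibition.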

Evaluation of cathepsin L inhibition in HepG2 whole cell lysate
Cts L has been considered an appealing target for cancer treatment because its expression has been linked to tumor progression and metastasis in different types of cancer. 38,61 In particular, increased Cts L expression has been associated with worse outcomes in hepatocellular carcinoma patients, 62 and elevated Cts L activity has been found in malignant liver cancer HepG2 cells. 63 To further validate RS1A as a Cts L inhibitor, we investigated whether it could effectively inhibit Cts L activity in whole cell lysate from the HepG2 human hepatocellular liver carcinoma cell line. Using a commercially available fluorescence-based Cts L activity kit, we incubated different doses of RS1A with HepG2 whole cell lysate. We found that RS1A effectively inhibited fluorescent reporter activity in the HepG2 whole cell lysate (IC50 = 19.3 ± 4.5 μM) (Fig. 4). Significantly, MALDI MS and HPLC data showed that the half-life of this peptide in HepG2 whole cell lysate was 28.6 hours, approximately 238 times longer than that of its all-amide counterpart, R1A, which had a half-life of only 7.2 minutes (Fig. 4C and S26†). The high stability of this thioamide peptide in the presence of other proteases and cellular components in the HepG2 whole cell lysate further corroborates the enhanced stability we observed in steady-state protease assays with individual cathepsins (Cts L, V, K, S, and B). Excitingly, our preliminary data showed that RS1A could also inhibit Cts L in human MDA-MB-231 breast cancer cells overexpressing Cts L (Fig. S47†). 42,64 These findings establish an exciting precedent for translating RS1A to in vivo assays to determine the impact of highly specific Cts L inhibition on processes such as cancer cell growth and viral uptake.
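The ~238-fold stabilization follows directly from the two reported half-lives once both are expressed in the same units. The sketch below checks that arithmetic and, assuming first-order degradation (the text does not state which kinetic model was used), shows how the half-life translates into the fraction of intact peptide over time. The helper names are hypothetical.

```python
import math


def half_life_from_rate(k: float) -> float:
    """First-order half-life: t1/2 = ln(2) / k, in the time units of 1/k."""
    return math.log(2) / k


def fraction_remaining(t: float, t_half: float) -> float:
    """Fraction of intact peptide after time t, assuming first-order decay."""
    return 0.5 ** (t / t_half)


# Half-lives reported above for HepG2 lysate, both expressed in minutes.
t_half_thio = 28.6 * 60   # RS1A: 28.6 h
t_half_amide = 7.2        # R1A: 7.2 min
fold = t_half_thio / t_half_amide
print(f"half-life ratio: {fold:.0f}-fold")
print(f"intact after 1 h: RS1A {fraction_remaining(60, t_half_thio):.1%}, "
      f"R1A {fraction_remaining(60, t_half_amide):.1%}")
```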

Computational modeling
In order to rationalize the specic inhibitory effects of our peptides, we utilized computational modeling to exibly dock the longer peptide R S 3A and the truncated peptide R S 1A with Cts L and the other four cathepsins investigated in this project. Interestingly, exclusively in the Cts L simulations, we observe that the P1 thioamide bond N-H of the R S 3A peptide can interact with His 163 , which is part of the Cts L catalytic triad (Fig. 5A). Hydrogen bonding in this manner would prevent His 163 from efficiently deprotonating Cys 25 , thereby attenuating the proteolytic activity of Cts L and making the R S 3A peptide a good inhibitor. Similarly, with the truncated peptide R S 1A , only with Cts L, did we observe the interaction between the P1 thioamide N-H group of the peptide and His 163 (Fig. 5B). This hydrogen bond would be expected to be stronger for the thioamide than for the amide. 5,65 This may explain why both the R S 3A and R S 1A peptides can effectively inhibit Cts L. Our computational modeling can also be used to reasonably explain our other experimental data. From the steady-state protease assays, we found that the only exception to the P1 thioamide stabilization effect was with Cts B, where the R S 3A peptide was cleaved at the last two C-terminal alanine residues (mHLFR S AA/Am), which is consistent with the fact that Cts B is both an endopeptidase and a carboxydipeptidase. 28 Upon examination of the docked structure of the R S 3A and Cts B, we found that the carboxylic acid of the C-terminal m of the peptide interacts with His 112 on the occluding loop, which is one of the two histidines (His 111 /His 112 or His 110 /His 111 ) known in the literature to anchor the C-terminal carboxylate of substrates to give Cts B its carboxydipeptidase properties (Fig. 5C). 
66,67 The truncated peptide RS1A eliminates this interaction, thus protecting the peptide from proteolysis by Cts B and making it inert to all five cathepsins (L, V, K, S, and B) while specifically inhibiting Cts L (Fig. 5D).
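Hydrogen-bond contacts such as the thioamide N-H...His163 interaction are usually assigned from docked models with simple geometric cut-offs. The sketch below shows one such check; the 3.5 Å donor-acceptor and 120° D-H...A angle thresholds are common conventions assumed here, not criteria stated in this work, and the coordinates are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def angle_deg(a: Vec, b: Vec, c: Vec) -> float:
    """Angle ABC in degrees, with B at the vertex."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    cos_theta = sum(x * y for x, y in zip(ba, bc)) / (math.hypot(*ba) * math.hypot(*bc))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))  # clamp rounding


def is_hbond(donor: Vec, hydrogen: Vec, acceptor: Vec,
             max_da: float = 3.5, min_angle: float = 120.0) -> bool:
    """Geometric test: donor-acceptor distance and D-H...A angle cut-offs (assumed)."""
    return (math.dist(donor, acceptor) <= max_da
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)


# Hypothetical coordinates (angstrom): thioamide N-H pointing at a histidine N.
n_thio, h_thio = (0.0, 0.0, 0.0), (1.01, 0.0, 0.0)
print(is_hbond(n_thio, h_thio, (2.9, 0.0, 0.0)))   # linear, 2.9 A apart
print(is_hbond(n_thio, h_thio, (0.0, 2.9, 0.0)))   # acceptor off-axis
```

The same `math.dist` call applied to the active-site cysteine sulfur and the scissile carbonyl carbon gives the kind of attack distance compared between amide and thioamide complexes in the analysis that follows.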
In an effort to further rationalize why incorporating a thioamide at the P1 position imbues both the longer peptide RS3A and the shorter peptide RS1A with inhibitory effects on Cts L but not on the other cathepsins, we performed two additional analyses. The first analysis examined how the distance between the active-site cysteine sulfur and the carbonyl carbon of the scissile bond changes upon thioamide incorporation. We detected large increases of up to 1.2 Å in this distance (placing the active-site residue outside the range for nucleophilic attack) for the two Cts L peptide complexes of interest (Table S25†). Importantly, despite this change in backbone geometry, the key histidine hydrogen bonding interactions were preserved. Our second, retrospective analysis utilized unsupervised machine learning (k-means clustering) of residue-level energy differences between amide and thioamide peptide complexes from our structural models. Energy feature clustering showed that the Cts L complexes clustered with each other, separately from all other clusters (Fig. 6). These data indicate that the energy changes associated with thioamidation in Cts L complexes are distinct from those associated with thioamidation in complexes with the other cathepsins. Taken together, our identification of relevant hydrogen bonding interactions, the tolerance of the complexes to incorporation of a P1 thioamide (change in distance upon removal of constraints), and the energy feature clustering identify distinct aspects of the RS3A and RS1A complexes with Cts L that can explain the mechanism of their specific inhibition: thioamidation disrupts binding of the peptides to other proteases while it strengthens a hydrogen bonding interaction with Cts L that keeps RS3A or RS1A tightly bound.
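To illustrate the clustering step, the sketch below implements Lloyd's k-means algorithm on per-complex vectors of residue-level energy differences (thioamide minus amide). The feature values are hypothetical and are not data from this study; a library implementation such as scikit-learn's `KMeans` would normally be used, and this stdlib version only shows the idea. Seeding the centroids with the first k points is a simplification chosen for reproducibility.

```python
import math

Point = tuple[float, ...]


def kmeans(points: list[Point], k: int, iterations: int = 100) -> list[int]:
    """Lloyd's algorithm; returns a cluster index for every point.

    Centroids are seeded with the first k points so the result is reproducible.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels: list[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical residue-level energy differences, one vector per complex.
complexes = ["CtsL/RS3A", "CtsV/RS3A", "CtsL/RS1A", "CtsK/RS3A", "CtsS/RS1A", "CtsB/RS1A"]
features = [(-2.1, -1.8, 0.3), (0.9, 1.1, 0.4), (-2.0, -1.6, 0.2),
            (1.0, 0.8, 0.5), (0.7, 1.2, 0.1), (1.1, 0.9, 0.6)]
for name, label in zip(complexes, kmeans(features, k=2)):
    print(name, label)
```

In this toy input the two Cts L complexes fall into one cluster and the other cathepsins into the other, which is the kind of separation described for Fig. 6.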

Conclusions
In summary, we have examined several thioamide peptide scaffolds and identified one peptide, RS1A, that is not only resistant to proteolysis by all five cathepsins (Cts L, Cts V, Cts K, Cts S, and Cts B), but is also a potent, specific inhibitor of Cts L. This peptide can reversibly inhibit Cts L without being degraded in HepG2 liver cancer cell lysate and shows promising activity in MDA-MB-231 breast cancer cells. Such a peptide is desirable, since peptide-based agents, especially those targeting proteases, are often subject to degradation in vivo. Furthermore, reversible inhibitors like this could potentially address the safety concerns about lack of specificity and potential elicitation of immune responses associated with irreversible, covalent inhibitors. 49,50 While the selectivity against other cathepsins is not as high as that of some previously reported peptidomimetics (primarily covalent inhibitors), 45 this has not been our focus here. Rather, we sought to demonstrate that one can rationally design a potent reversible protease inhibitor by strategic modification of amino acid sidechains and thioamide position based on sensor data from our own work and that of others. More detailed mechanistic studies, as well as further optimization of this peptide for higher affinity and selectivity, will be pursued and reported subsequently. Our studies show the potential of utilizing thioamides as stabilized peptide inhibitors and reaffirm the value of thioamides in the peptide drug design toolbox. In future studies, we will further optimize the thioamide peptide scaffolds by exploring substitutions with unnatural amino acids and by more carefully examining the role of the N- and C-terminal coumarin groups, removal of which led to a 13-fold loss in affinity (increase in K_I). More rigorous biological studies, including assessment of cell permeability, are also warranted to fully assess the utility of these compounds for in vivo studies of Cts L and possible therapeutic advancement. 
Given our previous success in using machine learning to predict thioamide effects, and the existing database of sequence effects on cathepsin activity, we may be able to computationally design peptide-based inhibitors for cathepsins as well as for other targets. 23 Taken together, these approaches can form a paradigm for developing thioamide-stabilized peptides as enzyme inhibitors.

Experimental
In protease assays, sensor peptides were reacted with 40.9 nM Cts B. All assays were performed in assay buffer consisting of 100 mM sodium acetate, 100 mM NaCl, 1 mM EDTA, and 5 mM DTT (pH 5.5) in a 96-well plate at 27 °C. The peptide inhibitors were pre-incubated with the appropriate proteases in assay buffer for 10 min to ensure full binding before being added to the fluorogenic substrates. The fluorescence of each reaction was monitored as a function of time at 460 nm with an excitation wavelength of 380 nm on a Tecan M1000 plate reader. Each assay was performed in triplicate to ensure reproducibility. Details of the analysis for these assays are outlined in the ESI.†

Cathepsin L activity assay with HepG2 whole cell lysate
Cts L activity in human hepatocellular carcinoma HepG2 whole cell lysate (200 μg at 2.5 mg mL−1; ab166833; Abcam, Cambridge, MA, USA) was evaluated using a fluorometric Cathepsin L Activity Assay Kit (ab65306; Abcam, Cambridge, MA, USA) following the manufacturer's protocols. Briefly, in each well of a 96-well plate, 50 μL of HepG2 cell lysate diluted in CL buffer (to a final concentration of 0.05 mg mL−1 HepG2) was incubated with 50 μL of CL buffer alone (Control) or containing the peptide inhibitor RS1A at different concentrations (15, 20, 30, 50, 70, 80, 100, and 120 μM). The cell lysate and peptide inhibitor were incubated at room temperature for 10 min. Then 2 μL of 10 mM Ac-FR-AFC CL substrate (to a final concentration of 200 μM) was added to each well except the Lysate Background Control wells. SID 26681509, a known Cts L inhibitor, was used as a positive control at different concentrations (56 nM, 560 nM, and 1 μM). The samples were mixed, and the plate was sealed to avoid evaporation and incubated at 37 °C for 1 h. The fluorescence of each sample was measured at 505 nm with an excitation wavelength of 400 nm on the Tecan plate reader.
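The lysate assay yields endpoint fluorescence at a series of RS1A concentrations, from which an IC50 (19 μM for RS1A in the abstract) is obtained. The exact fitting procedure is given in the ESI and is not reproduced here. As a hedged illustration only, the sketch below fits a two-parameter Hill (logistic) model to percent activity normalized to the no-inhibitor control using SciPy's `curve_fit`. The model form, the normalization, the bounds, and the function names `hill` and `fit_ic50` are assumptions, not necessarily the authors' choices.

```python
import numpy as np
from scipy.optimize import curve_fit


def hill(conc, ic50, n_hill):
    """Percent activity remaining: 100% with no inhibitor, approaching 0% at saturation."""
    return 100.0 / (1.0 + (conc / ic50) ** n_hill)


def fit_ic50(conc_um, pct_activity):
    """Least-squares fit of the Hill model; returns (IC50 in the input units, Hill slope)."""
    conc = np.asarray(conc_um, dtype=float)
    y = np.asarray(pct_activity, dtype=float)
    p0 = (float(np.median(conc)), 1.0)  # rough starting guesses: mid-range IC50, slope 1
    # Bounds keep IC50 positive and the Hill slope in a physically sensible range
    popt, _pcov = curve_fit(hill, conc, y, p0=p0, bounds=([1e-6, 0.1], [np.inf, 10.0]))
    return float(popt[0]), float(popt[1])
```

A call such as `fit_ic50([15, 20, 30, 50, 70, 80, 100, 120], pct)` uses the RS1A concentrations listed above, where `pct` would be background-subtracted activity as a percentage of the Control wells. Fixing the top and bottom plateaus at 100% and 0% reduces the number of free parameters; a four-parameter model is a common alternative when the plateaus are not reached.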
More details of the assay and data fitting are included in the ESI.† Stability assays of the peptides in HepG2 whole cell lysate are also detailed in the ESI.†
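The kinetic protease assays in this section are read out as fluorescence time courses (excitation 380 nm, emission 460 nm) in triplicate. Their analysis is described in the ESI. As a minimal sketch, assuming activity is taken from the initial linear phase, the slope of a straight-line fit to the earliest reads gives a rate, and replicate rates can be expressed relative to an uninhibited control. The function names and the `n_points` window are hypothetical choices.

```python
import numpy as np


def initial_rate(time_s, rfu, n_points=10):
    """Slope (RFU per s) of a least-squares line through the first n_points reads.

    Assumes the first n_points lie in the linear (initial-velocity) phase.
    """
    t = np.asarray(time_s, dtype=float)[:n_points]
    f = np.asarray(rfu, dtype=float)[:n_points]
    slope, _intercept = np.polyfit(t, f, 1)
    return float(slope)


def relative_activity(inhibited_traces, control_traces, time_s, n_points=10):
    """Mean and SD of inhibited rates as a fraction of the mean control rate.

    Each *_traces argument is a list of replicate fluorescence traces (e.g. triplicates).
    """
    v_inh = np.array([initial_rate(time_s, tr, n_points) for tr in inhibited_traces])
    v_ctl = np.array([initial_rate(time_s, tr, n_points) for tr in control_traces])
    frac = v_inh / v_ctl.mean()
    return float(frac.mean()), float(frac.std(ddof=1))
```

Restricting the fit to early reads avoids curvature from substrate depletion. In practice, the window would be chosen by inspecting the traces rather than fixed in advance.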

Computational modeling
To simulate the protease/peptide complexes from this study, the structure of papain in complex with a peptide-like covalent inhibitor (PDB ID: 1BP4) 68 was used as a template to provide a reasonable starting structure for docking. Manual docking was performed by replacing the native covalent inhibitor with the WHLFRAAAW peptide, which was prepared using PyRosetta. 69 The cathepsin proteases of interest, Cts B (PDB ID 1GMY), 70 Cts K (PDB ID 1BGO), 71 Cts L (PDB ID 3HHA), 72 Cts S (PDB ID 1MS6), 73 and Cts V (PDB ID 1FH0), 74 were aligned to the manually docked papain complex using PyMOL. The resulting cathepsin/WHLFRAAAW starting complexes were then formally docked with the FlexPepDock protocol in Rosetta to optimize the binding interactions between the proteases and the peptides of interest. 75 The tryptophan residues in WHLFRAAAW were mutated to 7-methoxycoumarinyl alanine (m) residues using the MutateResidue mover in PyRosetta with a params file and rotamer library generated previously. 23 Next, a constrained FastRelax was performed in PyRosetta to accommodate the newly introduced 7-methoxycoumarinyl alanine residues. A flat harmonic constraint was used to keep the scissile bond close to the active-site cysteine residue. Thioamides were introduced into the relaxed complexes through previously written patches. 23,76 The thioamide-containing peptides were then simulated with five independent local relax trajectories without any constraints.
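The flat harmonic constraint used during FastRelax applies no penalty while the restrained distance stays within a tolerance window and rises quadratically outside it. The sketch below writes out that functional form in Python, following Rosetta's documented FLAT_HARMONIC definition (zero within x0 ± tol, harmonic with width sd outside). To our understanding, the corresponding PyRosetta class is `FlatHarmonicFunc`. The x0, sd, and tol values used in this study are not stated here, so any numbers passed to these hypothetical helper functions are illustrative.

```python
def flat_harmonic(x: float, x0: float, sd: float, tol: float) -> float:
    """Unweighted penalty of a flat-bottomed harmonic restraint.

    x   : current distance, e.g. Cys SG to scissile carbonyl C (Å)
    x0  : centre of the allowed window (Å)
    sd  : width of the harmonic walls (Å); smaller sd means stiffer walls
    tol : half-width of the zero-penalty window (Å)
    """
    deviation = abs(x - x0)
    if deviation <= tol:
        return 0.0  # inside x0 ± tol: no penalty, geometry is free to relax
    return ((deviation - tol) / sd) ** 2  # outside: quadratic growth from the window edge


def constraint_profile(x0: float, sd: float, tol: float, distances):
    """Tabulate (distance, penalty) pairs, e.g. to plot the restraint shape."""
    return [(d, flat_harmonic(d, x0, sd, tol)) for d in distances]
```

The flat bottom matters here. It keeps the scissile bond near the catalytic cysteine while the coumarin residues repack, without pinning the distance to a single value. This is also why releasing the constraint in the final relax trajectories can reveal the distance changes discussed above.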

Machine learning
Unsupervised machine learning was performed by clustering energy features from the PyRosetta models with scikit-learn. 77 Specifically, score differences (termed deltas) between residue total energies (energy in the thioamide peptide complex minus energy in the all-amide peptide complex) were computed from all of our FlexPepDock models for the three residues of the protease catalytic triad and for the P1 and P1′ residues of the peptide. These five energy score deltas were reduced to three dimensions with principal component analysis (PCA). 77 The data projected onto the three principal components were then clustered with the KMeans algorithm using four clusters, a number chosen by maximizing the silhouette score. 78 The three-dimensional data were plotted and visualized with matplotlib. 79
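The clustering workflow above can be sketched with scikit-learn as follows. This is an illustrative reconstruction, not the authors' script. It assumes the per-residue total energies of each model have already been extracted (for example with PyRosetta) into arrays whose five columns are the catalytic Cys, His, and Asn and the peptide P1 and P1′ residues. The helper names and the scanned range of cluster numbers are hypothetical. The cluster number with the highest silhouette score is kept.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Column order assumed for the energy arrays: catalytic triad, then peptide P1 and P1'
FEATURES = ["Cys", "His", "Asn", "P1", "P1'"]


def energy_deltas(thio_energies, amide_energies):
    """Per-residue deltas: thioamide complex energy minus all-amide complex energy."""
    return np.asarray(thio_energies, dtype=float) - np.asarray(amide_energies, dtype=float)


def cluster_deltas(deltas, k_range=range(2, 7), seed=0):
    """Project deltas onto 3 principal components, then pick k by silhouette score.

    Returns (PCA coordinates, chosen k, cluster labels).
    """
    coords = PCA(n_components=3).fit_transform(deltas)
    best = None
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(coords)
        score = silhouette_score(coords, labels)  # higher = tighter, better-separated clusters
        if best is None or score > best[0]:
            best = (score, k, labels)
    _score, k, labels = best
    return coords, k, labels
```

The three PCA coordinates can then be passed to a matplotlib 3D scatter plot, colored by cluster label, to produce a figure of the kind described above. Fixing `random_state` makes the KMeans initialization reproducible between runs.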

Data availability
Kinetic and inhibition data with associated fitting, as well as structural models of the peptides in complex with proteases, are available as ESI. A key file is provided with descriptions of each data file.

Conflicts of interest
There are no conicts to declare.