Fluorogenic Trp(redBODIPY) cyclopeptide targeting keratin 1 for imaging of aggressive carcinomas†

Keratin 1 (KRT1) is overexpressed in squamous carcinomas and associated with aggressive pathologies in breast cancer. Herein we report the design and preparation of the first Trp-based red fluorogenic amino acid, which is synthetically accessible in a few steps and displays excellent photophysical properties, and its application in a minimally-disruptive labelling strategy to prepare a new fluorogenic cyclopeptide for imaging of KRT1+ cells in whole intact tumour tissues.


General procedures for SPPS 1
Peptides were manually synthesized in 10-mL polystyrene syringes fitted with a polyethylene porous disc using common Fmoc-SPPS protocols. Solvents, excess reagents and soluble byproducts were removed by suction. The Fmoc group was removed with piperidine/DMF (1:4) (1 × 1 min, 2 × 5 min), followed by DMF (×5) and DCM (×5) washes. All syntheses were carried out at room temperature. Peptides bearing fluorescent moieties were always protected from light.
Cleavage from resin: The peptide was then cleaved from the resin using 2% TFA in DCM (5 × 1 min) and the resin was washed with DCM (5 × 1 min). The combined filtrates were collected into a 150-mL round-bottom flask containing DCM (50 mL) and DIEA (250 µL). The solvent was evaporated under reduced pressure and the remaining residue was dissolved in CH3CN/H2O (1:1) and lyophilised.
Excess PyOxim was removed from the cyclised product by precipitation using cold water. The precipitate was washed with H2O, decanted and dried.
Chromatography on flash silica with gradient elution using 10-50% dichloromethane in hexane gave the desired product 5 (0.93 g, 98%) as a colourless solid. 1

2-(5-Methylthiophen-2-yl)-1H-pyrrole (4)
The deprotection of the Boc protecting group was performed following the described methodology. 3 A dry, 50-mL, round-bottomed flask equipped with a magnetic stirrer was maintained under a nitrogen atmosphere. The protected pyrrole 5 (1 g) was added, followed by THF (18 mL). The mixture was stirred at room temperature and 3 equiv. of NaOCH3 in CH3OH (6.5 N) were added. The mixture was allowed to stir for 1 h and was then diluted with 5 mL of Et2O and 5 mL of H2O. Separation of the organic layer followed by washing with brine, drying over
The mixture was stirred at room temperature for 3 hours and partitioned between saturated aqueous NaHCO3 solution (15 mL) and CH2Cl2 (25 mL). The organic extracts were dried over MgSO4 and the solvent was evaporated. The residue was purified by flash column chromatography with 40-90% CH2Cl2 in hexane to afford the desired ketone 6 as a pale yellow powder (362 mg, 27%). The iodinated BODIPY derivative 3 was obtained by adaptation of a reported synthetic procedure. 5 Ketone 6 (670 mg, 5 mmol) was dissolved in CH2Cl2 (10 mL) and stirred under nitrogen. Then pyrrole 5 (620 mg, 5 mmol, 1.2 equiv.) was added and the resulting solution was cooled to 0 °C in an ice bath, followed by the addition of POCl3 (0.13 mL, 1.3 equiv.). The solution was stirred at room temperature overnight. Triethylamine (1.4 mL, 10 equiv.) was added and the mixture was stirred for 10 min while being cooled to 0 °C.
Boron trifluoride-diethyl ether (1.5 mL, 12 equiv.) was added dropwise and the reaction mixture was then stirred at room temperature overnight.

Identification of the putative binding site
The protein KRT1 has 644 amino acids but there is no available crystal structure for the whole protein. However, two sections (namely coil 1B and coil 2B) have been solved by X-ray crystallography in a heterodimeric form with protein KRT10. 7 Figure S7), meaning that the protein is moving considerably during the simulation. This was largely predictable, since this part of the protein is a 104-amino-acid elongated alpha-helical structure. Analysis of the secondary structure with the DSSP 10 software revealed that the protein unfolds completely during the simulation, except for two small regions that preserve their helical structure: residues 472-482 and 456-461 (Figure S8A). In addition, we analysed the sequence conservation of KRT1 (UniProt code P04264), residues 387-496 (coil 2B domain).
We performed a multiple sequence alignment (MSA) across species using ClustalΩ 11 with default settings. Interestingly, the regions that stayed folded during the MD were also the most evolutionarily conserved regions in the MSA. A ConSurf 12,13,14 analysis of the 104 amino acids, performed to identify functional regions, also pointed at the same region (see Figure S8B).
These analyses led us to hypothesise that the two small structured regions (residues 472-482 and 456-461) might be important for binding. Next, we used MD simulations with mixed aqueous/organic solvents (MDmix) 15,16 to identify binding hot spots and assess the druggability of the section of the KRT1 structure that was deemed important for binding (residues 454-484).
The analysis revealed that, in spite of the highly dynamic and solvent-exposed character of a sequence that lacks tertiary structure, this section displays 5 different binding hot spots with very high affinity for hydrophobic groups (Figure S9), consistent with the hypothesis that this region of KRT1 acts as a binding site.

Conformational analysis of the cyclopeptides
The structure of peptide 2 was created in MOE 17 using the Protein Builder panel. The chirality of the Lys residue was inverted using the Builder panel. In order to cyclise the peptide, a soft restraint was first created between the amino group of the first residue and the carboxylic carbon of the last one. Minimisation brought these atoms into proximity, then the residues were joined with the Protein Builder panel and the system was minimised again. This was used as the initial conformation for molecular dynamics with Amber. For the labelled molecules 8 and 9, Trp and Lys were replaced by the corresponding BODIPY-containing residues (named KPY and WPY hereafter). Non-standard residues (norLeu, KPY, WPY) were parameterised using the RESP procedure 18 to assign partial charges. Atom types (providing the Lennard-Jones and bonded terms) were manually assigned from the Amber force field. OFF files for these residues are available upon request. Finally, the tLeap program (part of AmberTools) was used to embed the cyclic peptides in a cubic TIP3P water box spanning at least 20 Å in each direction and to generate Amber topology and coordinate files. 2000 steps of standard minimisation were carried out. Both the equilibration and production protocols were standard ones, using a Langevin thermostat with a collision frequency of 4 ps⁻¹ and a cut-off of 9 Å for non-bonded interactions. All bonds were constrained using SHAKE and an integration step of 2 fs was used. Periodic boundary conditions and Ewald sums (grid spacing of 1 Å) were used to treat long-range electrostatic interactions. The systems were heated from 100 K to 300 K over 800 ps in the NVT ensemble, followed by 1 ns of equilibration at 300 K in the NPT ensemble.
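As a rough check of the protocol above, the following Python sketch converts the stated durations into integration step counts for a 2 fs time step and evaluates a target temperature during heating. The linear shape of the 100 K to 300 K ramp and the function names are assumptions made for illustration; the text does not state how the temperature was ramped.

```python
def n_steps(duration_ps, dt_fs=2.0):
    """Number of integration steps needed to cover duration_ps at time step dt_fs."""
    return round(duration_ps * 1000.0 / dt_fs)


def ramp_temperature(t_ps, t_start=100.0, t_end=300.0, ramp_ps=800.0):
    """Target temperature (K) at time t_ps, ASSUMING a linear ramp (not stated in the text)."""
    frac = min(max(t_ps / ramp_ps, 0.0), 1.0)  # clamp to the ramp window
    return t_start + frac * (t_end - t_start)


heating_steps = n_steps(800.0)   # 800 ps of NVT heating
equil_steps = n_steps(1000.0)    # 1 ns of NPT equilibration
midpoint_temp = ramp_temperature(400.0)  # halfway through the assumed ramp
```

With a 2 fs step, the heating stage corresponds to 400,000 steps and the equilibration stage to 500,000 steps.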
Then, we ran a 1 µs production run for each molecule, saving coordinates in NetCDF trajectory files. All the simulations were performed with AMBER18 19 adapted for running on graphics processing units (GPUs) and executed at the Barcelona Supercomputing Centre using NVIDIA V100 GPUs. Trajectory analysis was carried out with the cpptraj module, using the dbscan algorithm to cluster the conformations with the RMSD of non-hydrogen atoms as the distance metric and an epsilon value of 1.5 Å. The populations of the top 5 clusters are shown in Table S1. Their structures can be obtained upon request from the authors.
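To illustrate the clustering step, the sketch below is a minimal pure-Python DBSCAN operating on a precomputed distance matrix, followed by a population count per cluster. It is not the cpptraj implementation. The `min_pts` value and the toy one-dimensional "distances" are hypothetical, since only the epsilon of 1.5 Å is given in the text; in practice the matrix would hold pairwise RMSD values between trajectory frames.

```python
def dbscan(dist, eps, min_pts):
    """Cluster items from a symmetric distance matrix (list of lists).

    Returns one label per item: 0, 1, ... for clusters and -1 for noise.
    A point is a core point if at least min_pts items (itself included)
    lie within eps of it.
    """
    n = len(dist)
    labels = [None] * n  # None means "not yet visited"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbours) >= min_pts:  # j is a core point, keep expanding
                queue.extend(k for k in j_neighbours if labels[k] is None)
    return labels


def populations(labels):
    """Fraction of all frames in each cluster (noise excluded from the keys)."""
    counts = {}
    for lab in labels:
        if lab != -1:
            counts[lab] = counts.get(lab, 0) + 1
    return {lab: c / len(labels) for lab, c in counts.items()}


# Toy data: 7 "frames" placed on a line; distances stand in for pairwise RMSD (Å).
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 30.0]
dist = [[abs(a - b) for b in positions] for a in positions]
labels = dbscan(dist, eps=1.5, min_pts=2)  # min_pts=2 is a hypothetical choice
pops = populations(labels)
```

In this toy case the first three and the next three frames form two clusters and the isolated frame is labelled as noise.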

Conformational analysis of the protein
Three replicas of MD in water were performed for protein KRT1 using the Amber ff99SB 20 force field. A TIP3P box with a volume of 3406570 Å³ was chosen for solvation of the system. Three Na+ ions were necessary to neutralise the system; the system was then minimised and equilibrated at the default temperature of 300 K.

Hot spot identification with MDmix
As the putative binding site (residues 454-484) of the protein preserves its secondary structure, and in order to obtain converged and accurate MDmix results, we applied soft restraints (harmonic restraints with a force constant of 0.01 kcal/mol·Å²) to the non-hydrogen atoms of the protein, as recommended. 21 As shown in the RMSF and RMSD plots (Figure S10), the protein fluctuates but preserves its original structure. Three replicas of MD in ethanol (20%) in water were performed for a shorter section (res. 454-484) of the protein, this time using MDmix. 15,16 The method identifies hydrophobic and hydrogen-bond donor or acceptor hot spots, which are key points for the most important protein-drug interactions. In fact, we found 5 different pharmacophoric points along the protein, which were then used for docking studies.
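To give a sense of how soft these restraints are, the sketch below evaluates a harmonic positional restraint energy for a set of atoms. The functional form E = k·|r − r0|², without a factor of 1/2, is an assumption about the convention used; the coordinates are made-up values for illustration only.

```python
def restraint_energy(coords, ref_coords, k=0.01):
    """Sum of k * |r - r0|^2 over restrained atoms, in kcal/mol.

    coords and ref_coords are sequences of (x, y, z) tuples in Å;
    k is the force constant in kcal/mol·Å². The absence of a 1/2
    prefactor is an ASSUMED convention, not taken from the text.
    """
    energy = 0.0
    for (x, y, z), (x0, y0, z0) in zip(coords, ref_coords):
        energy += k * ((x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2)
    return energy


# Hypothetical example: two atoms displaced by 2 Å and 3 Å from their references.
reference = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
current = [(2.0, 0.0, 0.0), (1.0, 1.0, 4.0)]
penalty = restraint_energy(current, reference)
```

Under this assumed form, a 2 Å displacement of one atom costs only 0.04 kcal/mol, well below thermal energy at 300 K, which is consistent with the protein fluctuating while keeping its overall structure.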

Sequence analysis
We ran the ConSurf 12,13,14 webserver on the PDB structure to find functional regions in the protein, using HMMER as the homolog search algorithm with 5 iterations. The algorithm found 150 sequences homologous to the protein out of 9821 HMMER hits. The most conserved, and thus likely functional, amino acids are located in the same regions, in the C-terminal part of the protein. The MSA was done using Clustal Omega. 11
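As a simple illustration of how conserved positions can be read from an alignment, the sketch below scores each MSA column by its Shannon entropy (0 bits for a fully conserved column). This is a swapped-in, simplified measure and not the scoring used by ConSurf; the four aligned sequences are invented toy data, and gap characters would be counted as an ordinary symbol.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conservation_profile(alignment):
    """Per-column entropy for a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]


# Hypothetical toy alignment (not KRT1 sequences).
toy_alignment = ["LKEA", "LKDA", "LRDG", "LKDG"]
profile = conservation_profile(toy_alignment)
most_conserved = min(range(len(profile)), key=profile.__getitem__)
```

Lower entropy means higher conservation, so the first column of the toy alignment, where all sequences share the same residue, is ranked as the most conserved.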

Binding mode prediction
In order to determine a possible binding mode of the cyclopeptide, we ran docking simulations with rDock. 22 The usefulness of rDock for the specific problem of protein-peptide docking has been assessed very recently, with positive results. 23 For the cavity preparation we used the reference ligand method, as implemented in the rbcavity module. The coordinates of the binding hot spots identified in the previous section were used as reference, with a radius of 10 Å.
This creates the cavity depicted in Figure S11, providing ample space for accommodating the cyclopeptides. As rDock considers the internal degrees of freedom explicitly, but treats ring systems as rigid bodies, we provided multiple structures of each cyclic peptide as input. In particular, we used the centroids of those clusters representing more than 2% of the population observed in the MD simulations. That is 3 structures for 2, and 4 structures for 8 and 9 (see Table S1). We ran 100 independent docking calculations for each input ligand structure and

Spectroscopic characterisation
Spectral properties were measured in a BioTek Cytation 3 spectrophotometer. Absorbance and emission spectra were determined in the range of 400-700 nm (every 1 nm) at the indicated concentrations. Environmental sensitivity was measured by comparing the emission in PBS or in the presence of phosphatidylcholine:cholesterol (7:1) liposomes (3.75 mg mL⁻¹ PC).
Fluorescence emission was determined on a 384-well plate using a BioTek Cytation 3 spectrophotometer as indicated (excitation at 520 nm, 530-700 nm emission range, every 1 nm).
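One possible way to quantify the environmental sensitivity from such spectra is to integrate the emission over the 530-700 nm window and take the ratio between the liposome and PBS samples. The sketch below does this with the trapezoidal rule; the choice of an integrated-emission ratio as the metric and the flat spectra used as input are assumptions for illustration, not measured data.

```python
def integrate(wavelengths, intensities):
    """Trapezoidal integral of an emission spectrum over wavelength (nm)."""
    area = 0.0
    for i in range(1, len(wavelengths)):
        step = wavelengths[i] - wavelengths[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * step
    return area


def fold_increase(wavelengths, em_liposomes, em_pbs):
    """Ratio of integrated emission in liposomes to that in PBS."""
    return integrate(wavelengths, em_liposomes) / integrate(wavelengths, em_pbs)


# Hypothetical flat spectra sampled every 1 nm from 530 to 700 nm.
wavelengths = list(range(530, 701))
emission_pbs = [1.0] * len(wavelengths)
emission_liposomes = [5.0] * len(wavelengths)
fold = fold_increase(wavelengths, emission_liposomes, emission_pbs)
```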

Protease stability assays
The chemical stability of the peptides was assessed in the presence of Protease XIV (from  Samples were spun at 300 g for 2 min at room temperature, and the pellets were resuspended in FACS buffer prior to flow cytometry. Fluorescence emission was acquired on a 5L LSR flow