Cysteine containing dipeptides show metal specificity that matches the composition of seawater

Bioinformatic Calculations PDB entries, up to October 2014, that contained coordinated mononuclear metal ions and iron-sulphur clusters were downloaded and analyzed with an in-house script that determined the frequency of residues in positions ±1 relative to the ligating cysteine. In total, 13,600 PDB entries were analyzed, including 1,151 iron proteins, 538 cobalt proteins, 384 nickel proteins, 964 copper proteins, 9,529 zinc proteins, 446 [2Fe-2S] proteins, and 588 [4Fe-4S] proteins.

the region 2 to 4 Å.The cut-off was fixed at 10 Å. Interaction energies between metal ions and ligands are well described by the usage of the Møller-Plesset perturbation theory (MP2) used in these calculations.The metal ion-ligand energy for the metal ion-methanethiolate system was then calculated as follows 19 : The obtained functions were then fit to the following potential function: where Q i and q j were the calculated Merz-Kollman charges for the metal dication and the coordinated sulfur (table S1) at the distance at which the well of the potential energy function was found (Figure S5).A 1 and B 1 were equal to 793.3 and 25.01, respectively, as previously described. 20All calculated parameters are available as a separately downloadable supplementary files.

Molecular Dynamics
The cysteine coordinates of Clostridium pasteurianum rubredoxin (PDB ID: 1IRO) along with the coordinated iron centre were extracted.The coordinates were then manipulated with Avogadro to build the other metallocomplexes; i.e. to add different peptide ligands.All of the complexes were solvated in water, neutralized and equilibrated for 20 ps.The complexes were heated to 298 K in a stepwise manner.A constant pressure of 1 atm was used to have a NPT ensemble.The calculation interval for the equations of motion was 2 fs.Long-range electrostatic interactions were calculated using the Ewald approximation and periodic boxes (PBC).The SHAKE 21 procedure was employed to constrain hydrogen atoms.Non-bonded interactions were up to a complete cut-off of 10 Å.The Ewald sum was computed using the Particle-Mesh Ewald (PME) 22 .The Langevin algorithm, as implemented in NAMD, 23 was also used.Molecular Dynamics (MDs) ran for 10 ns.Analysis of the trajectories was performed using VMD. 24MDs were performed with a customized version of Charmm27 Force Field 25- 27 topology and parameter files obtained by the DFT and MP2 calculations described above.The Solvent Accessible Surface Area (SASA) was calculated with VMD, and PDB files were manipulated with UCSF Chimera. 13lecular dynamics data were analysed with an in-house prepared R package MoDyGliAni (Molecular Dynamics Global Analysis) available as a separately downloadable supplementary file and at http://smansy.org/modygliani.MoDyGliAni works on a set of tab separated ASCII files that are the result of one or more MD runs.Input files are in the format time step, energy (kcal/mol), RMSD (Å).
MoDyGliAni first attempts at the RMSD curve fitting with a double negative exponential function (eq 2.1).If the RMSD distribution is not fit by the double exponential function, MoDyGliAni then attempts fitting with a negative exponential function (eq 2.2).If the RMSD distribution is not fit by the negative exponential function, the user must define the time threshold.The two exponential functions are: where k, λ 1 and λ 2 are constants.This process is iterated for all the input files provided.Then, among the time constants (τ) provided by the fitting, MoDyGliAni searches for the longest time constant (τ s ) to ensure that comparisons are made between systems in the production phase.The potential energy surface (PES) and RMSD are considered to be in the equilibration phase if PES is < 5τ, otherwise the PES is considered in the production phase.MoDyGliAni provides as an output a histogram of comparison of the <RMSD> and <E internal > for each complex trajectory given as input in the production phase relative to the slowest τ; i.e. ≥ 5τ s .Parameters of the histogram are delivered in csv format.
MoDyGliAni also provides the complete charts of the RMSD(t), E internal (t), histograms of RMSD and of E internal .<RMSD> and <E internal > are shown in Figure 3 and Figure S8.

Materials
All reagents were from Sigma-Aldrich and used without any further purification.Deionized Milli-Q (Millipore) purified water was distilled under nitrogen flow to deoxygenate the solvent.Ligand solutions were obtained under controlled nitrogen atmosphere by using a Schlenk line and Schlenk glassware and transferred to anaerobic sealed Hellma quartz cuvettes with a septum.In situ synthesis of peptido-metal complexes was obtained by injecting each metal ion solution through the cuvette septum with Hamilton gastight syringes.

Solid phase peptide synthesis
The synthesis of C-and N-blocked peptides was performed according to standard Fmoc-based SPPS procedures. 28N,N-dimethyl formamide (DMF) was used as the solvent and Rink acid-labile (hydroxymethyl)polystyrene resin was used as the starting polymeric support.Fmoc-protected aminoacids were used as building blocks.Peptide elongation was obtained by Fmoc-deprotection of the residue anchored to the resin and Fmoc-AA-OH coupling.Fmoc-deprotection was obtained by washing the mixture with 20% (v/v) solution of piperidine in DMF.For each coupling, an excess (Fmoc-AA-OH: anchored AA, 3:1) of the Fmoc-α-amino acid derivative was added to the resin.Apart from Fmoc-Cys(Trt)-OH, Fmoc-α-amino acid derivatives were activated with a mixture of hydroxyl-benzotriazole (HOBt), N,N,N′,N′-tetramethyl-O-(benzotriazol-1-yl)uronium tetrafluoborate (TBTU), and N,Ndiisopropylethyl amine (DIPEA).Fmoc-Cys(Trt)OH was activated with a N,N'diisopropylcarbodiimide (DIC)/HOBt mixture.At the end of the synthesis, the last Fmoc-protecting group was removed, and the acetylation of the N-terminus was performed by shaking with 25% acetic anhydride in DMF in the presence of DIPEA.The blocked-peptides were cleaved from the resin and deprotected by treatment with a solution of trifluoroacetic acid (TFA):H 2 O:triisopropyl silane (TIS):1,2-ethanedithiol (EDT) (volume ratio 37:1:1:1) for 2 h.The volume was reduced under nitrogen atmosphere to avoid cysteinyl-thiol oxidation, and the product was precipitated with a cold solution of diethyl ether followed by washing cycles with diethyl ether or extracted three times with 20% acetic acid/chloroform and finally dried under inert atmosphere.Peptides were confirmed by mass spectrometry.

UV-Visible spectroscopy
UV-Vis absorption spectra of freshly prepared solutions of peptido-metal complexes were recorded with an Agilent 8453 UV-Vis diode array spectrophotometer with an integration time of 0.5 s and an interval of 1 nm.Baseline subtraction was made at 900 nm.

Saturation binding assay
As a general procedure for iron, cobalt, and nickel ions, aliquots of 25 mM metal salt solution (FeCl 2 , CoCl 2 •6H 2 O, NiSO 4 •7H 2 O , respectively) were injected into the cuvette to titrate 1 mL of an aqueous solution containing 2.5 mM ligand (Ac-(AA) x -NH 2 , x = 1 or 2) at pH 8.7.UV-Vis spectra were collected upon each addition.Absorbance values at a fixed wavelength (380 nm, 750 nm, and 535 nm for iron, cobalt, and nickel ion titrations, respectively) were monitored until no changes were observed.

Competition binding assay
To overcome the spectroscopic silence of Zn(II), competitive binding experiments involving a preformed peptido-Co(II) complex, with the characteristic absorbance at 750 nm, was exploited.A decrease in the absorbance at the fixed wavelength was observed upon titration of the spectroscopic probe with 25 mM ZnSO 4 .The concentration of cobalt added to each peptide before the competition assay was coincident with the K d value of that complex.UV-Vis spectra were collected upon each addition.Titrations continued until no changes in absorbance at 750 nm were observed.

Determination of K d
GraphPad Prism v. 6.00 (GraphPad Software, La Jolla California USA) for Windows was used to calculate the K d values.For Fe(II), Co(II), and Ni(II) peptido-complexes, the K d values were calculated by fitting the absorbance data to the equation: where B max was the absorbance reached at saturation, x the concentration, and h the Hill slope.The K d values for zinc(II) complexes were calculated by fitting to a revised Cheng-Prusoff equation, as previously described. 29able S1 Table S3.Average Solvent Accessible Area Surface (SASA) for the whole trajectory of all the complexes analyzed by MD.

Figure S1 .
Figure S1.Protein ligand preference.Structures from the protein data bank were analyzed to determine the frequency of ligands for each metal centre.(Top) Analysis of iron-sulphur clusters, including 446 PDB entries of [2Fe-2S] proteins, 588 of [4Fe-4S] proteins, and 1151 of mononuclear iron proteins (total = 2185 PDB entries) shows that polynuclear iron-sulphur clusters prefer ligation by cysteine.(Bottom) 1151 PDB entries of iron proteins, 538 structures of cobalt proteins, 384 of nickel proteins, 964 of copper proteins, and 9529 of zinc proteins were analyzed resulting in a data set of 12566 PDB entries.Mononuclear metal coordination of clusters and Fe, Co, Ni, Cu and Zn occurs more frequently through cysteine, aspartate, glutamate, and histidine (not ordered by frequency) ligands, even though other residues can occasionally contribute.

Figure S2 .
Figure S2.The probability of the 20 amino acids to occupy the -1 and +1 positions in the primary sequence with respect to a M 2+ coordinated cysteine.The probability of Val at the -1 position was ~0.34 and ~0.08 for iron and zinc ions, respectively.The probability of a Gly at the +1 position was ~0.33, ~0.28, and ~0.14 for iron, nickel, and zinc ions, respectively.

Figure S3 .
FigureS3.DFT test on the [(CH 3 S) 4 Fe] 2− complex.Geometry optimization results in a tetrahedral geometry for all of the performed DFT tests (top panel).The point energy per optimization step for each of the DFT calculation runs is shown (middle panel) and compared (bottom panel).Only a slight difference in energies was observed, i.e. -1.8757e+003, -1.8752e+003, and -1.8760e+003 au for B3LYP, PBE0, and PW91, respectively.In other words, the difference in energy was by 0.016% for B3LYP with respect to PW91 and by 0.042% with respect to PBE0.Additionally, only 38 steps were taken with B3LYP versus 64 steps for PBE0 and 81 steps for PW91.Therefore, the speed of calculation improved more than 2-fold for B3LYP with respect to PW91.

Figure S4 .
Figure S4.DFT test on the [(CH 3 S) 4 Zn] 2− complex.Geometry optimization results in a tetrahedral geometry for all of the performed DFT tests (top panel).The point energy per optimization step for each of the DFT calculation runs is shown (middle panel) and compared (bottom panel).The resulting energies were -1.8181e+003 au for B3LYP and -1.8185e+003 au for PW91 (a difference of 0.022%).58 steps were taken for B3LYP and 60 for PW91.

Figure S6 .
Figure S6.Superposition of calculated structures and crystallographic coordinates of metal substituted rubredoxins.30Only crystallographic coordinates are shown on the left: tube representation is for the protein backbone, blue dots are for cysteine sidechain sulfurs.On the right, the superimposed structures are shown: blue tube representation is for the protein backbone, while the ball and stick portion of the figure represents carbon, sulfur, and metal ions from the DFT optimized geometries.Metal centres are always shown as spheres: orange is for iron, green for cobalt, grey for nickel, and magenta for zinc.PDB entries are 1IRO for iron rubredoxin (a), 1R0H for cobalt-substituted rubredoxin (b), 1R0J for nickel-substituted rubredoxin (c), and 1IRN for zinc-substituted rubredoxin (d).RMSDs are 0.256, 0.252, 0.384, and 0.38 Å, respectively.

Figure S7 .
Figure S7.Charmm27 customized parameters of (a) average force constants calculated by DFT and (b) charge of the metal ions and sulfur ligands by ab initio calculations.

Figure S8 .
Figure S8.Average RMSD from MD trajectories of the complexes: (a) Cys-Xxx and (b) Xxx-Cys.Histograms were made with MoDyGliAni.

Figure S9 .
Figure S9.Mass spectra of Cys and the dipeptides.A) High resolution mass spectrum of Ac-(Cys)-NH 2 (blocked Cysteine) in aqueous solution is shown.The peak found at 163.65 m/z is consistent with the value of the isotopic mass calculated for the protonated molecule ([M+1], formula C 5 H 10 N 2 O 2 S, 162.05 Da).

Figure S10 .
Figure S10.Examples of metal ion titrations with the dipeptide Cys-Leu.(Top) Co 2+ -dependent absorbance changes at 750 nm were monitored in the presence of 2.5 mM Cys-Leu.The data were fit to the Hill equation for saturation binding.(Bottom) Zn 2+ -dependent absorbance changes at 750 nm in the presence of 2.5 mM Cys-Leu and 0.06 mM Co 2+ were monitored.The data were fit to the revised Cheng-Prusoff equation for competitive binding.29

Figure S12 .
Figure S12.Scatterplot matrix of the average K d value per each metal ion.The Pearson correlation coefficient is given for each pair of metal ions.For example, 0.75 is the coefficient related to Zn 2+ and Co 2+ , while 0.12 is the correlation coefficient for Zn 2+ and Fe 2+ .The associated scatter plots of K d (mM) are also shown.
Mass spectra were acquired at the Proteomics/MS unit of Cogentech S.c.a.r.l.(Fondazione IFOM-Istituto FIRC di Oncologia Molecolare, Milano).Samples were resuspended in 1 mL of HPLC grade H 2 O and directly infused on a quadrupole Orbitrap Q-Exactive HF mass spectrometer (Thermo Scientific) with H-ESI Ion Max source.Ionization was achieved at a flow rate of 5 µL/min in positive ion mode applying +3.5 kV at the entrance of the capillary.Sheat gas was set at 5 psi, capillary temperature 320 °C, S-Lens RF Level 60, S-Lens Voltage 21, FT Resolution 35000.

Table S2 .
. Fitting parameters of the metal dication-methanethiolate potential energy function as given in equation 1.2 and as shown in Fig S7. ε and r min /2 are given as mapped into the customized Charmm force field files.Experimentally determined K d values.