Understanding thioamitide biosynthesis using pathway engineering and untargeted metabolomics†

Thiostreptamide S4 is a thioamitide, a family of promising antitumour ribosomally synthesised and post-translationally modified peptides (RiPPs). The thioamitides are one of the most structurally complex RiPP families, yet very few thioamitide biosynthetic steps have been elucidated, even though the biosynthetic gene clusters (BGCs) of multiple thioamitides have been identified. We hypothesised that engineering the thiostreptamide S4 BGC in a heterologous host could provide insights into its biosynthesis when coupled with untargeted metabolomics and targeted mutations of the precursor peptide. Modified BGCs were constructed, and in-depth metabolomics enabled a detailed understanding of the biosynthetic pathway to thiostreptamide S4, including the identification of a protein critical for amino acid dehydration that has homology to HopA1, an effector protein used by a plant pathogen to aid infection. We use this biosynthetic understanding to bioinformatically identify diverse RiPP-like BGCs, paving the way for future RiPP discovery and engineering.


Standard protocols
Streptomyces genomic DNA extraction was carried out using the salting out procedure (Kieser ref). Polymerase chain reactions (PCRs) of fragments for assemblies were carried out with Herculase II Fusion DNA Polymerase (Agilent), and analytical PCRs were carried out with GoTaq G2 Flexi DNA Polymerase (Promega). These followed the manufacturers' protocols, and annealing temperatures were based on predicted primer melting temperatures. Plasmid DNA extraction and purification were carried out using the Wizard Plus SV Minipreps DNA Purification System (Promega), and PCR products were purified using the Illustra GFX PCR DNA and Gel Band Purification Kit (GE Healthcare) following the manufacturers' instructions. All sequencing was carried out on plasmid miniprep templates using a Mix2Seq Kit (Eurofins Genomics). Sequences of all primers are listed in Table S2.

Bacterial transformations and conjugations
Electrocompetent E. coli were prepared using the following protocol. E. coli were inoculated into 10 mL SOB-Mg (2% tryptone, 0.5% yeast extract, 0.58% NaCl, and 0.186% KCl) and grown overnight at 250 rpm, 28 °C. A 1% inoculum of this starter culture was added to ten 250 mL conical flasks, each containing 50 mL SOB-Mg. These were incubated for 3 to 5 hours at 250 rpm, 28 °C until an OD600 between 0.2 and 0.4 was reached. Cells were harvested by centrifugation at 800 x g for 20 minutes and resuspended in a total of 250 mL 10% glycerol. This was repeated three times, resuspending in totals of 50 mL, 50 mL, and 1 mL successively. This produced electrocompetent cells, which were separated into 50 μL aliquots in microcentrifuge tubes and flash frozen for storage at -80 °C. Electrocompetent E. coli were transformed using the following protocol: electrocompetent cells were thawed and electroporation cuvettes (2 mm) were cooled on ice. 1 μL of DNA solution was added to each aliquot of cells. This was mixed gently by pipetting and transferred to the electroporation cuvette. The outside of the cuvette was dried, and it was inserted into a Gene Pulser (Bio-Rad) with the pulse generator set to 25 μFD, 2.5 kV, and 200 Ω. The pulse was delivered, and the cuvette was immediately placed on ice. 200 μL of LB was added. The cells were transferred back to a microcentrifuge tube and were incubated for an hour at 250 rpm, 37 °C. The entire transformation mix was then plated on LB with the appropriate selection. If hygromycin selection was used, cells were plated on DN-agar (2.3% Difco™ Nutrient Broth and 2% agar).
Electronic Supplementary Material (ESI) for Chemical Science. This journal is © The Royal Society of Chemistry 2021
E. coli ET12567 carrying the helper plasmid pR9604 4 was used for intergeneric conjugations into Streptomyces strains using an established protocol 5, with a few modifications as described here. E. coli ET12567-pR9604 transformants were selected on LB-agar containing carbenicillin, chloramphenicol, and either kanamycin or hygromycin for pCAP-based or pIJ10257-based plasmids, respectively. A single colony was used to inoculate 10 mL liquid LB containing the same antibiotics, and grown overnight at 37 °C, 250 rpm. A 1% inoculum was used to start a fresh culture in 10 mL of LB containing the same antibiotics. This was grown in the same conditions to an OD600 of 0.4. Cells were washed twice in 2 mL LB and resuspended in 1 mL LB. 5 μL of Streptomyces spores were mixed with 0.5 mL 2xYT (1.6% tryptone (BD Biosciences), 1% yeast extract, and 0.5% NaCl; adjusted to pH 7.4 before autoclaving) and heat-shocked at 50 °C for 10 minutes. 0.5 mL of washed E. coli ET12567-pR9604 were added to the heat-shocked spores. This mixture was pelleted by centrifugation at 3000 x g for 5 minutes and resuspended in 0.2 mL water. Cells were plated on SFM with MgCl2 (Fisher) added to a final concentration of 10 mM and incubated at 30 °C overnight. These plates were overlaid with nalidixic acid and kanamycin or hygromycin to select for Streptomyces pCAP-based or pIJ10257-based plasmid exconjugants, respectively.

Yeast transformations
Yeast transformations all followed a modified version of a published lithium acetate/polyethylene glycol (LiAc/PEG) protocol 6. Single colonies of S. cerevisiae VL6-48N were inoculated into 10 mL of liquid YPAD in a 50 mL conical centrifuge tube and grown overnight at 30 °C with shaking at 250 rpm. This starter culture was added to 40 mL liquid YPAD in a 250 mL conical flask and incubated for 4 hours at 250 rpm, 30 °C. These cells were harvested by centrifuging for 5 minutes at 1,789 x g and washed twice in equal volumes of sterile water. Cells were resuspended in 1 mL 0.1 M LiAc, transferred to a microcentrifuge tube and pelleted at 3,000 x g for 15 seconds. These cells were resuspended in 400 μL 0.1 M LiAc to a final volume of 500 μL and then transferred to microcentrifuge tubes as aliquots of 50 μL for single transformations. Prior to a transformation, an aliquot was briefly centrifuged, and the supernatant was removed. The following solutions were then added to the cells, in this order: 240 μL PEG solution (50% PEG 3350), 36 μL 1 M LiAc, 50 μL salmon sperm DNA (Invitrogen; 2 mg mL⁻¹; boiled for ten minutes and cooled in ice water), and 34 μL DNA to be assembled. Cells were resuspended by pipetting and incubated at 250 rpm, 30 °C for 30 minutes, which was followed by heat shock at 42 °C for 30 minutes. Cells were pelleted by centrifugation at 3,000 x g for 30 seconds, the supernatant was removed, and cells were resuspended in 200 μL sterile water. This final volume of 220 μL was separated into 200 μL and 20 μL aliquots that were plated on selective media SD+CSM-Trp (0.17% YNB-AA-(NH4)2SO4 (Formedium), 0.5% (NH4)2SO4, 2% glucose, 2% agar, 20 μg mL⁻¹ adenine and 740 μg mL⁻¹ CSM-TRP (Formedium)). Plates were then incubated for 3 days at 30 °C.
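As a quick check on the transformation mix described above, the following minimal Python sketch sums the four added volumes and estimates the resulting final concentrations by simple dilution. It assumes that the volume contributed by the cell pellet is negligible, which is a simplification and not something stated in the protocol; the variable names are illustrative only.

```python
# Volumes added to one cell aliquot after removal of the supernatant (in µL),
# taken from the protocol text. Pellet volume is ignored (assumption).
components_ul = {
    "PEG 3350 (50%)": 240,
    "LiAc (1 M)": 36,
    "salmon sperm DNA (2 mg/mL)": 50,
    "DNA to be assembled": 34,
}

total_ul = sum(components_ul.values())

# Simple dilution: C_final = C_stock * V_stock / V_total
peg_percent = 50.0 * components_ul["PEG 3350 (50%)"] / total_ul
liac_molar = 1.0 * components_ul["LiAc (1 M)"] / total_ul
carrier_mg_per_ml = 2.0 * components_ul["salmon sperm DNA (2 mg/mL)"] / total_ul

print(f"total volume: {total_ul} µL")
print(f"PEG 3350: {peg_percent:.1f}%")
print(f"LiAc: {liac_molar:.2f} M")
print(f"carrier DNA: {carrier_mg_per_ml:.2f} mg/mL")
```

Under this assumption the mix totals 360 μL, with LiAc at 0.1 M, matching the concentration used in the preceding wash steps.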

Yeast colony screening
To screen yeast colonies by PCR, colonies were picked using a pipette tip, and cells were resuspended in 50 μL 1 M sorbitol (Fisher). 2 μL of zymolyase (5 U μL⁻¹; Zymo Research) was added to each cell suspension and incubated at 30 °C for 1 hour. Cell suspensions were then boiled for 10 minutes and centrifuged for 15 seconds at 1,000 x g, and 1 μL of the supernatant was used as a template for PCR.

Yeast to E. coli plasmid shuttling
To shuttle plasmids from yeast into E. coli, colonies of yeast were grown in 10 mL of liquid SD+CSM-TRP overnight at 250 rpm, 30 °C. Cells were harvested by centrifuging for 5 minutes at 1,789 x g and resuspended in 200 μL 1 M sorbitol plus 2 μL of zymolyase (5 U μL⁻¹). Cell suspensions were incubated at 30 °C for 1 hour to produce spheroplasts. Spheroplasts were pelleted at 600 x g for 10 minutes, and the supernatant was aspirated. Plasmid DNA was extracted from the spheroplasts using a standard Wizard miniprep protocol (Promega). 1 μL plasmid DNA was then transformed into E. coli by electroporation. Transformations were plated on LB-agar and transformants were selected for with kanamycin.

Gene deletions
To obtain gene deletions in pCAPtsa, PCR targeting was used following the published protocol 7. pCAPtsa was introduced to E. coli BW25113-pIJ790 7,8 by electroporation, and transformants were selected on LB-agar containing chloramphenicol and kanamycin at 30 °C. A single colony was used to make electrocompetent cells using the standard protocol (see Bacterial transformations and conjugations above), with two modifications: the subculturing on the second day was performed with 10 mM arabinose added to the medium, and the 10% glycerol washes were replaced with sterile water to facilitate immediate use, rather than flash freezing. These cells were then transformed with the gene-specific disruption cassettes by electroporation. Transformants were selected for on LB-agar containing kanamycin and apramycin at 37 °C. Plasmids were extracted from E. coli colonies, and gene disruptions were first confirmed by PCR using the following primer pairs: ΔtsaA was assessed with SPTsaA and ASPTsaA; ΔtsaC was assessed with SPTsaC and ASPTsaC; ΔtsaD was assessed with SPTsaD and ASPTsaD; ΔtsaE was assessed with SPTsaE and ASPTsaE; ΔtsaF was assessed with SPTsaF and ASPTsaF; ΔtsaG was assessed with SPTsaG and ASPTsaG; ΔtsaH was assessed with SPTsaH and ASPTsaH; ΔtsaI was assessed with SPTsaI and ASPTsaI; ΔtsaJ was assessed with SPTsaJ and ASPTsaJ; ΔtsaK was assessed with SPTsaK and ASPTsaK; ΔtsaMT was assessed with SPTsaMT and ASPTsaMT; ΔtsaL was assessed with SPTsaL and ASPTsaL; ΔtsaMO was assessed with SPTsaMO and ASPTsaMO; Δtsa+1 was assessed with SPTsa+1 and ASPTsa+1; Δtsa+2 was assessed with SPTsa+2 and ASPTsa+2; Δtsa+3 was assessed with SPTsa+3 and ASPTsa+3; and Δtsa-1, -2, and -3 was assessed with SPTsa-1,2,3 and ASPTsa-1,2,3. Additionally, the primers SPaac(3)IVseq and ASPaac(3)IVseq were used to sequence outwards from the inserted selectable marker.
To remove the selectable marker, gene-disrupted pCAPtsa plasmids were transformed into E. coli DH5α-BT340 9 by electroporation. Transformants were selected for on LB-agar containing apramycin, chloramphenicol, and kanamycin at 30 °C. Individual colonies were picked and spread to single colonies on LB-agar without antibiotics and grown overnight at 42 °C. Isolated colonies were spread as patches on LB-agar containing kanamycin and then LB-agar containing kanamycin and apramycin. Patches that only grew on the LB-agar plates containing kanamycin were assumed to have lost the disruption cassette, leaving behind an 81 bp scar. This was additionally confirmed by sequencing using the following primers:

In trans expression of single genes
Constructs for the in trans expression of thioalbamide genes and thiostreptamide S4 complementation genes were assembled using standard digestion and ligation of PCR fragments into pIJ10257 10. taaRed and taaCYP were amplified from Amycolatopsis alba DSM 44262 genomic DNA using the primer pairs SPNdeITaaRed and ASPNdeITaaRed, and SPNdeITaaCYP and ASPPacITaaCYP, respectively. Thiostreptamide S4 genes tsaA, tsaC-tsaJ were amplified from pCAPtsa using the following primers to assess for the correct start codon (see Table S3 and Figure S4): for tsaA SPNdeITsaA, SPNdeITsaAv2, SPNdeITsaAv3 and ASPPacITsaA were used; for tsaC SPNdeITsaC, SPNdeITSaCv2, and ASPPacITsaC were used; for tsaD SPNdeITsaD, SPNdeITsaDv2, SPNdeITsaDv3, and ASPPacITsaD were used; for tsaE SPNdeITsaE and ASPPacITsaE were used; for tsaF SPNdeITsaF and ASPPacITsaF were used; for tsaG SPNdeITsaG, SPNdeITsaGv2, SPNdeITsaGv3, SPNdeITsaGv4, and ASPPacITsaG were used; for tsaH SPNdeITsaH and ASPPacITsaH were used; for tsaI SPNdeITsaI and ASPPacITsaI were used; for tsaJ SPNdeITsaJ and ASPPacITsaJ were used; and for tsaMT SPNdeITsaMT and ASPPacITsaMT were used.
PCR fragments and pIJ10257 were digested with NdeI and PacI. The PCR fragments were ligated into pIJ10257 using T4 ligase (Invitrogen). Ligation reactions were transformed into E. coli DH5α by electroporation and transformants were selected for on DN-agar containing hygromycin. Colonies were screened for correct ligations by PCR using primers SPpIJ10257ins and ASPpIJ10257ins. Plasmids extracted from these colonies were additionally checked by sequencing using the same primers.

Precursor peptide mutations
Precursor peptide mutations were made to tsaA within pCAPtsa. These assemblies were all carried out using LiAc/PEG-mediated transformation into yeast. Assemblies were designed such that each core peptide modification was installed by a single oligonucleotide, which was linked to the backbone by a PCR fragment on each side. The backbone was AflII- and SrfI-digested and gel-purified pCAPtsa. The parts used in each assembly are shown in Table S6, while Table S7 has a description of each part. A schematic of the assembly is shown in Figure S19.
The efficiency and flexibility of the system meant no strict adherence to concentration or molar ratios of DNA was necessary. As a guide, the following was sufficient for an assembly: 40 ng of linearised plasmid, 80 ng of each PCR product, and 500 pmol of each oligonucleotide. PCR products and linearised plasmids were purified as described in Standard protocols above. PCR products used in assemblies were produced with approximately 60 bp regions of overlap with other assembly parts. The oligonucleotides used in assemblies overlapped by between 30 and 60 bp, and if they were assembled to another oligonucleotide they were also complementary. Plasmid screening was accomplished using PCR and/or sequencing. The assembly of pCAPtsa tsaA mutant plasmids was assessed by PCR using the primers SPPTsaAseq and ASPPTsaCseq, and the plasmids were subsequently sequenced using the primers SPPTsaAseq, ASPPTsaCseq, and ASPSrfITsaCseq.

Liquid chromatography-mass spectrometry (LC-MS)
Plugs were taken from production plates using the top of a 1 mL pipette and mixed with 500 μL methanol per plug. This was shaken for 1 hour at room temperature before being centrifuged at 20 000 x g for 15 minutes. The resulting supernatant was used for analysis. High throughput LC-MS/MS data were acquired using a Shimadzu Nexera X2 UHPLC connected to a Shimadzu ion-trap time-of-flight (IT-TOF) mass spectrometer and analysed using LabSolutions software (Shimadzu). 10 μL samples were injected onto a Phenomenex Kinetex 2.6 μm XB-C18 column (50 mm x 2.1 mm, 100 Å). The samples were eluted over 5 minutes using a 5 to 95% gradient of acetonitrile in water + 0.1% formic acid. After the first minute of each run, positive mode MS data were collected between m/z 200 and 2000, with an ion accumulation window of 20 ms and automatic sensitivity control of 70% of the base peak. The curved desolvation line (CDL) temperature was 250 °C and the heat block temperature was 300 °C. MS/MS data were collected between m/z 50 and 2000 in a data-dependent manner for parent ions between m/z 200 and 2000, using collision-induced dissociation energy of 50% and a precursor ion width of 3 Da. The instrument was calibrated using sodium trifluoroacetate cluster ions prior to every run.
LC-MS data acquired on the Shimadzu IT-TOF were additionally assessed using the statistical package Profiling Solution 1.1 to provide untargeted analysis. The following Profiling Solution parameters were used: ion m/z tolerance = 0.1 Da, ion intensity threshold = 50,000, LabSolutions compatible ion m/z tolerances = ON, De-Isotope matrix = ON. Metabolites detected in the negative controls were filtered out during post-processing in Microsoft Office 365 ProPlus Excel. For each quantification experiment, the relevant metabolite was measured by integration of LC-MS peak areas using Browser software (Shimadzu). MS quantification data are the average of triplicate cultures. Error bars on all graphs represent the standard error.
High resolution LC-MS/MS data were acquired by Gerhard Saalbach (John Innes Centre) using a Waters Synapt G2-Si mass spectrometer, and analysed using MassLynx software (Waters). The Synapt G2-Si was operated in positive mode with a scan time of 0.5 s in the mass range of m/z 50 to 1200. 7 μL samples were injected onto a Phenomenex Luna Omega 1.6 μm Polar C18 column (50 mm x 2.1 mm, 100 Å) and eluted with a linear gradient of 1 to 50% acetonitrile in water + 0.1% formic acid over 20 minutes. Synapt G2-Si MS data were collected with the following parameters: capillary voltage = 3 kV; cone voltage = 40 V; source temperature = 120 °C; desolvation temperature = 350 °C. Leu-enkephalin peptide was used to generate a dual lock-mass calibration with m/z = 278.1135 and m/z = 556.2766 measured every 30 s during the run.
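The quantification summary described above (average of triplicate cultures with standard error bars) can be sketched as follows. This is a minimal illustration, assuming that the standard error is the sample standard deviation divided by the square root of the number of replicates; the peak areas are hypothetical values, not data from this work.

```python
import math
import statistics


def mean_and_standard_error(peak_areas):
    """Return (mean, standard error of the mean) for replicate peak areas."""
    n = len(peak_areas)
    if n < 2:
        raise ValueError("at least two replicates are needed for a standard error")
    mean = statistics.fmean(peak_areas)
    # Sample standard deviation (n - 1 denominator), then divide by sqrt(n).
    standard_error = statistics.stdev(peak_areas) / math.sqrt(n)
    return mean, standard_error


# Hypothetical integrated peak areas from three replicate cultures.
example_areas = [1.0e6, 1.2e6, 1.4e6]
mean_area, sem_area = mean_and_standard_error(example_areas)
print(f"mean = {mean_area:.3e}, standard error = {sem_area:.3e}")
```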
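When comparing accurate mass measurements against a reference such as the lock-mass ions above, the deviation is commonly expressed in parts per million. The short sketch below shows that calculation; the observed value is hypothetical and is included only to illustrate the arithmetic.

```python
def ppm_error(observed_mz, reference_mz):
    """Mass error of an observed m/z relative to a reference m/z, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


# Lock-mass values quoted in the text for Leu-enkephalin.
LOCK_MASSES = (278.1135, 556.2766)

# Hypothetical observed value, for illustration only.
observed = 556.2780
error = ppm_error(observed, LOCK_MASSES[1])
print(f"mass error: {error:.2f} ppm")
```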
Purification and NMR analysis of 12
12 was purified from S. coelicolor M1146-pCAPtsaΔtsaE, and LC-MS was used to track 12 through most of the purification steps. 100 μL of spores were spread on 1.6 L of BPM-agar and were incubated at 28 °C for five days. This was extracted with 3.2 L of ethyl acetate, which resulted in an organic fraction containing a small proportion of 12 and the remaining solid material containing the rest of 12. The solid material was subsequently extracted with 3.2 L of methanol and reduced on the rotary evaporator at 30 °C to 800 mL. A liquid-liquid extraction with 3.2 L of ethyl acetate was performed and the remainder of 12 partitioned into the organic phase. This organic phase was combined with the original ethyl acetate extract and was evaporated on a rotary evaporator at 30 °C to produce a brown solid. This was resuspended in CHCl3 and dried onto silica gel. This was packed into a silica column and separated into 12 fractions using CHCl3 and an increasing concentration of CH3OH. An initial 400 mL elution of 100% CHCl3 was followed by 10 elutions of 200 mL, ranging from 5% to 25% CH3OH in CHCl3. The final elution was with 100% CH3OH. Fractions eluted with 10% CH3OH and upwards contained 12, so were combined and evaporated to a brown solid using a rotary evaporator at 30 °C. This was resuspended in CH3OH, dried onto a 1 g DSC-18 column (Discovery), and eluted with a step-wise gradient of water:acetonitrile. 14 elutions of 10 mL, ranging from 0% to 60% acetonitrile, were collected. All fractions with acetonitrile concentrations equal to or less than 20% contained 12, and so were combined and evaporated using a rotary evaporator at 30 °C to give 493 mg of brown solid. This was resuspended in CH3OH and separated by HPLC on a Gemini-NX 2.6 μ C18 column (150 mm x 21.2 mm, 100 Å) using a gradient from 10% to 60% acetonitrile over 45 minutes. Compound collection was guided by absorbance at 272 nm (characteristic of thioamide bonds). 12 was collected and evaporated using a Genevac EZ-2 Elite. This provided 0.7 mg of pure 12 as a white powder.
NMR spectra were recorded on a Bruker Avance 600 MHz NMR spectrometer. Chemical shifts were reported in ppm using the signals of the residual solvents as internal references (δH 3.31 and δC 49.0 for CD3OD). Data were analysed using Mnova 6.0.2 (Mestrelab). See Figures S11-S16 for NMR spectra.
The resulting sequences were size-filtered to remove any proteins shorter than 150 amino acids to yield a dataset with 1340 sequences. This dataset was then submitted to the online CD-HIT suite 12 for identity-based filtering with a 95% identity cut-off to reduce redundancy in downstream analyses. This yielded a reduced dataset with 828 sequences whose identity to each other was lower than 95%. The accession numbers of these sequences were then used as input for RiPPER analysis 13, which was run in Docker using default RiPPER parameters (minPPlen = 20, maxPPlen = 120, flanklen = 17.5, sameStrandReward = 5, maxDistFromTE = 8, fastaOutputLimit = 3, prodigalScoreThresh = 7.5). RiPPER retrieved information for 743 of the 828 entries, as the remaining 86 did not have associated nucleotide information. The RiPPER output was then used for downstream analyses, including short peptide networking with EGN 14 (thresholds = 40% identity, 40% sequence coverage), which was visualised using Cytoscape 15 (version 3.8.2). The RiPPER output is provided as Supplementary Datasets 2 (retrieved peptides and associated information) and 3 (Cytoscape file of networked peptides). Peptides belonging to each of the main networks described in the text were further analysed by multiple sequence alignment with MUSCLE 16. The resulting alignment files were then visualised for residue conservation within each peptide network using WebLogo 17 (version 3.7.4) with the following parameters: logo size = medium, units = probability, scale stacks width = on, no adjustment for composition, color scheme = chemistry.
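The size-filtering step at the start of this paragraph (removing proteins shorter than 150 amino acids) could be reproduced with a few lines of Python. The sketch below is an illustration only and is not necessarily how the filtering was performed in this work; the function names and the two example records are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_min_length(records, min_length=150):
    """Keep only records whose sequence has at least min_length residues."""
    return [(h, s) for h, s in records if len(s) >= min_length]


# Hypothetical example: one 149-residue and one 150-residue sequence.
example_fasta = (
    ">short_protein\n" + "M" * 149 + "\n"
    ">long_protein\n" + "M" * 150 + "\n"
)
kept = filter_by_min_length(read_fasta(example_fasta))
print([header for header, _ in kept])
```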

Phylogenetic analyses of HopA1 domain-containing proteins
The set of successful RiPPER entries was used to carry out a phylogenetic analysis of the HopA1 domain-containing proteins. As an outgroup for this analysis, the HopA1 protein from Pseudomonas syringae pv. syringae (AAF71481.2) was added to the dataset. Multiple sequence alignment and tree construction for this dataset were carried out using MUSCLE 16 and RAxML 18 respectively, through the CIPRES science gateway 19. The following MUSCLE parameters were used:
muscle -in infile.fasta -seqtype protein -maxiters 2 -maxmb 30000000 -hydro 5 -hydrofactor 1.2 -log logfile.txt -verbose -weight1 clustalw -distance1 kmer6_6 -cluster1 upgmb -sueff 0.1 -root1 pseudo -maxtrees 1 -weight2 clustalw -distance2 pctidkimura -cluster2 upgmb -sueff 0.1 -root2 pseudo -objscore sp -noanchors -fastaout output.fasta
The FASTA output from the MUSCLE alignment was then used as input for maximum likelihood tree construction with RAxML using the following parameters:
raxmlHPC-HYBRID -T 4 -N autoMRE -n HopA1_filtered_RiPR -s infile.txt -p 12345 -m PROTGAMMABLOSUM62 -k -f a -x 12345 -o AAF71481.2HopA1 [Pseudomonas syringae pv.syringae] --asc-corr lewis
The resulting maximum likelihood tree was visualised and edited in iTOL 20 to map the most prevalent precursor peptide networks identified by RiPPER and protein conserved domains associated with the HopA1 proteins in the tree. Upon tree plotting, four outlier proteins were identified and removed from the final analysis (KXJ59479.1, WP_12384669.1, WP_141234752.1 and PNO53249.1).

Genetic context analysis and conserved domain mapping
The genetic context of putative BGCs was assessed by using the output files from the RiPPER analysis described above. The GenBank files containing the annotated region surrounding the genes of interest were used to generate local databases for MultiGeneBlast analysis 21. Separate databases were created for each of the most prevalent short peptide networks. These databases were then queried (default MultiGeneBlast settings) with a representative example GenBank file from each network to assess for genetic conservation and therefore determine the composition of putative biosynthetic gene clusters associated with each peptide network.
In addition, the "main_co_occur.csv" files (containing co-occurring conserved pfam domains encoded in the genetic region of the query protein) generated by RODEO2 22 (incorporated into RiPPER) for each of the protein accessions were merged into a single file for conserved domain analysis. This file was then used to determine the most abundant pfam domains co-occurring with the HopA1 domain-containing proteins across the full dataset using the instructions described at http://ripp.rodeo/advanced.html. The 35 most abundant pfam domains, plus other selected pfam domains (see Supplementary Dataset 1 for full list), were then mapped to each of the HopA1 proteins to determine co-occurrence patterns. For those entries where a phosphotransferase pfam domain was not identified by this mapping process, the proteins encoded immediately upstream of the HopA1 gene were retrieved and individually inspected to detect putative conserved pfam domains which might have been above the previous E-value cut-off threshold (1 × 10⁻³), as well as conserved domains from non-pfam databases, and annotated accordingly. For ease of visualisation, results for pfam domains corresponding to similar activities (e.g. transport-related domains) were merged prior to mapping to the HopA1 phylogenetic tree using the iTOL annotation editor. The annotated tree is available at https://itol.embl.de/shared/1Idz6QnEJESFi and is shown in Figure S26.
S13 and S15). c. Due to low signal these carbons are below the level of noise in the DEPTQ spectrum (Figure S12); however, their HMBC correlations are visible (Figure S15).
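The merging and counting of co-occurring domains described in this section can be illustrated with a short Python sketch. It is a hedged example only: the column names (`protein_acc`, `pfam_id`) and the example rows are hypothetical and do not reflect the actual layout of the RODEO2 "main_co_occur.csv" files, which should be checked before adapting the code.

```python
import csv
import io
from collections import Counter


def count_domains(csv_texts, domain_column="pfam_id"):
    """Count how often each domain identifier occurs across several CSV files.

    csv_texts is an iterable of CSV file contents (strings); domain_column is
    a hypothetical column name holding the domain identifier.
    """
    counts = Counter()
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            domain = (row.get(domain_column) or "").strip()
            if domain:
                counts[domain] += 1
    return counts


# Hypothetical contents of two per-protein co-occurrence files.
file_a = "protein_acc,pfam_id\nprotein_1,domain_A\nprotein_2,domain_B\n"
file_b = "protein_acc,pfam_id\nprotein_3,domain_A\n"

domain_counts = count_domains([file_a, file_b])
# The text maps the 35 most abundant domains; most_common gives that ranking.
top_domains = domain_counts.most_common(35)
print(top_domains)
```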

Figure S2 A. The genes captured on pCAPtsa. The genes in the predicted BGC are colour-coded as in the main paper. B. LC-MS analysis of each gene deletion cluster and their successful complementations in S. coelicolor M1146. An extracted ion chromatogram (EIC) of thiostreptamide S4 (1; m/z 1377.55) is shown for each mutant. The wild type (WT) EIC is duplicated as a reference chromatogram for each set of peaks. An asterisk highlights the ΔtsaC + pIJtsaC and ΔtsaD + pIJtsaD peaks, whose retention times are shifted due to the addition of a guard to the column.

Figure S3 Mutants that produced thiostreptamide S4 (1; m/z 1377.55), but at lower levels than the wild type BGC. MS peak areas are shown for S. coelicolor M1146 containing either the wild type tsa BGC, the ΔtsaK BGC, or the Δtsa-1, -2, and -3 BGC. Values represent the average of three biological replicates and the error bars represent the standard error.

Figure S4 Identification of functional start codons in tsaD and tsaG (see Table S3). A. Region of the tsa BGC containing three potential start codons (red) for tsaD that were tested by complementation of ΔtsaD. The first start codon (bold), coupled to the tsaC stop codon, is the only one that successfully complemented the tsaD deletion. The third highlighted start codon is the originally annotated start codon. B. Region of the tsa BGC containing four potential start codons (red) for tsaG that were tested by complementation of ΔtsaG. The first start codon (bold), coupled to the tsaF stop codon, and the second start codon (bold) are the only start codons that successfully complemented the tsaG deletion.

Figure S5 Accurate mass and MS/MS fragmentation data for compounds 1-5. Annotated fragments are coloured red in the MS/MS spectra. Fragments marked with an asterisk show a loss of 33.99, characteristic of the loss of SH2 from thioamide bonds in MS/MS fragmentation (see Figure S6). Exact mass measurements are shown within the mass spectra.

Figure S6 Selected MS/MS fragmentation of thiostreptamide S4 (1) showing losses of 33.99 Da, which is characteristic of thioamide bond fragmentation. The mass spectrum is zoomed for clarity and a proposed reaction scheme is shown.

Figure S7 MS/MS fragmentation data for compounds 6-10 alongside predicted molecule structures. Exact m/z measurements for doubly charged molecules are shown within the mass spectra; note that compounds 6-9 are protonated once to become doubly charged. Characteristic macrocycle fragments are highlighted, as are other common fragments (losses of NH3 and SH2 are shown for 6). Calculated fragment masses are shown with the structures.

Figure S8 MS/MS fragmentation data for compound 11 alongside predicted molecule structure. An exact m/z measurement is shown within the mass spectrum. Fragments are highlighted, and calculated fragment masses are shown with the structure.

Figure S9 A. MS/MS spectra showing fragmentation of compounds 12-15 alongside predicted structures as sodium adducts. The m/z of the parent molecule is shown in the top right of each chromatogram. Fragments corresponding to unusual sodiated peptide fragments (panel B) are highlighted by symbols, and fragments marked with an asterisk show a loss of 33.99, characteristic of the loss of SH2 from thioamide bonds. B. The unusual position of bond breakage common in the fragmentation of the sodium adduct of peptides 23,24.

Figure S10 A. LC-MS analysis of S. coelicolor M1146 expressing pCAPtsa (wild type, WT), pCAPtsaS1T and pCAPtsaM3I. The BPC and EICs of 16 and two related metabolites (m/z 503.14, 517.16 and 485.19, respectively) are shown. The EICs are magnified 2x compared to the BPCs for clarity. B. Predicted structures of 16 and the two related metabolites labelled with their m/z. Fragments of 16 observed in the MS/MS spectrum are shown on the molecule and highlighted in the spectrum in red. Fragments marked with an asterisk show a loss of 33.99 (loss of SH2) and symbols are used to highlight the unusual sodiated peptide fragmentation described in Figure S9.

Figure 12 Figure S16
Figure S15 2D HMBC NMR spectrum for molecule 12 (CD3OD, 298 K). Correlations are annotated. Visible correlations are shown as red arrows in 12, where arrows represent H to C direction of correlation; a double-headed arrow indicates that correlations in both directions were detected.

Figure S17 Confirmed structures of compounds 1 and 12, and structures of compounds 2-11 and 13-16 proposed by detailed MS/MS analysis. *Permanent charge on bis-methylated histidine means that a single protonation generates a doubly charged molecule.

Figure S18 A. LC-MS analysis of M1146-pCAPtsa and M1146-pCAPtsaS1T. EICs of m/z 1377.55 and m/z 1391.56 are shown. B. Predicted structure of 17 annotated with fragments observed via MS/MS analysis. Fragments marked with an asterisk show an additional loss of 33.99, characteristic of the loss of SH2 from thioamide bonds. C. MS/MS spectrum of m/z 1391.56 from M1146-pCAPtsaS1T.

Figure S19 A. Schematic of yeast assembly-based modification of the tsaA core peptide gene in pCAPtsa. B. Mutants generated in this study. A tick indicates that a fully modified thiostreptamide-like compound was detected, while a cross indicates that no fully modified compound was detected.

Figure S20 A. LC-MS analysis of M1146-pCAPtsa and M1146-pCAPtsaT8S. EICs of m/z 1377.55 and m/z 1363.53 are shown. 3* labels the second isotope peak of compound 3 (see Figure S5). Mass spectra are shown to distinguish 3 from 18, as their retention times are the same. B. Predicted structure of 18 annotated with fragments observed via MS/MS analysis. Fragments marked with an asterisk show an additional loss of 33.99, characteristic of the loss of SH2 from thioamide bonds. C. MS/MS spectrum of m/z 1363.53 from M1146-pCAPtsaT8S.

Figure S21 MS peak areas of 12 (m/z 453.16) in S. coelicolor M1146 expressing H12W, H12A, Y11V and wild type (WT) clusters. The bar chart is normalised to the highest mass spectral area (H12W). Values represent the average of three biological replicates and the error bars represent the standard error.

Figure S22 A. Proposed structure of 19, the methionine sulphoxide version of 1. The oxidation is highlighted in red and the fragments seen during fragmentation are marked on the molecule. Fragments marked with an asterisk show a loss of 33.99 (loss of SH2 from thioamide bonds). B. MS/MS spectrum of 19. Panels C, D and E show EICs of m/z 1377.55 (black) and m/z 1393.54 (red). Molecules 1 and 19 are labelled, where the peaks labelled with an * are predicted to be a methionine sulphoxide version of 3 (see Figure S5). Each chromatogram is normalised to the intensity of 1, as a quantitative comparison between these samples was not possible. C. EICs following step one of 1 purification (methanol extraction from M1146-pCAPtsa). D. EICs following step two of 1 purification (liquid-liquid extraction using EtOAc). E. EICs following step three of 1 purification (Sephadex chromatography).

Figure S23 A. LC-MS analysis of M1146-pCAPtsa and M1146-pCAPtsaM3I. EICs of m/z 1377.55 and m/z 1359.59 are shown. B. Predicted structure of 20 annotated with fragments observed via MS/MS analysis. Fragments marked with an asterisk show an additional loss of 33.99, characteristic of the loss of SH2 from thioamide bonds. C. MS/MS spectrum of m/z 1359.59 from M1146-pCAPtsaM3I.

Figure S24 Comparison between thiostreptamide S4 and thioalbamide (ref. 25). A. BGCs with genes unique to thioalbamide highlighted in bold. B. Structures of thiostreptamide S4 and thioalbamide. Structural differences due to amino acid changes are highlighted in blue, while differences due to tailoring enzymes are highlighted in red.

Figure S25 Activity of TaaRed from the thioalbamide pathway expressed alongside pCAPtsa. A. LC-MS analysis of M1146-pCAPtsa and M1146-pCAPtsa + taaRed. EICs of m/z 1377.55 and m/z 1379.56 are shown. 1* labels the second isotope peak of 1, indicating that TaaRed does not fully reduce 1 in these production conditions. B. Predicted structure of 21 annotated with fragments observed via MS/MS analysis. Fragments marked with an asterisk show an additional loss of 33.99, characteristic of the loss of SH2 from thioamide bonds. C. MS/MS spectrum of m/z 1379.56 from M1146-pCAPtsa + taaRed.

Figure S26 Maximum likelihood phylogenetic tree of HopA1 domain-containing proteins analysed in this work. The tree branches are colour-coded according to the taxonomic origin of the protein: Cyanobacteria (turquoise), Actinobacteria (black), Proteobacteria (pink), Bacteroidetes (blue) and others (orange). Selected RiPPER-generated peptide networks that are associated with HopA1 proteins are shown as coloured strips surrounding the tree. Conserved pfam domains encoded near each HopA1 protein are shown as coloured dots in the outer rings of the tree. Tree visualised using iTOL (ref. 20). See https://itol.embl.de/shared/1Idz6QnEJESFi for an interactive version of the tree and Supplementary Dataset 1 for co-occurring protein domains.

Figure S27 Short peptide networks associated with HopA1 domain proteins detected by RiPPER. Each of the networks is composed of short peptides sharing a minimum of 40% identity. Networks that are shown in Figure 7 of the main paper are labelled. Network image generated in Cytoscape (ref. 15). Nodes are arbitrarily colour-coded by network.
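
The idea of grouping peptides into networks by an identity cut-off can be sketched as below. This is a simplified illustration and not RiPPER's actual networking procedure: the function `build_networks`, the peptide names and the pairwise identity values are all hypothetical, and peptides are linked by single linkage (an edge is drawn when a pair shares at least 40% identity, so two members of a network may be connected only through intermediates).

```python
def build_networks(peptides, identities, threshold=40.0):
    """Group peptides into connected components using edges at or above the threshold."""
    parent = {p: p for p in peptides}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), identity in identities.items():
        if identity >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for p in peptides:
        groups.setdefault(find(p), set()).add(p)
    # Largest networks first; ties broken alphabetically for a stable order.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical peptides and pairwise percentage identities.
peptides = ["pepA", "pepB", "pepC", "pepD"]
identities = {
    ("pepA", "pepB"): 55.0,
    ("pepB", "pepC"): 42.0,
    ("pepA", "pepC"): 30.0,
    ("pepC", "pepD"): 12.0,
}
networks = build_networks(peptides, identities)
print(networks)
```

In this toy input, pepA and pepC fall in the same network through pepB even though their direct identity is below the cut-off, which is the main consequence of the single-linkage assumption.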

Figure S29 Analysis of peptide networks 11 and 20 and their genetic context. A. Partial view of the HopA1 phylogenetic tree showing its association to peptide networks 11 and 20. The tree branches are colour-coded as in Figure S26. B. Genetic organisation of a representative gene cluster containing peptides from each network. The genes present in the clusters are colour-coded to match the pfam annotation on the phylogenetic tree. C. Sequence logo representation of an alignment of precursor peptides belonging to networks 11 and 20. See Figure S28 for details of logo visualisation.

Figure S30 Analysis of peptide network 22 and its genetic context. A. Partial view of the HopA1 phylogenetic tree showing its association to peptide network 22. B. Genetic organisation of a representative gene cluster containing a peptide from network 22. The genes present in the cluster are colour-coded to match the pfam annotation on the phylogenetic tree. C. Sequence logo representation of an alignment of precursor peptides belonging to network 22. See Figure S28 for details of logo visualisation.

Figure S31 Analysis of peptide network 2 and its genetic context. A. Partial view of the HopA1 phylogenetic tree showing its association to peptide network 2. The tree branches are colour-coded as in Figure S26. All branches shown here are cyanobacterial proteins. Conserved pfam domains in the genetic context of each HopA1 protein are shown as coloured dots in the outer rings of the tree, and network 2 is shown. B. Genetic organisation of a representative gene cluster containing a peptide from network 2. The genes present in the cluster are colour-coded to match the pfam annotation on the phylogenetic tree. C. Sequence logo representation of an alignment of precursor peptides belonging to network 2. See Figure S28 for details of logo visualisation.

Figure S32 Analysis of peptide network 9 and its genetic context. A. Partial view of the HopA1 phylogenetic tree showing its association to peptide network 9. The tree branches are colour-coded as in Figure S26. Conserved pfam domains in the genetic context of each HopA1 protein are shown as coloured dots in the outer rings of the tree. B. Genetic organisation of a representative gene cluster containing a peptide from network 9. The genes present in the cluster are colour-coded to match the pfam annotation on the phylogenetic tree. PLD = phospholipase D nuclease. C. Sequence logo representation of an alignment of precursor peptides belonging to network 9. See Figure S28 for details of logo visualisation.
ΔtsaA was assessed with SPTsaA and ASPTsaA; ΔtsaC was assessed with SPTsaC and ASPTsaC; ΔtsaD was assessed with SPTsaD and ASPTsaD; ΔtsaE was assessed with SPTsaE and ASPTsaE; ΔtsaF was assessed with SPTsaF and ASPTsaF; ΔtsaG was assessed with SPTsaG and ASPTsaG; ΔtsaH was assessed with SPTsaH and ASPTsaH; ΔtsaI was assessed with SPTsaI and ASPTsaI; ΔtsaJ was assessed with SPTsaJ and ASPTsaJ; ΔtsaK was assessed with SPTsaK and ASPTsaK; ΔtsaMT was assessed with SPTsaMT and ASPTsaMT; ΔtsaL was assessed with SPTsaL and ASPTsaL; ΔtsaMO was assessed with

Table S1
Representative BLASTP and pfam matches for each protein encoded in the thiostreptamide S4 BGC. Results shown correspond to the first non-identical hit to TsaA and to the top matching hits to the tailoring enzymes not found in putative thioamitide gene clusters.

Table S2
Primers used in this study.

Table S3
Details of complementation experiments.

Table S4
Untargeted metabolomics data with P-values lower than 1E-4 for the wild type cluster and clusters with deletions to tsaC, tsaD, tsaE, and tsaF. The value and intensity of shading in the cells reflect the MS intensity. Ion m/z cells are shaded green if their structure was proposed in this study. Values represent the average of three biological replicates.

Table S5
DEPTQ and 1H NMR assignments for 12 in CD3OD.
b. The signal for this hydrogen is masked by a contaminant compound; COSY and HMBC correlations are visible (Figures

Table S6
Yeast assemblies of pCAPtsa variants and their constituent parts.

Table S7
Parts used in cluster assemblies.