Structural and biochemical analysis of a novel atypically split intein reveals a conserved histidine specific to cysteine-less inteins

Protein trans-splicing mediated by a split intein reconstitutes a protein backbone from two parts. This virtually traceless autoprocessive reaction provides the basis for numerous protein engineering applications. Protein splicing typically proceeds through two thioester or oxyester intermediates involving the side chains of cysteine or serine/threonine residues. A cysteine-less split intein has recently attracted particular interest because it can splice under oxidizing conditions and is orthogonal to disulfide or thiol bioconjugation chemistries. Here, we report the split PolB16 OarG intein, a second such cysteine-independent intein. As a unique trait, it is atypically split, with a short intein-N precursor fragment of only 15 amino acids, the shortest characterized to date, which was chemically synthesized to enable protein semi-synthesis. By rational engineering, we obtained an improved, high-yielding split intein mutant. Structural and mutational analysis revealed a peculiar property: the conserved motif N3 (block B) histidine, which is usually crucial, is dispensable. Unexpectedly, we identified a previously unnoticed histidine within hydrogen-bonding distance of the catalytic serine 1 as critical for splicing. This histidine has so far been overlooked in multiple sequence alignments and is highly conserved only in cysteine-independent inteins, as part of a newly discovered motif NX. The motif NX histidine is thus likely of general importance for the specialized active-site environment required in this intein subgroup. Together, our study advances the toolbox as well as the structural and mechanistic understanding of cysteine-less inteins.


General
Solvents and standard chemical reagents were purchased from Sigma Aldrich, Acros Organics, TCI, Alfa Aesar, Carbolution, Fluorochem, Iris or Merck and were used without further purification. Restriction enzymes were purchased from Thermo Scientific. Synthetic DNA strings were ordered from Thermo Fisher. Synthetic oligonucleotides were ordered from Biolegio. Plasmids were verified by DNA sequencing by Seqlab.

Computational sequence analyses
PolB16 was identified by searching for intein motifs as previously described 1 in a dataset of sheep gut metagenomes (GenBank accession AUXO010000000). The GenBank accessions and coordinates of the IntN and IntC intein regions are AUXO013913591.1:1843-1887 and AUXO013913591.1:4443-4901 + AUXO012578971.1:1-83, respectively. 2 The NX motif was generated with the glam2 program 3 from intein sequences in the InBase database 4 that did not have cysteines at both positions 1 and +1 and were not class-3 inteins. 1 Regions corresponding to the NX motif in Cys1 inteins were identified by superposing the NX region of the PolB16 structure on representative known structures of Cys1 class-1 inteins (excluding inteins with redundant or engineered sequences). The superposition used the Cα atom positions of PolB16 residues 59-69 (NX motif) and 101-106 (N3 motif) and the corresponding positions in the other structures (e.g., 7OEC residues 61-71 and 85-96, where the latter segment is extended by 6 residues in all structures). After superposition, the position of each residue in the intervening segment was compared with all other structures, and a residue was considered aligned if its Cα atom was within 1.5 Å of the Cα atom of a residue in another protein and their Cα-Cβ vectors pointed in the same direction. Sequence logos of protein multiple sequence alignments were created as previously described. 1
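The residue-alignment criterion can be sketched in plain Python. This is a minimal sketch, assuming the structures have already been superposed onto a common frame and that each residue is given as a pair of Cα and Cβ coordinates in Å. "Pointing in the same direction" is interpreted here as a positive cosine between the Cα→Cβ vectors; that threshold (`min_cos`) is a hypothetical choice because the text does not state one, and glycines (no Cβ) would need separate handling. The function names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Residue = Tuple[Vec, Vec]  # (Calpha, Cbeta) coordinates in Angstrom, already superposed


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def norm(v: Vec) -> float:
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def cosine(u: Vec, v: Vec) -> float:
    return (u[0] * v[0] + u[1] * v[1] + u[2] * v[2]) / (norm(u) * norm(v))


def is_aligned(ca_a: Vec, cb_a: Vec, ca_b: Vec, cb_b: Vec,
               max_dist: float = 1.5, min_cos: float = 0.0) -> bool:
    """Aligned if the Calpha atoms are within max_dist and the Calpha->Cbeta
    vectors point the same way (cosine above the hypothetical min_cos)."""
    if norm(sub(ca_a, ca_b)) > max_dist:
        return False
    return cosine(sub(cb_a, ca_a), sub(cb_b, ca_b)) > min_cos


def aligned_partners(residue: Residue, others: Sequence[Residue]) -> List[int]:
    """Indices of residues in another superposed structure that align with `residue`."""
    ca, cb = residue
    return [i for i, (ca2, cb2) in enumerate(others) if is_aligned(ca, cb, ca2, cb2)]
```

Requiring both a distance cutoff and a consistent side-chain direction avoids counting residues whose backbones happen to pass close to each other but whose side chains face in opposite directions.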

Protein production and purification
All proteins were produced in E. coli BL21(DE3) Gold cells. Cells were cultured at 37 °C in LB medium containing the appropriate antibiotic until an OD600 of 0.6-0.8 was reached. Protein expression was induced at 20 °C for 20 h by adding either IPTG (0.4 mM; pET-based vector systems) or L-arabinose (0.2% w/v; pBAD-based vector systems). Cells were harvested by centrifugation, resuspended in the respective purification buffer, flash-frozen and stored at -20 °C until further use. Resuspended cells were lysed using an Emulsiflex C5 homogenizer (Avestin). Insoluble material was removed by centrifugation, and the supernatant was used for protein purification.
For Ni-NTA affinity purification of His-tagged proteins, cell pellets were resuspended in Ni-NTA buffer (50 mM Tris/HCl, 300 mM NaCl, pH 8.0). Purification was performed at 4 °C using gravity flow columns with a bed volume of 1 mL Ni-NTA resin (Cube Biotech). The resin was washed in two steps, first with Ni-NTA buffer and then with Ni-NTA buffer + 40 mM imidazole. Proteins were eluted in a single 2 mL fraction with Ni-NTA buffer + 250 mM imidazole.
For size-exclusion chromatography (SEC), the protein solution was injected onto a HiLoad 16/600 Superdex 200 prep grade column at 4 °C using an ÄKTA Purifier (GE Healthcare), and proteins were eluted at a flow rate of 1 mL/min. Fractions were collected and concentrated. Purified proteins were dialyzed three times against PBS and finally against PBS + 10% glycerol before flash-freezing in liquid nitrogen and storage at -80 °C. Protein concentrations were determined from the absorbance at 280 nm using the calculated extinction coefficients.
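A calculated extinction coefficient and the Beer-Lambert law are enough to turn an A280 reading into a concentration. The sketch below assumes the widely used residue increments of Pace et al. (Trp 5500, Tyr 1490, cystine 125 M^-1 cm^-1, the values used by ExPASy ProtParam); the text does not say which tool or values were actually used, and the function names are illustrative.

```python
def extinction_coefficient_280(sequence: str, cystines: int = 0) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) from the sequence.

    Assumes the Pace et al. increments; reduced (free) cysteines contribute nothing,
    so `cystines` counts disulfide-bonded pairs only.
    """
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


def concentration_molar(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), in mol/L."""
    return a280 / (epsilon * path_cm)


def concentration_mg_per_ml(a280: float, epsilon: float, mw_da: float,
                            path_cm: float = 1.0) -> float:
    """Mass concentration: mol/L x g/mol = g/L = mg/mL."""
    return concentration_molar(a280, epsilon, path_cm) * mw_da


if __name__ == "__main__":
    eps = extinction_coefficient_280("MWYYK")  # toy sequence, not a construct from this study
    print(eps, concentration_molar(0.848, eps))
```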
Constructs for crystallization (30 and 31) were purified by chitin-binding domain (CBD) affinity purification using the IMPACT kit (Intein Mediated Purification with an Affinity Chitin-binding Tag; New England Biolabs). The cleared cell lysate was loaded onto a gravity flow column containing chitin agarose, followed by a wash with 10 CV CBD buffer (20 mM Tris/HCl, 500 mM NaCl, 1 mM EDTA, pH 8.0). N-terminal cleavage of the fused Ssp GyrB N intein 5, and thus release of the protein of interest, was induced by adding 5 CV cleavage buffer (CBD buffer + 100 mM DTT). The column was incubated with the cleavage buffer at 4 °C with shaking for 48 h. The eluate was then collected, and the column was eluted again with 5 CV cleavage buffer. The two elution fractions were pooled and concentrated for further use.
Psp GBD-Pol intein precursor constructs were purified by affinity chromatography on amylose resin (NEB). Glucose (2 g/L) was added to the LB medium (300 mL) before and after induction of protein expression to repress amylase expression. Protein expression was induced at 20 °C for 20 h with IPTG (0.4 mM). Cell pellets were resuspended in ACB buffer (20 mM Tris, 200 mM NaCl, 1 mM EDTA, 1 mM DTT, pH 7.4) and lysed using an Emulsiflex C5 homogenizer (Avestin). Purification was performed at 4 °C using gravity flow columns with a bed volume of 1.5 mL resin. The column was washed with 10 column volumes of ACB buffer, and the protein was eluted in three 1 mL fractions of column buffer + 10 mM maltose.
Recombinant precursor protein expression of the Mvu-M7-Pol-3 intein and its histidine mutants was induced at 28 °C for 10 h by adding IPTG (0.4 mM). Proteins were then purified by Ni-NTA affinity chromatography as described above.

Splice assays
Protein trans-splicing assays were performed in PBS at the concentrations and temperatures indicated. To determine splicing rates, one of the split intein precursors was used in three- or four-fold excess so that the reaction proceeded under pseudo-first-order conditions. The splicing reaction was initiated by mixing the N- and C-terminal intein precursors. At the indicated time points, an aliquot was taken from the reaction mixture and the reaction was stopped by boiling (5 min, 98 °C) in 4x SDS sample buffer (500 mM Tris/HCl, 8% (w/v) SDS, 40% (v/v) glycerol, 20% (v/v) 2-mercaptoethanol, 5 mg/mL bromophenol blue, pH 6.8). Cis-splicing assays were performed in vivo in E. coli cells: the cis-constructs were expressed at 20 °C for 20 h, and the splice product was purified as described above.

Densitometric analysis and determination of rate constants
Coomassie-stained SDS gels were scanned and the signal intensity of Coomassie-stained bands was determined using ImageJ. The signal intensity was normalized to the molecular weight of the protein.
The normalized intensities of the splice product (SP), the C-cleavage product (CC) and the precursor protein (IntC) were calculated and inserted into the following equations to determine the desired values, including the absolute turnover:

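A minimal sketch of the densitometric calculation, assuming that Coomassie signal scales with protein mass (so dividing by molecular weight gives molar-proportional values) and that each yield is expressed as a fraction of the summed normalized intensities of SP, CC and IntC. The denominator is an assumption and not necessarily the authors' exact equation; total turnover is taken as SP + CC, as in Figure S1C. Function names and the example numbers are hypothetical.

```python
def normalize_band(intensity: float, mw_kda: float) -> float:
    """Divide a band intensity by the protein's molecular weight
    (assumes Coomassie signal is proportional to protein mass)."""
    return intensity / mw_kda


def splice_yields(sp: float, cc: float, int_c: float) -> dict:
    """Fractions of SP and CC and the total turnover (SP + CC),
    relative to SP + CC + IntC (assumed denominator)."""
    total = sp + cc + int_c
    if total <= 0:
        raise ValueError("sum of normalized intensities must be positive")
    return {
        "SP": sp / total,
        "CC": cc / total,
        "turnover": (sp + cc) / total,
    }


if __name__ == "__main__":
    # Hypothetical band intensities (arbitrary units) and molecular weights (kDa)
    sp = normalize_band(600.0, 60.0)
    cc = normalize_band(150.0, 30.0)
    int_c = normalize_band(1050.0, 42.0)
    print(splice_yields(sp, cc, int_c))
```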
The splice yield was plotted against time and fitted to the following pseudo-first-order equation using GraphPad Prism (version 9.5):

$$P_t = P_0\left(1 - e^{-kt}\right)$$

where Pt = yield of product at time t, P0 = maximum yield of product, t = time and k = pseudo-first-order rate constant.
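The same fit can be reproduced outside GraphPad Prism with the standard library. This sketch is a stand-in, not the Prism algorithm: for a fixed k the model is linear in P0, so P0 has a closed-form least-squares value, and the remaining one-dimensional problem in k is solved by golden-section search over log k. It assumes the sum of squared residuals has a single minimum within the search bracket (`k_min`, `k_max` are hypothetical defaults).

```python
import math
from typing import Sequence, Tuple


def model(t: float, p0: float, k: float) -> float:
    """Pseudo-first-order product formation: P_t = P_0 * (1 - exp(-k t))."""
    return p0 * (1.0 - math.exp(-k * t))


def best_p0(ts: Sequence[float], ys: Sequence[float], k: float) -> float:
    """Closed-form least-squares P_0 for a fixed k."""
    f = [1.0 - math.exp(-k * t) for t in ts]
    denom = sum(fi * fi for fi in f)
    return sum(fi * yi for fi, yi in zip(f, ys)) / denom if denom > 0 else 0.0


def sse(ts: Sequence[float], ys: Sequence[float], k: float) -> float:
    """Sum of squared residuals with P_0 profiled out."""
    p0 = best_p0(ts, ys, k)
    return sum((y - model(t, p0, k)) ** 2 for t, y in zip(ts, ys))


def fit_pseudo_first_order(ts: Sequence[float], ys: Sequence[float],
                           k_min: float = 1e-6, k_max: float = 10.0,
                           iterations: int = 200) -> Tuple[float, float]:
    """Return (P_0, k) via golden-section search over log(k)."""
    lo, hi = math.log(k_min), math.log(k_max)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    fa, fb = sse(ts, ys, math.exp(a)), sse(ts, ys, math.exp(b))
    for _ in range(iterations):
        if fa < fb:  # minimum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - g * (hi - lo)
            fa = sse(ts, ys, math.exp(a))
        else:  # minimum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + g * (hi - lo)
            fb = sse(ts, ys, math.exp(b))
    k = math.exp(0.5 * (lo + hi))
    return best_p0(ts, ys, k), k


if __name__ == "__main__":
    times = [0, 30, 60, 120, 300, 600, 1200, 3600]   # s, synthetic example
    yields = [model(t, 0.8, 0.005) for t in times]   # noise-free test data
    print(fit_pseudo_first_order(times, yields))
```

With t in seconds, k comes out in s^-1, and the corresponding half-life follows as t1/2 = ln 2 / k.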

Solid-phase peptide synthesis of fluorescein-IntN (CF-IntN)
The peptide was assembled on a TGR resin with a freshly coupled Rink amide linker by stepwise microwave-assisted Fmoc-SPPS on a Liberty Blue peptide synthesizer operating at 0.1 mmol scale. Incoming Fmoc-protected amino acids (Carbolution, Merck Millipore or Iris Biotech) were activated with Oxyma and DIC in DMF (1:1 molar ratio) and used at 4 equivalents relative to the initial resin loading. Couplings were performed for an initial 15 s at 75 °C and 150 W, followed by 110 s at 90 °C and 30 W. Fmoc deprotection was performed by treating the resin with 20% piperidine in DMF for an initial 15 s at 75 °C and 150 W, followed by 50 s at 90 °C and 30 W. After each deprotection step, the resin was washed thoroughly with DMF. 5(6)-Carboxyfluorescein (CF) was coupled manually to the peptide by adding a solution of 5(6)-carboxyfluorescein-OH (2 eq.; Sigma Aldrich), DIC (2 eq.) and HOAt (2 eq.) in DMF to the resin and shaking at room temperature for 16 h. The resin was subsequently washed with DMF and DCM and dried under a nitrogen stream. The labelled peptide was cleaved from the resin with an ice-cold mixture of TFA, TIS and water (90:5:5), shaken at room temperature for 3 h, and purified by RP-HPLC.
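The reagent quantities implied by the stated equivalents follow from simple arithmetic (mmol x g/mol = mg). This is a minimal sketch, assuming that the 0.1 mmol synthesis scale is also the reference for the 2 eq. of the carboxyfluorescein coupling; the molar masses are standard values for the free compounds and are not taken from the source. Function names are illustrative.

```python
# Molar masses in g/mol of the free compounds (standard values, not from the source)
MOLAR_MASS_G_PER_MOL = {
    "5(6)-carboxyfluorescein": 376.32,  # C21H12O7
    "DIC": 126.20,                      # N,N'-diisopropylcarbodiimide, C7H14N2
    "HOAt": 136.11,                     # 1-hydroxy-7-azabenzotriazole, C5H4N4O
}

SCALE_MMOL = 0.1  # synthesis scale stated in the text (assumed reference for eq.)


def reagent_amount_mmol(equivalents: float, scale_mmol: float = SCALE_MMOL) -> float:
    """Amount of reagent in mmol for a given number of equivalents."""
    return equivalents * scale_mmol


def reagent_mass_mg(name: str, equivalents: float, scale_mmol: float = SCALE_MMOL) -> float:
    """Mass in mg: mmol x g/mol = mg."""
    return reagent_amount_mmol(equivalents, scale_mmol) * MOLAR_MASS_G_PER_MOL[name]


if __name__ == "__main__":
    for compound in MOLAR_MASS_G_PER_MOL:
        print(f"{compound}: {reagent_mass_mg(compound, 2):.1f} mg for 2 eq.")
    # Each Fmoc amino acid at 4 eq. on the same scale
    print(f"Fmoc amino acid: {reagent_amount_mmol(4):.2f} mmol per coupling")
```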

Structure determination
For structure determination, two fusion constructs of the IntN and IntC fragments were used, with or without the non-conserved cysteines mutated to alanine, each with 10 extein residues, connected by a GSH (Gly-Ser-His) linker and with Ser1 and Asn183 at the splice junctions mutated to Ala. Sitting-drop crystallization was performed at 20 °C. The wild-type PolB16 variant, which retains the non-conserved cysteines, was used at a protein concentration of 140 μM. The best crystals grew in 0.1 M phosphate/citrate buffer pH 4.2, 38% ethanol and 5% PEG 1000. Crystals were soaked consecutively in reservoir solution supplemented with 0.1 M and then 0.2 M NaI for 2 h each, transferred to cryo conditions containing 60% ethanol, and flash-frozen in liquid nitrogen. Diffraction data were collected at Helmholtz-Zentrum Berlin beamline BL 14.2 (ref. 6) and processed with XDSAPP. 7 Initial phases were obtained by SAD (single-wavelength anomalous diffraction; Phenix AutoSol), 8 and the model was generated by automated model building (Phenix AutoBuild), 9 followed by several rounds of manual building (Coot) 10 and refinement (Phenix Refine). 11 The Cys-less version with the additional mutations C111A and C165A crystallized at 1.3 mM under the same conditions, but was transferred into mother liquor containing 0.125% (v/v) glutaraldehyde prior to vitrification in reservoir solution supplemented with 30% PEG 400. Diffraction data were collected at Helmholtz-Zentrum Berlin beamline BL 14.1 and processed with XDSAPP, and an initial model obtained by molecular replacement (MR) with the wild-type structure (Phenix Phaser) 12 was finalized by several rounds of manual building (Coot) and refinement (Phenix Refine).
Data collection and refinement statistics are summarized in Table SX, and the structure factors and models have been deposited in the PDB under accession codes 8CPN (wild type) and 8CPO (Cys-less).

Figure S1 Protein trans-splicing activity of the split PolB16 intein with the non-conserved cysteine residues removed. A) Schematic overview of the reaction of the two precursor proteins MBP-IntN-His6 (1) and IntC-eGFP-His6 (2, 3, 8-10), which form the desired splice product (SP) and the byproducts IntC and IntN. C-terminal cleavage yields the side product CC (C-cleavage) in addition to IntC. B) SDS-PAGE of the splice assays with the native IntN precursor (1; 10 µM) in excess over the cysteine-free (3) or the wild-type (2) IntC precursor (5 µM) at 25 °C and pH 7. C) Total turnover yields, given as the sum of SP (white) and CC (diagonally striped). The splicing rate (black; right-hand y-axis) is shown in the same diagram for the indicated split intein combinations (n = 3; error bars represent standard deviations).

B) Similar MSA as in A), but using Cys1 inteins of known structure. The corresponding PolB16 sequence, shown in italics, is for reference and is not part of this alignment. These alignments were used to create the sequence logo representations of the motifs shown in Figure 7. C) Structural superposition of the Cys1 structures listed in B using the Cα atoms of the segments depicted as thick coils. PolB16 (red), which is not included in B, was overlaid on a representative Cys1 structure, 7OEC (blue). Motif NX of PolB16 and the corresponding regions of the Cys1 inteins are on the left towards the N' end, with the catalytic PolB16 His side chain shown. Motif N3 regions are on the right towards the C' end, with the catalytic His side chains shown. D) Structural overlay of four available Ser1 intein structures using corresponding Cα atoms. PolB16 is shown in red.
Other inteins are the Mja-TFIIB mini-intein (blue, PDB 5O9I), the Neq Pol-n/Pol-c complex (yellow, PDB 5OXX) and Tko Pol-2 (cyan, PDB 2CW7). The UCSF Chimera package 16 was used for the structure overlays and for preparing panels C and D.

Figure S6.

Figure S10 In vivo co-expression of PolB16 split intein precursors in E. coli. MBP-IntN-H6 (32) and IntC-Trx (33) were co-expressed from a bicistronic arrangement on a single plasmid (pTP022) in E. coli BL21(DE3) cells. To this end, the sequence between the genes encoding 32 and 33 comprised a stop codon (TAA) to terminate translation of the gene encoding 32, a ribosomal binding site (AGGAGG) and the start codon of the gene encoding 33 embedded in an NdeI restriction site: 5'-TAAGCTTTAAGGAGGATCCCATATG-3'. Cells were grown in LB medium to an OD600 of 0.6, and an aliquot (-) was removed for analysis. The culture was then induced with IPTG (0.4 mM), and after 4 h at 37 °C another aliquot (+) was removed for analysis. The removed cells were pelleted, lysed under the denaturing conditions of SDS-PAGE sample buffer containing SDS and β-mercaptoethanol (10 min, 95 °C) to rule out any protein trans-splicing after cell lysis, and analyzed by SDS-PAGE with Coomassie Brilliant Blue staining as shown. Formation of the splice product (SP) confirms that the split intein precursors recognized each other and that protein trans-splicing took place in the complex environment of the E. coli cell. Note that the IntN precursor (32) is expressed much more strongly than the IntC precursor (33) because of the operon arrangement, in which the IntC precursor is encoded by the second gene.
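The intergenic junction given in the Figure S10 caption can be checked programmatically. This hypothetical helper (`annotate_junction` is illustrative, not from the source) locates the features named in the caption and reports the number of nucleotides between the RBS and the start codon; it assumes the start codon is the ATG formed by the last three bases of the NdeI site (CATATG) and uses 0-based indices.

```python
JUNCTION = "TAAGCTTTAAGGAGGATCCCATATG"  # 5'->3', from the Figure S10 caption

RBS = "AGGAGG"
NDEI_SITE = "CATATG"


def annotate_junction(seq: str) -> dict:
    """Locate stop codon, RBS and NdeI site/start codon (0-based indices)."""
    seq = seq.upper()
    rbs = seq.find(RBS)
    nde = seq.find(NDEI_SITE)
    if rbs < 0 or nde < 0:
        raise ValueError("RBS or NdeI site not found")
    start = nde + 3  # ATG occupies the last three bases of CATATG
    return {
        "stop_codon": seq[:3],
        "rbs_index": rbs,
        "ndeI_index": nde,
        "start_codon": seq[start:start + 3],
        "start_codon_index": start,
        # nucleotides between the end of the RBS and the A of the ATG
        "spacer_nt": start - (rbs + len(RBS)),
    }


if __name__ == "__main__":
    print(annotate_junction(JUNCTION))
```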