Synthesis of the extracellular domain of GLP-1R by chemical and biotechnological approaches

The extracellular domain of the glucagon-like peptide-1 receptor (GLP-1R) is responsible for binding GLP-1 and a handful of additional agonists (such as exenatide, lixisenatide, and liraglutide) used daily to treat type II diabetes mellitus. Lead discovery and optimization, however, require binding studies, which in turn necessitate the synthesis of the GLP-1R extracellular domain, comprising 108 residues. A protein domain of 10–15 kDa can be obtained either by expression in E. coli or by ligating fragments made by solid-phase peptide synthesis (SPPS). However, direct overexpression fails to give a properly folded protein, because GLP-1R forms inclusion bodies that fail to refold owing to improper disulfide pairing. Several bacterial strains, constructs, and fusion partners were probed, and only fusion to MBP gave a 3D fold that allowed formation of the native disulfide bond pattern. Some fusion partners can act as covalently linked or in situ chaperones, guiding the refolding of GLP-1R toward success. Therefore, the bottleneck in preparing GPCR extracellular domains is the correct pairing of the Cys residues. As a proof-of-concept model, nGLP-1R was made by SPPS, and the purified full-length polypeptide chain was subjected to self-guided, spontaneous Cys pairing. However, correct SS-pairs did not form under any of the protocols used. Large-scale production of the protein therefore relies on the risky step of proper refolding, which is sometimes possible only if a suitable fusion partner effectively assists and catalyzes correct disulfide formation.


Introduction
Increased insulin utilization to treat type II diabetes mellitus (DM) can lead to dysfunction of the pancreatic β-cells and their subsequent destruction, which can ultimately lead to a decrease and cessation of insulin production and secretion. Owing to the decreased insulin concentration, the body's glucose homeostasis is disrupted, which can cause hyperglycemia and other serious complications (vasoconstriction, infarction, blindness, etc.). Current therapies mainly use externally administered insulin or sulfonylurea derivatives. The main disadvantage of these two approaches is that they continue working even after optimum glucose levels have been restored, so their improper administration can lead to hypoglycemia. 1 In contrast, GLP-1 receptor agonists, including exendin-4, stimulate insulin production only in the presence of elevated blood glucose levels; therefore, overdose-induced hypoglycemia is not a concern. 2 A comparative study of bacterial expression and solid-phase peptide synthesis (SPPS) of shorter GLP-1-related polypeptides and miniproteins, 20–40 residues long, has been carried out. 3 It revealed clear differences: non-selective 15N/13C isotope labeling is more economical by expression, whereas SPPS can be faster and easier to automate, especially using flow chemistry. 4 The structural characterization of Trp-cage miniproteins revealed no difference in the strategy taken to make GLP-1 agonist-like polypeptides, 5 and the rational design of α-helix-stabilized exendin-4 analogues was successful. 6 The GLP-1 receptor is a family B G protein-coupled receptor (GPCR) with an extracellular domain of 100–130 amino acids that binds endocrine peptide hormones up to 27 residues long. There are currently 31 X-ray and 4 NMR 3D structures of extracellular domains in the PDB, all obtained from proteins expressed in E. coli. 7,8
The sequence homology between the extracellular domains of family B GPCRs is surprisingly low: essentially only the 6 Cys residues and about a dozen other amino acids are conserved. 9 The most important shared structural feature of the extracellular domain is the "complement control protein" (CCP) fold, an α-β-β/α architecture. Its central core consists of two antiparallel β-sheets stabilized by 3 disulfide bridges and hydrophobic interactions. The N-terminus of the domain is formed by a longer α-helix linked by an SS-bridge to the first β-sheet, thus forming a ligand-binding pocket. Three different protocols have been described for expressing family B GPCR extracellular domains, including that of GLP-1R, in E. coli. (i) First, "direct" production of the extracellular domain by fermentation, mostly with an N-terminal His-tag. In each case, the target protein was isolated from inclusion bodies and then refolded. [10][11][12][13] (ii) Second, expression as a TrxA fusion protein in Origami cells, followed by thrombin cleavage. 10 However, in this case, the expressed nPTHR construct accumulated in inclusion bodies and, after refolding, was degraded during thrombin cleavage, presumably because of the high concentration of misfolded protein. 14-16 (iii) Third, nPTHR was produced by fusing the construct to MBP and co-expressing it with DsbC in Origami B cells. 17 After expression, the fusion protein was refolded in a GSH/GSSG redox system in the presence of DsbC. Note that the target protein was not separated from MBP and that, in some cases, DsbC isomerase was not used during refolding. 17 The direct expression of the extracellular domain of GLP-1R was published in 2002 by Bazarsuren et al. 10 and by Schröder-Tittmann. 11 These methods seem hardly feasible without a large-scale fermenter and refolding reactor.
Nevertheless, we attempted the expression in a conventional incubator shaker, but our first trials following the original protocol were unsuccessful. This was presumably due to the difference in scale: in the original fermentation, 10 approximately 700 g of cell pellet and approximately 10 g of inclusion bodies (IBs) were isolated from liters of medium, whereas a conventional 1 L shake-flask culture in rich medium yields about 5 g of cell pellet, i.e., about 70 mg of IBs. During renaturation, a large amount of precipitate formed owing to misfolding of GLP-1R.
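The scale gap can be made explicit with a back-of-the-envelope calculation. This is a minimal Python sketch, assuming that the IB share of the wet cell pellet (~10 g out of ~700 g) is the same in a shake flask as in the original fermentation; the constant and function names are illustrative only.

```python
# Hypothetical scaling sketch: assumes the IB share of the wet pellet is the
# same in a shake flask as in the original fermentation.
FERMENTER_PELLET_G = 700.0   # ~700 g cell pellet in the original protocol
FERMENTER_IB_G = 10.0        # ~10 g inclusion bodies in the original protocol
SHAKE_FLASK_PELLET_G = 5.0   # ~5 g cell pellet from 1 L of rich medium


def expected_ib_mg(pellet_g: float) -> float:
    """Estimate the IB mass (mg) obtainable from a given wet pellet mass (g)."""
    ib_fraction = FERMENTER_IB_G / FERMENTER_PELLET_G  # w/w share of IBs in the pellet
    return pellet_g * ib_fraction * 1000.0  # convert g to mg


ib_fraction = FERMENTER_IB_G / FERMENTER_PELLET_G
ib_mg = expected_ib_mg(SHAKE_FLASK_PELLET_G)
scale_gap = FERMENTER_PELLET_G / SHAKE_FLASK_PELLET_G

print(f"IB share of pellet: {ib_fraction:.2%}")           # IB share of pellet: 1.43%
print(f"IBs from 1 L shake flask: {ib_mg:.0f} mg")         # IBs from 1 L shake flask: 71 mg
print(f"Biomass gap vs. fermenter: {scale_gap:.0f}-fold")  # Biomass gap vs. fermenter: 140-fold
```

The roughly 140-fold smaller starting biomass is why the refolding step, rather than the expression itself, becomes the limiting factor in a shake-flask setting.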
In summary, we failed to obtain soluble GLP-1R with native SS-bridge pairing when adapting the original protocol to conventional shaking culture. Here, however, we describe an MBP-fusion expression system that we used successfully to purify correctly folded GLP-1R from a conventional shaking bacterial culture.

Aims
In the absence of a large-volume fermenter and refolding reactor, as is the case in most labs, we aimed to enhance the yield of native GLP-1R expressed in E. coli using ordinary incubator shakers. Our concept was to increase refolding efficiency from smaller starting amounts by using (i) domain optimization for direct expression, (ii) alternative bacterial strains, and (iii) chaperone-like fusion proteins (Table 1). We compare these approaches with the chemically synthesized, full-length 108-residue GLP-1R domain.

Results and discussion
Protein expression, purication, and refolding The successfully crystallized and X-ray determined (PDB 3C5T) GLP-1R was made in E. coli, forming inclusion bodies (IBs). However, protein solubilization and refolding from IBs is a very inefficient, time-consuming, and costly process. 8,9 Therefore, to enhance the yield and efficiency of a direct and soluble expression, we modied the original protocol as well as the expressed GLP-1R DNA-construct as follows ( Fig. 1 and 2).  Concerning the original 1-147, (i) the N-terminal 1-23 signal sequence of the GLP-1R was cut off, resulting in GLP-1R (24-147) ( Table 1); (ii) furthermore, we further reduced and optimized the protein size at the C-terminus as well. The crystal structure of the GLP-1R contained 28-131 residues only, so the C-terminal 132-147 region must have been indeed exible at least in the crystal. Therefore, via its enhanced internal dynamics, this bit might disturb protein refolding. In addition, in the original construct, the C-terminal hydrophobic -141 LLFLY-sequence can probably form an association with the membrane or embed into it. Therefore, we made a truncated GLP-1R variant, comprising 24-132 residues only GLP-1R 24-132 abbreviated as R132 ( Fig. 1B and C); (iii) additionally, we removed the His-tag from the N-terminal of the GLP-1R, as the purication of the IBs could be accomplished by RP-HPLC, resulting in a protein "ready" for MS analysis and for exact concentration adjustment for a refolding reaction. On the other hand, the reduction of IBs and the conditions of the subsequent renaturation process (DTT, EDTA, Arg) did not allow Ni-IMAC purication by the His-tag, which necessitated the introduction of an additional dialysis step. Aer the refolding reaction, the purication of the folded protein could be made by IEX. 
Another aspect of this modi-cation was in line with the reported corticotrophin receptor purication protocol, where it was found that the His-tag interferes with the formation of the proper SS-bridge pattern. 15 The refolding reactions were performed by diluting the isolated IBs (unfolded GLP-1R in 6 M Gua HCl, 100 mM DTT) to a large volume in redox buffer as detailed by the original protocol. 10 Interestingly, we did not nd any benet of varying the concentration of L-Arg as outlined in the original GLP-1R protocol. Following renaturation, dialysis was required as the presence of L-Arg interferes with the downstream chromatography (Q-IEX, RP-HPLC). Aer dialysis, nearly 90% of the target protein was precipitated. The folded GLP-1R was puried from the soluble phase using Q-IEX and RP-HPLC aerwards. With this method, 0.05 mg truncated GLP-1R was puried from 70 mg IBs. We analyzed this nal product using mass spectrometry and by disulde bridge pattern analysis. Besides the proper molecular ion, however, the MS data revealed two additional proteins, as the GLP-1R was degraded during the renaturation processes, which raises the possibility of protein instability and course, reducing the overall production yield.
We searched for new production routes because of the low and uncertain yield. The focus was on avoiding IB formation by the target protein, as IBs are difficult and cumbersome to handle (Fig. 2). The major problem with expressing proteins containing multiple disulfide bridges in E. coli is the reducing environment of the cytoplasm. In contrast, the cytoplasm of the Origami B and Shuffle strains is oxidizing owing to their mutations, 18,19 so we pursued disulfide bridge formation in the cytoplasm. In addition, Shuffle (DE3) cells contain a cytoplasmic DsbC, which promotes formation of the correct disulfide bridge pattern. 19 These strains were therefore considered appropriate for expressing disulfide-containing proteins in E. coli. However, for proteins containing "not-so" globular, partially disordered regions (Fig. 1), such as GLP-1R, direct expression of the target protein in these oxidizing strains does not work. Such "problematic cases" can, however, be resolved by using a suitable N-terminal fusion tag, such as thioredoxin, DsbC, MBP, GST, ubiquitin, or SUMO. 20 Along this line, we constructed a family of pET-based DNA vectors in which the target protein can be cloned C-terminal to the fusion partner with the same restriction sites, reading frame, and position. We tried to express these constructs using different expression parameters (induction time, inducer concentration, temperature, etc.) and different expression strains (Table 2). Protein production with these constructs could be divided into "successful" and "unsuccessful" cases. In all "unsuccessful" cases, the fusion protein formed IBs. In the "successful" cases, by contrast, the fusion protein remained soluble in the cytoplasm, so the first purification step could be performed directly after cell lysis.
However, solubility did not necessarily mean that the correct disulfide bridge pattern of the GLP-1R target protein had formed in the cytoplasm. The TrxA fusion expression protocol turned out to be "unsuccessful": in every expression strain used, the fusion protein formed IBs. Thus, thioredoxin did not exert its published chaperone activity in the cytoplasm of Shuffle (DE3) and Origami B (DE3) cells. [17][18][19] Interestingly, although the fusion protein remained in solution during the subsequent refolding step, GLP-1R was not detected by SDS-PAGE after thrombin cleavage. This result indicated that GLP-1R was not properly folded 16 and was therefore cleaved by thrombin. Ubiquitin, SUMO, and GST tags were similarly unsuccessful.
The soluble, cytoplasmic production of DsbC- and MBP-fused GLP-1R was successful, but only in the BL21 and Shuffle strains. After these cells were harvested, the fusion protein was found in the cytoplasmic fraction, so the first chromatographic purification step was carried out on the supernatant. Thus, DsbC and MBP as fusion partners bring the target protein into solution, which simplified the subsequent purification steps; however, as seen with thioredoxin, this does not guarantee formation of the proper disulfide bridge pattern.
When we examined the role of the three bacterial strains, it was surprising that MBP- and DsbC-fused GLP-1R did not form IBs in the reducing cytoplasm of BL21 (DE3), suggesting that MBP and DsbC may have a solubilizing or chaperone effect. After the first purification step (Ni-IMAC), the eluted fractions were analyzed by size-exclusion chromatography, and a large oligomeric form of the receptor was typically detected. Therefore, even though these two fusion partners could solubilize GLP-1R, soluble aggregates or solvated IBs were formed (Fig. 3). We therefore introduced an "extra" renaturing step following the first purification step; indeed, subsequent size-exclusion chromatography showed a decrease in the oligomeric form and an increase in the monomeric form (Fig. 3A).
The need for the extra renaturing step was also supported by analytical RP-HPLC: for the R132, MBP-R132, and DsbC-R132 constructs, the elution profiles improved after refolding (Fig. 3B and C).
It has previously been shown that co-expressed DsbC enhances refolding efficiency. 17,18 In our experiments, the co-expressed isomerase was present in all fractions eluted after the first Ni-IMAC purification step; i.e., either it binds nonspecifically to the matrix, or it co-elutes because of its isomerase activity, remaining bound to its substrate (here, the target GLP-1R protein) as an enzyme does. However, when empty Shuffle cells were harvested, the genomically encoded DsbC could not be eluted by Ni-IMAC. For this reason, renaturation was performed in a GSH/GSSG redox environment without DsbC. After refolding, the DsbC-fused GLP-1R was investigated by analytical RP-HPLC. Several intense peaks appeared in the chromatogram, all of which were identified by SDS-PAGE as belonging to the fusion protein. Presumably, these correspond to proteins with different disulfide bridge patterns.
The MBP-fused variant was puried aer the refolding by amylose affinity chromatography, followed by proteolytic  (Table 1), following Ni-IMAC chromatography without refolding (black line) and after refolding (gray line). The "extra" renaturing step introduced decreased the oligomeric form (7.5 ml) and increased the monomeric form (14.8 ml). Analytical RP-HPLC chromatograms of (B) the refolded R132; (C) MBP-R132 before (blue line) and after refolding (red line) and (D) that of the DsbC-R132 construct after refolding.
cleavage. The GLP-1R was puried by reverse Ni-affinity chromatography. Finally, the "pass-through" fraction contained the coveted product, i.e., the properly folded GLP-1R. This was puried by reverse-phase HPLC and subjected to mass spectrometric analysis (Fig. 4). The MS unquestionably proved the formation of disulde bonds (Mw calculated : 12857.12, Mw measured : 12857.01), and thus, the expected correct disulde pattern was determined by enzymatic methods followed by UPLC-MS analysis (Fig. 1). However, based on disulde bridge pattern analysis, two other disulde-bridge-patterned GLP-1Rs were also detected in the solution phase, indicating that the MBP-guided protein refolding was imperfect, as it did not result in a single product. The produced amount of the native disul-de patterned target protein was 0.5 mg made from 6 L of nutrient culture. Therefore, compared to the other expression yields, a subtle but signicant increase was observed. In addition, the advantage of this method is undoubtedly its simpli-ed, cost-effective, and easy-to-use protocol.
To demonstrate that, without MBP-guided refolding, proper SS-pairing of GLP-1R is unlikely to be obtained reproducibly and at large scale, we completed the chemical synthesis of GLP-1R using SPPS and NCL. Because of the length and difficulty of the 108-residue GLP-1R sequence, stepwise manual synthesis of the whole protein would have been inefficient and time-consuming, so we synthesized the GLP-1 receptor domain by a combination of manual and automated solid-phase peptide synthesis followed by native chemical ligation. [21][22][23][24] The designed peptide fragments were fully compatible with the native chemical ligation procedure and were synthesized using a CEM® Liberty Blue microwave-assisted automated peptide synthesizer.
The rst polypeptide A thioester derivative was synthesized using manual SPPS and Boc chemistry. The C-terminal rst amino acid, Phe (F), was coupled to the free sulydryl group of cysteine. Reaching full length, the thioester was detached from the resin using HF, and the crude polypeptide A was puried by C18 RP-HPLC. Polypeptide B was made on SEA resin, 23 with the Cys residues Acm side chains protected. Exploiting the advantages of SEA chemistry, the crude SEA-(ON) polypeptide B was oxidized with ammonium hydrogen carbonate (0.1 M) to obtain the crude SEA-(OFF) peptide. Note that the SEA-(OFF) peptide B is unreactive at its C-terminal, which avoids the formation of ligation side products during chemical ligation. 24 Before the chemical ligation of A to B, the Acm protecting groups were removed by Ag(OTf) in TFA/anisole (4 C and 4 h) and the crude Acm-deprotected polypeptide B was puried by C18 RP-HPLC. The chemical ligation of the thioester of polypeptide A and SEA-(OFF) polypeptide B were ligated in a Sorensen buffer (pH 7.4) in the presence of 3% thiophenol (40 C for 24 h) ( Fig. 4 and 5) resulting in the 61-amino-acid-containing N-terminal fragment (polypeptide AB) of GLP-1R, puried by C18 RP-HPLC (Fig. 5).
Polypeptide C was synthesized on SEA resin using Fmoc/tBu chemistry and Acm protection for the Cys residues. The active SEA-(ON) C-terminus of the crude peptide was converted into a more reactive MPA thioester in the presence of tris(2-carboxyethyl)phosphine (1000 eq.) in slightly basic medium (0.1 M Sorensen buffer, pH 7.4) at 40 °C for 24 h. 24 The crude Acm-protected MPA thioester was purified by C18 RP-HPLC. The C-terminal part of the C-terminal fragment, polypeptide D, was prepared as described above for polypeptide C. Chemical ligation of the Acm-protected MPA thioester of polypeptide C with polypeptide D gave the 47-residue C-terminal polypeptide CD (Fig. 5). Ligation was carried out in slightly basic medium (0.1 M Sorensen buffer, pH 7.4) in the presence of thiophenol at 40 °C for 24 h (Fig. 6 and 7). The C-terminal fragment of GLP-1R was purified by C18 RP-HPLC. Cys(Acm) deprotection of polypeptide CD was performed as described above (Ag(OTf) in TFA/anisole, 4 h at 4 °C), followed by purification by C18 RP-HPLC (Fig. 5).
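The fragment strategy can be checked at the sequence level with a toy model of NCL: the donor needs a C-terminal thioester (or a reactive SEA-(ON) group), the acceptor needs an N-terminal Cys, and a latent SEA-(OFF) group must be activated (as in the final ligation, run with TCEP) before it can react. This is a hypothetical sketch: only the sequence of polypeptide B is taken from the text; A, C, and D are placeholder strings whose lengths reproduce the stated 61-, 47-, and 108-residue products, the C/D split point is arbitrary, and Acm protection is not modeled.

```python
from dataclasses import dataclass

# C-terminal states that can act as acyl donors in native chemical ligation.
# SEA-(OFF) is the oxidized, latent form and is deliberately excluded.
DONOR_STATES = {"thioester", "SEA_on"}


@dataclass(frozen=True)
class Fragment:
    name: str
    seq: str     # one-letter code; "X" marks placeholder residues
    c_term: str  # "thioester", "SEA_on", "SEA_off" or "amide"


def ligate(donor: Fragment, acceptor: Fragment) -> Fragment:
    """Sequence-level NCL: donor C-terminal acyl + acceptor N-terminal Cys -> amide bond."""
    if donor.c_term not in DONOR_STATES:
        raise ValueError(f"{donor.name}: C-terminus {donor.c_term!r} cannot act as acyl donor")
    if not acceptor.seq.startswith("C"):
        raise ValueError(f"{acceptor.name}: NCL requires an N-terminal Cys")
    # The product keeps the acceptor's C-terminal state.
    return Fragment(donor.name + acceptor.name, donor.seq + acceptor.seq, acceptor.c_term)


def activate_sea(frag: Fragment) -> Fragment:
    """Reduce a latent SEA-(OFF) C-terminus to the reactive SEA-(ON) form (e.g. with TCEP)."""
    if frag.c_term != "SEA_off":
        raise ValueError(f"{frag.name} is not in the SEA-(OFF) state")
    return Fragment(frag.name, frag.seq, "SEA_on")


# Only B is a real sequence; A, C, D and the C/D split are hypothetical placeholders.
A = Fragment("A", "X" * 37 + "F", "thioester")          # 38 residues, C-terminal Phe
B = Fragment("B", "CNRTFDEYACWPDGEPGSFVNVS", "SEA_off")  # 23 residues
C = Fragment("C", "C" + "X" * 19, "thioester")          # MPA thioester (length assumed)
D = Fragment("D", "C" + "X" * 26, "amide")              # peptide amide (length assumed)

AB = ligate(A, B)                    # 61-mer, still SEA-(OFF) at its C-terminus
CD = ligate(C, D)                    # 47-mer
full = ligate(activate_sea(AB), CD)  # final ligation after SEA-(OFF) -> SEA-(ON)

print(len(AB.seq), len(CD.seq), len(full.seq))  # 61 47 108

# While B is SEA-(OFF), it cannot act as a donor, so it cannot ligate to itself.
try:
    ligate(B, B)
except ValueError as err:
    print("blocked:", err)
```

Treating SEA-(OFF) as a non-donor state mirrors the rationale given above: while B is latent, the N-terminal Cys of one B molecule has no activated C-terminus on another B molecule to attack, so oligomeric side products are avoided.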

Self-guided SS-pairing and GLP-1R folding
As the last step of the chemical or biotechnological synthesis, the final goal was to obtain the correct disulfide pattern by oxidizing the thiol groups of the 108-residue linear protein domain. Because of the six SH groups (C46(A), C62(B), C71(C), C85(D), C104(E), and C126(F)), three intramolecular SS-bonds were expected to form (Fig. 7). For molecules that, under native-like conditions, naturally fold into conformations ensuring effective pairing of the correct disulfide bridges, chemically driven cysteine oxidation may not be required. To obtain the desired disulfide pattern, the 108-residue linear protein was oxidized under various reaction conditions (Table 3). During oxidation, most of the dissolved protein precipitated. After redissolution, the products were investigated by RP-HPLC, and the disulfide topology was determined by enzymatic digestion followed by MS-MS measurements. The earlier-eluting fraction contained mainly the unnatural C1-C6, C2-C3, and C4-C5 disulfide patterns. The main peaks contained numerous disulfides (C1-C3*, C1-C4, C2-C3, C2-C4, C2-C5*, C3-C4, C3-C6, C4-C5, C4-C6*, C5-C6), including the desired natural ones (marked with asterisks), but unfortunately as an inseparable mixture. Owing to their high chromatographic similarity, even RP-HPLC columns with the best plate numbers gave only negligible resolution between the numerous disulfide isomers. In addition to these separation problems, the extremely poor solubility of the oxidized protein made this approach unsuccessful.
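The size of the isomer problem follows from simple combinatorics: six cysteines can be fully paired into three disulfides in 5 × 3 × 1 = 15 ways. The Python sketch below enumerates these patterns (perfect matchings), with C1–C6 numbered in sequence order as in the text; the 1/15 figure is only the idealized value for purely random, unbiased pairing, not a measured yield.

```python
from itertools import combinations
from typing import List, Tuple

Pair = Tuple[int, int]
PairingPattern = Tuple[Pair, ...]


def disulfide_patterns(cys: List[int]) -> List[PairingPattern]:
    """Enumerate every way to pair all Cys residues into disulfides (perfect matchings)."""
    if not cys:
        return [()]
    first, rest = cys[0], cys[1:]
    patterns = []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        # Pair the lowest-numbered free Cys with each candidate, then recurse.
        for sub in disulfide_patterns(remaining):
            patterns.append(((first, partner),) + sub)
    return patterns


cys = [1, 2, 3, 4, 5, 6]  # C1..C6 in sequence order (A..F in the text)
patterns = disulfide_patterns(cys)
native = ((1, 3), (2, 5), (4, 6))
pairs = list(combinations(cys, 2))

print(len(patterns))               # 15
print(len(pairs))                  # 15
print(native in patterns)          # True
print(f"{1 / len(patterns):.1%}")  # 6.7%
```

Both counts happen to equal 15: there are 15 fully oxidized isomers and 15 possible individual Cys-Cys pairs, of which the main peaks described above already contain 10.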

Conclusions
Protein domains of 10–15 kDa can be obtained either by expression in E. coli or by ligating suitable SPPS-made fragments. Both strategies can lead to the desired product, and the choice between them often seems to be a matter of preference. The presence of multiple disulfide bridges within the protein is usually not beyond the capability of these techniques. Interestingly, in the present GLP-1R case, only the biotechnological expression of the receptor protein, and not its chemical synthesis, resulted in the desired, correctly folded 3D structure.
To the best of our knowledge, this seldom happens for intact domains and complete globular proteins. However, for truncated or designed macromolecules and their fragments, such as GLP-1R, this scenario is to be expected more often, and suitable fusion partners (e.g., MBP) can then be used to assist proper Cys pairing and cystine formation. Therefore, carrier proteins or chaperones may be necessary even if the chemical preparation and/or bacterial expression of the unfolded protein is successful. An additional advantage of an appropriate fusion partner is that it can improve the solubility of the truncated protein and thus facilitate the kinetics of proper refolding.

Experimental section
Chemical synthesis of the GLP-1R protein
Because of the difficulty and length of the sequence, the 108-residue GLP-1 receptor domain was synthesized by native chemical ligation. 23,24 The designed fragments compatible with native chemical ligation were synthesized by solid-phase peptide synthesis (SPPS) with an Fmoc/tBu strategy using a CEM® microwave-assisted, fully automated peptide synthesizer.
The synthesis of CNRTFDEYACWPDGEPGSFVNVS-SEA(OFF) (polypeptide B) was carried out on an SPPS/CEM® fully automated microwave-assisted peptide synthesizer, applying Fmoc/tBu chemistry on SEA resin (0.13 mmol g⁻¹) and Acm side-chain protection for the Cys residues. The crude SEA-(ON) peptide was oxidized using 0.1 M NH4HCO3 to obtain the crude SEA-(OFF) peptide. The Acm side-chain protection was removed using Ag(OTf) (50 eq.) in TFA (10 mg ml⁻¹) in the presence of anisole at 4 °C for 4 h (yield after purification: 15%). The chemical ligation of the peptide thioester (polypeptide A) and the SEA-(OFF) peptide (polypeptide B) was carried out in the presence of thiophenol (3%) in 0.1 M Sorensen buffer, pH 7.4 (6 M guanidine hydrochloride), at 40 °C for 24 h (overall yield after purification: 21%).
The chemical ligation of the Acm-protected peptide-MPA thioester (polypeptide C) and the peptide amide (polypeptide D) was carried out in the presence of thiophenol (3%) in 0.1 M Sorensen buffer, pH 7.4 (6 M guanidine hydrochloride), at 40 °C for 24 h (yield: 41%). The Acm protection of the "C"-terminal peptide was removed with Ag(OTf) (50 eq.) in TFA (10 mg ml⁻¹) in the presence of anisole at 4 °C for 4 h (overall yield: 22%). The chemical ligation of the N-terminal SEA-(OFF) peptide and the Acm-deprotected "C"-terminal peptide amide was carried out in the presence of thiophenol (3%) in 0.1 M Sorensen buffer, pH 7.4 (6 M guanidine hydrochloride), with 0.2 M TCEP·HCl at 40 °C for 96 h (yield after purification: 19%).

Oxidation of GLP-1R made by chemical and recombinant synthesis and identification of the SS-bridges by MS
To obtain the desired disulde bridges, the puried 108amino-acids-containing linear GLP-1R peptide obtained by native chemical ligation and the protein obtained by recombinant synthesis were oxidized using various oxidation conditions (see Table 3 in the Results section). Because of the presence of 6 cysteine residues (C 23(A) , C 39(B) , C 48(C) , C 62(D) , C 81(E) , and C 103(F) ), the formation of three disulde bonds was expected. For disulde-bridge identication of the protein (GLP1), an enzymatic digestion method combined with mass spectrometry was used. Based on the sequence of the protein, the method was planned to produce a mixture of peptide fragments containing only one disulde bond. Based on the sequence of the protein, a mixture of two enzymes (trypsin and chymotrypsin) was found to be a good settling. Fragments linked together through disulde bridges were separated and analyzed by capillary reverse-phase UPLC coupled to the mass spectrometer. These peptides could be identied based on their unique masses and tandem mass spectrometric fragments. For searching for possible linked fragments, the MS-Bridge soware was used (https://prospector.ucsf.edu/ prospector/mshome.htm).

DNA constructions
For direct expression, the GLP-1R domain (R132) was ligated between the NdeI and BamHI sites of the pET-32b vector. For the fusion constructs, the cDNA of each fusion protein, with an N-terminal His-tag and a C-terminal thrombin cleavage site, was ligated between the NdeI and BamHI restriction sites of the pET-32b vector.

Expression, purication, and refolding of the IBs of GLP-1R
Expression targeted at IB formation (direct expression) was performed in 2YT medium at 37 °C with shaking at 180 rpm. At OD600 = 1, expression was induced with 1 mM IPTG for 5 h. DsbC- and MBP-fused GLP-1R were expressed in 2YT medium at 180 rpm, with induction by 0.2 mM IPTG for 12 h at 18 °C.
Aer the cell lysis, the cytoplasmic fraction was removed by centrifugation. The IBs-containing pellet was washed by NaPi buffer 3 times. Aer the last centrifugation step, the pellet was solvated with 20 ml 6 M guanidine hydrochloride and 50 mM DTT, and a 12 h long incubation was performed at 37 C. The pellet was removed by centrifugation, and the solvated IBs were puried by C4 RP-HPLC. The eluted fraction was lyophilized and solvated by 4 M Gua HCl at pH 8.5 at 1 mg ml À1 concentration and the refolding reaction was performed: a small amount was dosed to the refolding buffer (50 mM Tris, 150 mM NaCl, 10 mM GSH, 1 mM GSSG, 10 mM EDTA pH 8.5) up to the 20 mg ml À1 fusion protein concentration at 18-20 C for 48 h, with mixing at 250 rpm with a magnetic stirrer. The pellet was then removed by centrifugation, and buffer exchange was performed by dialysis (14000 rpm, 4 C, 30 min) (50 mM Tris, 50 mM NaCl). The eluted fraction was puried by C4 RP-HPLC.

Expression, purication, and refolding of MBP-and DsbCfused GLP-1R
Aer the cell lysis, the centrifuged cytoplasmic fraction of MBP-and DsbC-fused GLP-1R was puried by Ni-IMAC chromatography according to the manufacturer's protocol. A dialysis step was performed to remove the imidazole and to reduce the protein (50 mM Tris HCl, 150 mM NaCl, 5 mM DTT, pH 8.5). Aer the A280 concentration measurement, the refolding reaction was performed: a small amount was dosed to the refolding buffer (50 mM Tris, 50 mM NaCl, 10 mM GSH, 1 mM GSSG, 10 mM EDTA pH 8.5) up to the 20 mg ml À1 fusion protein concentration at 18-20 C for 48 h, with mixing at 250 rpm with a magnetic stirrer. The pellet was then removed by centrifugation, and buffer exchange was performed by dialysis (14000 rpm, 4 C, 30 min) (50 mM Tris, 50 mM NaCl). The fusion protein was puried and concentrated by Q-IEX chromatography. The eluted fraction was immediately cleaved with thrombin. Aer the incubation time, a second Ni-IMAC was performed, and the target GLP-1R passed through the column. This fraction was further puried by C4 RP-HPLC, which led to two major products (see Fig. 8, peaks 1 and 3) having the correct molecular mass. According to the mass spectrometrical investigations combined with enzymatic digestion, the disulde patterns of the two isolated proteins proved to be C1-C3, C2-C5, and C4-C6, peak 3, (the natural one), and C1-C2, C3-C4, and C5-C6, peak 1, an unnatural isomer.

Conflicts of interest
There are no conicts to declare.