The structure of human dermatan sulfate epimerase 1 emphasizes the importance of C5-epimerization of glucuronic acid in higher organisms

Structural studies of human DS-epi1 suggest a new catalytic isomerization mechanism and reveal remarkable similarities to bacterial proteins.


Introduction
Of the four major classes of biological macromolecules (nucleic acids, proteins, lipids and carbohydrates), the last is arguably the most complex and least understood. Notably, the group of linear, heterogeneous polysaccharides called glycosaminoglycans (GAGs) is synthesized in a non-template-driven manner, yet still carries distinct sequences crucial for biological interactions. [1][2][3] The GAGs are most often found as part of proteoglycans, consisting of one or more GAG chains covalently attached to a protein core. One of the most common GAG types, chondroitin/dermatan sulfate (CS/DS), is built up of repeating disaccharide units of D-glucuronic acid (GlcA) and N-acetyl-D-galactosamine (GalNAc). The CS/DS polymer can be modified by epimerization and sulfation, introducing the potential for vast structural diversity (Fig. 1). 4 The two human dermatan sulfate epimerases DS-epi1 and DS-epi2, essential for the epimerization of position 5 of GlcA residues to form L-iduronic acid (IdoA), were previously identified, cloned, expressed, purified and functionally evaluated by our group. 5, 6 We, and others, have previously shown that the patterns of epimerization and sulfation of CS/DS are of critical importance for a range of biological activities. 7 For example, patients diagnosed with Ehlers-Danlos syndrome caused by loss-of-function mutations in DS-epi1 suffer from multiple organ disorders. [8][9][10][11] Furthermore, IdoA-containing motifs are important in processes such as control of collagen fibril formation, cancer development, thrombin inactivation through heparin cofactor II (HCII), growth factor interactions and cell migration. [12][13][14][15][16][17] The formation of IdoA in CS/DS, by epimerization of GlcA on the polymer level, is an interesting exception to the biosynthesis of most other polysaccharides, which are synthesized from a pool of activated monosaccharide donors.
In fact, to date there are only three known examples of post-polymerization epimerization, i.e. the biosyntheses of alginate, heparin/heparan sulfate and CS/DS. 18 Despite the resemblances on the functional level, the amino acid sequences of the epimerases show no similarity on the primary level. By performing domain prediction and profile-profile alignments with sequence profiles of proteins from PDB, COG, Pfam and SCOP, we have previously shown that DS-epi1 has no domain similarity to the alginate and heparan sulfate epimerases but is remotely related to polysaccharide-degrading bacterial enzymes. 19 These findings suggested the location of the active site and catalytic amino acids of DS-epi1 and were further supported by site-directed mutagenesis studies. However, so far, no experimental structural data have been presented for DS-epi1 or DS-epi2.
In this study we present the three-dimensional structure of human N-glycosylated DS-epi1, solved by a combination of macromolecular crystallography and targeted cross-linking mass spectrometry (TX-MS). 20 The structure reveals a high similarity to proteins from several families of bacterial polysaccharide lyases. By using a combination of in silico docking and molecular dynamics simulations, we also propose a novel catalytic mechanism for DS-epi1.

Results
Human DS-epi1 is built up by an (α/α)6-toroid domain connected to a β-supersandwich domain
In a previous study, we revealed that the first 755 amino acids of DS-epi1 are necessary for catalytic activity. 21 Based on those results, we used five constructs of ~755 amino acids, spanning the C-terminal domain of unknown function, for protein expression attempts in mammalian HEK293 GnTI(−) cells. 22 Only one construct, corresponding to amino acids 23-775 (DS-epi1 23-775), yielded diffraction-quality crystals, but still displayed a catalytic profile similar to that of the full-length luminal protein (DS-epi1 23-894), with only slight variations in KM and Vmax (ESI Fig. 1A †). The truncated soluble DS-epi1 23-775 protein had a molecular weight of 89.2 kDa and was pure, monomeric and monodisperse (ESI Fig. 1B and C †). We have previously shown that DS-epi1 forms homodimers in vivo, 23 and therefore wanted to evaluate whether dimerization occurs in a concentration-dependent manner. Both full-length luminal DS-epi1 23-894 and truncated DS-epi1 23-775 were analyzed using size-exclusion chromatography at concentrations between 0.1 and 12 mg ml⁻¹ (ESI Fig. 1D and E †), but no sign of dimerization could be seen at any concentration. Crystals of DS-epi1 23-775 grew after approximately one week of incubation at 22 °C (ESI Fig. 1F †). The single-wavelength anomalous dispersion (SAD) method, using the anomalous signal from manganese, produced a partial model of DS-epi1 (see Table 1 for data collection and refinement statistics). Using the derived model, a more complete crystallographic structure was built using a 2.41 Å dataset and thereafter refined to an R factor of 17.6% (R free 20.9%). Each unit cell contained eight protein molecules and had a relatively high solvent content of 72%.
Most of the structure could be built, except for nine C-terminal residues that were not visible in the electron density. The model revealed two large domains connected by an 18 amino acid linker loop, and a C-terminal tail extending back to the N-terminus (Fig. 2A). The linker loop contains three adjacent prolines (Pro382-384), likely contributing to flexibility of the two domains. Accordingly, mutagenesis of Pro383 to alanine yielded an enzymatically inactive product which was expressed at very low levels (ESI Fig. 2 †). The N-terminal domain was clearly defined in the electron density, although with B-factor values around 50. In contrast, the C-terminally located domain was overall more disordered, and specific loop regions, especially amino acids 507-515, were poorly detailed in the densities with substantially higher B-factors. The N-terminal domain (Fig. 2B) was identified as an (α/α)6 25 and each secondary element was labelled with unique identifiers (Fig. 2D). On the gene level, DS-epi1 is encoded by the DSE gene, consisting of six exons with the coding sequence extending from exon 2 to exon 6. As revealed by the crystal structure, helices α1-α4 are encoded by exon 2, α5-α7 by exon 3, α8-α10 by exon 4 and α11-α13 by exon 5 (i.e. the N-terminal α-toroid domain is encoded by exons 2-5), whereas the final part of the polypeptide, including the β-sandwich domain and the C-terminal non-crystallized part, is encoded by exon 6.
A highly conserved core of DS-epi1 harbors a manganese-binding site and the suggested active site
A strong peak, situated in the central β-sandwich domain, was observed in the initial SAD electron density map. Based on the previous knowledge of the necessity of Mn2+ for catalytic activity, 26 Fig. 3 †).
The suggested active site of DS-epi1 is found buried deeply between the α-toroid and the β-sandwich domains. Using a list of eukaryotic orthologs of DS-epi1 (ESI Table 1 †), we analyzed the amino acid conservation of DS-epi1 using the ConSurf server. 28 The results (Fig. 3B) revealed that the active site cleft, as well as parts of the C-terminal tail, were highly conserved, further supporting the location of the active site.
DS-epi1 is active over a pH range from 5 to 7, with an optimum of 5.5. 26 Using the crystal structure as input, we investigated how different pH values affected the electrostatic surface potential of DS-epi1. The active site was found to harbor an overall negative electrostatic potential at pH 7.0, which gradually changed into a positive charge as the pH was decreased (Fig. 3C).
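The pH dependence described above can be illustrated with the Henderson-Hasselbalch relation. The sketch below estimates the protonated fraction of a single titratable side chain at several pH values. The pKa of 6.0 is a generic assumption for a histidine side chain, not a value calculated for any residue of DS-epi1, and the function name is hypothetical.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Fraction of a titratable group in its protonated form
    according to the Henderson-Hasselbalch relation."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumption: generic histidine side-chain pKa, not a DS-epi1-specific value.
ASSUMED_HIS_PKA = 6.0

if __name__ == "__main__":
    for ph in (5.0, 5.5, 6.0, 7.0):
        fraction = protonated_fraction(ph, ASSUMED_HIS_PKA)
        print(f"pH {ph:.1f}: protonated fraction {fraction:.2f}")
```

Under this assumption, a histidine is mostly protonated (positively charged) near pH 5.5 and mostly neutral at pH 7.0, which is qualitatively in line with the shift towards a more positive surface potential at lower pH.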

Docking studies with a chondroitin octasaccharide
The N-terminal domain and the central domain of DS-epimerase 1 form a deep, curved cleft with a length of approximately 40 Å that constitutes the carbohydrate binding domain. Despite extensive co-crystallization and soaking attempts, using different crystallization conditions, substrate analogues and point mutants, no co-crystal was obtained. Hence, to study the binding and interactions of DS-epi1 with a substrate oligosaccharide, we docked a chondroitin octasaccharide in the presumed active site. In addition to the regular six degrees of freedom of a rigid molecule, an octameric oligosaccharide has 14 rotatable glycosidic torsions, making it a complicated problem for any docking program. Therefore, chondroitin di- and tetrasaccharides were placed at different positions along the carbohydrate binding domain, followed by molecular dynamics simulations. A tetrasaccharide found a stable binding pose adjacent to the catalytic site and this tetrasaccharide was extended into and past the catalytic site, one carbohydrate at a time, followed by new molecular dynamics simulations. The carbohydrate units were numbered, with reference to the GlcA unit situated at the catalytic site, from the reducing end (+3) to the non-reducing end (−4) (Fig. 4A).
The resulting ligand conformation featured the GalNAc adjacent to the reacting GlcA, i.e. position −1, with a hydrophobic interaction with Trp98, similar to that of galactose recognition in galectins 29 (see ESI Video 1 and Coordinate File 1 †). In line with the docking results, the activity of a Trp98Ala mutant was reduced by almost 90% (ESI Fig. 3 †). The carboxyl groups of all GlcA were pointing towards the water phase, except for the one undergoing isomerization, which was hydrogen bonded by protonated His205 and by Tyr261, supported by a hydrogen bond from Tyr473. In close proximity to the active site, His450, previously thought to act as a general base for abstraction of the C5 hydrogen of the uronic acid, formed hydrogen bonds with the carboxylate group of GlcA via a water bridge.
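The size of the docking search space mentioned above can be made explicit with a small calculation. The sketch below assumes two rotatable torsions (phi and psi) per glycosidic linkage, which is consistent with the 14 torsions quoted for an octasaccharide; ring puckering and side-group rotations are ignored, and the function names are hypothetical.

```python
RIGID_BODY_DOF = 6  # three translations plus three rotations


def glycosidic_torsions(n_residues: int, torsions_per_linkage: int = 2) -> int:
    """Number of rotatable glycosidic torsions in a linear oligosaccharide.

    Assumption: each of the (n_residues - 1) linkages contributes
    `torsions_per_linkage` torsions (phi and psi by default).
    """
    if n_residues < 1:
        raise ValueError("an oligosaccharide needs at least one residue")
    return (n_residues - 1) * torsions_per_linkage


def docking_degrees_of_freedom(n_residues: int) -> int:
    """Rigid-body degrees of freedom plus glycosidic torsions."""
    return RIGID_BODY_DOF + glycosidic_torsions(n_residues)


if __name__ == "__main__":
    for name, size in (("disaccharide", 2), ("tetrasaccharide", 4), ("octasaccharide", 8)):
        print(name, glycosidic_torsions(size), docking_degrees_of_freedom(size))
```

The much smaller number of torsions for di- and tetrasaccharides illustrates why short fragments are easier starting points than docking the full octasaccharide directly.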

Theozyme model of the isomerization mechanism
A theozyme is a QM theoretical model of an enzyme, including only the parts of the protein that stabilize the transition state. 30 Molecular dynamics calculations on DS-epi1 with an octameric oligosaccharide indicated that the carboxylic functional group of the isomerized GlcA was often hydrogen bonded to Tyr261 and to protonated His205. In order to investigate the mechanism of isomerization of GlcA, a theozyme model of DS-epi1 was constructed from the core amino acids of isomerization, Tyr261, Tyr473 and His205, only including the C-alpha atoms from the protein backbone. A disaccharide consisting of the reacting GlcA and the adjacent GalNAc in position −1 was also included in the theozyme model. All fragments were placed in accordance with a frame of a molecular dynamics trajectory which featured all the important hydrogen bonds (Tyr473 to Tyr261, Tyr261 to GlcA and protonated His205 to GlcA) as well as a short distance from the hydrogen atom at the beta position of the carboxylate of GlcA to the oxygen of Tyr261.
With the intention of constraining the fragments to allowed positions in the full protein, locked coordinates were introduced for all C-alpha and C-beta atoms of the amino acids as well as for the two oxygens that formed glycosidic bonds to the deleted parts of the octamer. These rather rigid restrictions may cause high energies since both the protein and the carbohydrate are more flexible in the full protein than in the model. Additionally, in order for the isomerization of GlcA to occur, the His-GlcA hydrogen bond has to be charge separated to facilitate transfer of negative charge to Tyr261.
QM geometry optimizations of the theozyme with bound GalNAcβ1-4GlcA were performed for complex (S), which was modified in place to both the intermediary enol (I) and the IdoA product (P). Both complexes (I and P) were subjected to QM geometry optimizations with the previously described constraints. A QM transition state search using the linear synchronous transit search method between the optimized (S) and (I) complexes was performed, resulting in transition state (T1, see ESI Video 2 †). A corresponding transition state search was performed between the optimized (I) and (P1) complexes with a catalyzing water molecule added to both structures, resulting in transition state (T2). Vibrational analysis of (T1) and (T2) yielded only one large imaginary frequency for each, −1280.7 cm⁻¹ and −1664.2 cm⁻¹, respectively. For both transition states, animation of the imaginary frequency corresponded well to the reaction coordinate of the expected sigmatropic reaction.
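The vibrational criterion used above (a transition state shows exactly one imaginary frequency) can be expressed as a simple check. In the sketch below, imaginary frequencies are represented as negative wavenumbers, following the convention used in the text. Only the two imaginary values are taken from the text; the positive frequencies are made-up placeholders for illustration, and the function names are hypothetical.

```python
def imaginary_modes(frequencies_cm1):
    """Return the modes reported as negative wavenumbers (imaginary frequencies)."""
    return [f for f in frequencies_cm1 if f < 0.0]


def is_first_order_saddle_point(frequencies_cm1):
    """A transition state should have exactly one imaginary frequency."""
    return len(imaginary_modes(frequencies_cm1)) == 1


if __name__ == "__main__":
    # Imaginary values from the text; the positive values are illustrative placeholders.
    t1 = [-1280.7, 35.0, 52.0, 110.0]
    t2 = [-1664.2, 41.0, 67.0, 98.0]
    for label, freqs in (("T1", t1), ("T2", t2)):
        print(label, is_first_order_saddle_point(freqs), imaginary_modes(freqs))
```

A structure with no negative entries would correspond to a minimum, and one with two or more to a higher-order saddle point, so neither would pass this check.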

Proposed mechanism of DS-epi1
From the theozyme model we propose that the epimerization reaction starts by an electron shuttle through the carboxylic acid unit followed by proton abstraction from Tyr261 (Fig. 4B and C). The negative charge of the formed tyrosine anion is stabilized by a hydrogen bond from Tyr473 and provides a sufficiently strong base to abstract the acidic H5-proton of the GlcA unit in (T1). Proton abstraction from the charged His205 facilitates formation of a neutral enol (I). Both His205 and Tyr261 have previously been shown to be critical for catalytic activity. 19 Mutagenesis of Tyr473 into an alanine similarly yielded an inactive protein (ESI Fig. 3 †). In order to rule out that the differences in activity observed for the W98A and Y473A mutants were due to protein misfolding, we performed structural modeling of the two mutants using a deep learning-based structure prediction method implemented in trRosetta (ESI Fig. 4 †). 31 Only very small differences (RMSDs < 0.5 Å) were seen between the mutants and the wild-type protein, suggesting that the loss of activity was caused by disruption of catalytic properties and not incorrect folding. The top side of the enol is oriented towards the surrounding water layer and we propose that the enol can abstract a proton from one water molecule that in turn abstracts a proton from the enol (T2), thus reforming the carboxylic acid unit (P). Alternatively, Asp147, located on the opposite side of the sugar plane relative to Tyr473, might contribute to reprotonation. Mutation of Asp147 into an alanine reduced the activity by approximately 75% (ESI Fig. 3 †).
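The RMSD comparison between mutant and wild-type models mentioned above can be sketched as follows. This minimal example assumes that the two coordinate sets contain the same atoms in the same order and have already been superimposed; it does not perform the structural alignment itself, and the function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists.

    Each list holds (x, y, z) tuples in angstroms, with matching atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Illustrative coordinates only, not taken from any DS-epi1 model.
    wild_type = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    mutant = [(0.1, 0.0, 0.0), (1.5, 0.2, 0.0), (3.0, 0.0, 0.1)]
    print(f"RMSD = {rmsd(wild_type, mutant):.2f} angstrom")
```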

N-Glycosylation stabilizes the structure of DS-epi1
For initial crystallization attempts, we expressed DS-epi1 in HEK293 cells, which yielded natively glycosylated protein that crystallized and diffracted to around 2.6 Å. To increase protein homogeneity and crystal quality, DS-epi1 was expressed in HEK293 GnTI(−) cells (devoid of N-acetylglucosaminyltransferase I (GnTI)) 22 which produce protein with only high-mannose type glycans. The high-mannose crystals diffracted to slightly higher resolution (2.4 Å) but no structural difference was observed between structures from the two datasets. For both structures, we found densities extending outwards from Asn 183, 336, 642 and 648, which could partly be modeled as N-glycans (Fig. 5A). The density at Asn336 revealed a clear sign of a branching glycan with two arms diverging from the third monosaccharide in a small raft between the alpha and beta domains. Five saccharide molecules were fitted in the electron density extending from ND2 of Asn336: Manα1-6(Manα1-3)Manβ1-4GlcNAcβ1-4GlcNAcβ1-Asn (Fig. 5B). To improve the crystal packing even further, we attempted to express recombinant DS-epi1 in HEK293 GlycoDelete cells 32 (which are GnTI(−) and engineered to express an endoT glycosidase in late Golgi) to produce protein with even more homogeneous and truncated N-glycans. However, analysis of genomic DNA from the DS-epi1-expressing HEK293 GlycoDelete cells (after transfection and puromycin selection) revealed that the endoT gene, responsible for cleaving off high-mannose type N-glycans in late Golgi in the engineered cell line, was missing (ESI Fig. 1H †).
We have previously characterized the N-glycosylation pattern of DS-epi1 by site-directed mutagenesis studies, where N-glycosylation was shown to be crucial for enzymatic activity. 19 In those experiments, point mutations were introduced based on in silico predictions by the NetNGlyc web server, which identified four N-glycans at positions Asn 183, 336, 642 and 648. In order to obtain more information about the glycan composition at each site, we performed an LC-MS/MS glycopeptide analysis of natively glycosylated full-length DS-epi1 from HEK293 cells. Glycosites were identified based on observed fragment ions and calculated mass differences between non-modified and glycosylated peptides. In addition to glycosylations on the previously identified sites, we also observed a novel glycosite at Asn411. Both oligomannose and complex/hybrid structures were observed (Fig. 5C and ESI Table 2 †) and a relatively high degree of glycan heterogeneity was observed for each site. The relative abundances of different glycoforms were calculated based on the observed extracted ion chromatograms and it was found that Asn 183, 336 and 642 were all fully glycosylated based on the absence of a non-glycosylated peptide. Only ~2.5% of all identified Asn411-containing peptides were found to be glycosylated, explaining why the modification has not previously been discovered. Additionally, the unmodified peptide of Asn648 was too small to be detected by MS analysis, which prevented calculation of site occupancy. All glycopeptides from the Asn336 glycosite were found to be modified with high-mannose type glycans and the presence of high-mannose glycans was also confirmed by endoH degradation (ESI Fig. 1G †).
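The site occupancy estimate described above can be sketched as a ratio of extracted ion chromatogram (XIC) areas. The example below assumes that peak areas of glycosylated and unmodified forms of the same peptide are directly comparable, which ignores differences in ionization efficiency. The function names, glycoform labels and peak areas are hypothetical and are not taken from the measured data.

```python
def site_occupancy(glycoform_areas: dict, unmodified_area: float) -> float:
    """Fraction of a peptide carrying any glycan, from XIC peak areas."""
    glycosylated = sum(glycoform_areas.values())
    total = glycosylated + unmodified_area
    if total <= 0:
        raise ValueError("no signal detected for this site")
    return glycosylated / total


def relative_abundances(glycoform_areas: dict) -> dict:
    """Relative abundance of each glycoform among the glycosylated peptides."""
    glycosylated = sum(glycoform_areas.values())
    if glycosylated <= 0:
        raise ValueError("no glycosylated peptide detected")
    return {name: area / glycosylated for name, area in glycoform_areas.items()}


if __name__ == "__main__":
    # Hypothetical peak areas chosen to give a low-occupancy site.
    areas = {"Man5": 1.5, "Man6": 1.0}
    print(f"occupancy: {site_occupancy(areas, unmodified_area=97.5):.3f}")
    print(relative_abundances(areas))
```

When no unmodified peptide is detected, the occupancy evaluates to 1.0, corresponding to a fully glycosylated site. When the unmodified peptide cannot be measured at all, as for Asn648, the ratio cannot be computed.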
The structure of full-length DS-epi1
The two human dermatan sulfate epimerases, DS-epi1 (958 amino acids) and DS-epi2 (1212 amino acids), are predicted to be double-pass transmembrane proteins, with the active domains extending out into the Golgi lumen. They both share an N-terminal "epimerase" domain (51% identity between amino acids 43-673 and 62-691 for DS-epi1 and -2, respectively), while the C-termini bear no similarity to each other (ESI Fig. 5 †). DS-epi2 contains a predicted sulfotransferase domain (amino acids 853-1202, Pfam PF00685) on the C-terminus, whereas no domain similarity to known structures has so far been reported for the last ~280 amino acids of DS-epi1. Since the crystal structure of DS-epi1 does not include the full C-terminus of the protein, we aimed to extend the model to the complete luminal structure (amino acids 23-894) using cross-linking mass spectrometry. For this purpose, we used targeted cross-linking mass spectrometry (TX-MS) 20 with emphasis on intra cross-links to guide the structural modeling. Two different types of MS acquisition data were collected and analyzed: high resolution MS1 (hrMS1) and data dependent acquisition (DDA). The identified cross-links were used as experimental constraints to filter the conformational space of models generated by the Rosetta software suite. 33 Accordingly, 24 intra cross-links were identified (ESI Fig. 6 and Table 3 †). Rosetta de novo 34 and RosettaCM 35 protocols were used to explore the vast conformational space and produced several candidates for the C-terminus, which were then filtered by the cross-linking experimental constraints (Fig. 6A). While 12 of these cross-links validated the crystal structure input, 12 were used to provide a model of the C-terminal domain (Fig. 6B).
The C-terminal domains of all models that fulfilled the cross-link restraints (226 in total, ESI Table 4 †) were compared to the full PDB (155625 models, downloaded on 2-Oct-19) using a local installation of DaliLite.v5, 36 but no significant similarities were found for any model (ESI Table 5 †), in agreement with primary sequence predictions. Finally, a hypothetical full-length protein including the transmembrane domain and complete N-glycans was constructed by glycan building, membrane construction/docking and energy minimization using the CHARMM web server (http://www.charmm-gui.org) (ESI Fig. 7 †).
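The use of cross-links as distance constraints for filtering candidate models, as described above, can be sketched in a few lines. This is an illustration of the general idea and not the TX-MS or Rosetta implementation. The 30 Å C-alpha to C-alpha cutoff, the residue numbers, the coordinates and all function names are assumptions made for the example.

```python
import math

# Assumption: maximum C-alpha to C-alpha distance compatible with a cross-link.
ASSUMED_MAX_DISTANCE = 30.0  # angstroms


def satisfied_fraction(ca_coords, crosslinks, max_distance=ASSUMED_MAX_DISTANCE):
    """Fraction of cross-linked residue pairs within the distance cutoff.

    ca_coords maps residue number -> (x, y, z) of its C-alpha atom.
    crosslinks is a list of (residue_i, residue_j) pairs.
    """
    if not crosslinks:
        raise ValueError("at least one cross-link is required")
    satisfied = sum(
        1
        for i, j in crosslinks
        if math.dist(ca_coords[i], ca_coords[j]) <= max_distance
    )
    return satisfied / len(crosslinks)


def filter_models(models, crosslinks, min_fraction=1.0):
    """Keep the names of models that satisfy at least min_fraction of the cross-links."""
    return [
        name
        for name, coords in models.items()
        if satisfied_fraction(coords, crosslinks) >= min_fraction
    ]


if __name__ == "__main__":
    # Hypothetical residue numbers and coordinates, for illustration only.
    crosslinks = [(780, 850)]
    models = {
        "model_a": {780: (0.0, 0.0, 0.0), 850: (10.0, 0.0, 0.0)},
        "model_b": {780: (0.0, 0.0, 0.0), 850: (45.0, 0.0, 0.0)},
    }
    print(filter_models(models, crosslinks))
```

Lowering `min_fraction` would tolerate a few violated cross-links, which may be a reasonable choice when some identifications are uncertain.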

DS-epimerase expression among species and identification of bacterial homologs
To identify species with orthologs to human DS-epi1 and its only human paralog, DS-epi2, protein BLAST searches were performed against non-redundant protein sequence databases (GenBank CDS translations, PDB, SwissProt, PIR and PRF). The vast majority of DS-epi orthologs identified were predicted proteins based on automated computational analyses of genomic sequences, so all cases where only one of the epimerases was identified in a species were carefully reviewed. The search identified 274 species expressing DS-epi1 and/or DS-epi2, all of which were metazoans (ESI Fig. 8, Tables 1 and 6 †). Out of all species, Chordata was shown to represent the major phylum (98%, 264 species), with the remaining species belonging to the phyla Hemichordata (1), Echinodermata (3) and Cnidaria (6), in all of which only DS-epi2 was found.
In order to identify structural conservation of DS-epi1 among other proteins and species, we used the Dali server 37 with the crystal structure of DS-epi1 as input. Among all the structures in the PDB, DS-epi1 did not show any significant similarity to any eukaryotic protein (ESI Table 7 †). Among the top hits when comparing the overall folds of the proteins were bacterial lyases belonging to polysaccharide lyase family 8, such as alginate lyase from Agrobacterium fabrum (PDB: 3A0O), heparinase II from Pedobacter heparinus (PDB: 2FUQ), chondroitinase AC from Paenarthrobacter aurescens (PDB: 1RWG), hyaluronate lyase from Streptococcus pneumoniae (PDB: 1W3Y) and heparinase III from Pedobacter heparinus (PDB: 4MMH) (Fig. 7A). Moreover, the N-terminal alpha helix domain of DS-epi1 was recognized as Pfam domain DUF4962, only found in dermatan sulfate epimerases from mammals and polysaccharide lyases from Gram-negative bacteria. Several of the active site histidines and tyrosines were conserved in the lyases. For example, in heparinase II, distinct similarities were found both in the active site and in the metal binding site (Fig. 7B). None of the bacterial proteins could be identified when using only the primary sequence of DS-epi1. We also performed BLAST searches using the sequence of DS-epi2 from more primitive non-chordates, but no further proteins were identified.

Discussion
An experimentally supported model of the full-length Golgi luminal DS-epi1 was created, where 80% of the 958 amino acid protein was solved by macromolecular crystallography and the rest using targeted cross-linking mass spectrometry (TX-MS) experiments and Rosetta modeling. The combination of macromolecular crystallography and TX-MS provides a method to solve complete structures of larger proteins where only part of the structure can be solved by crystallography experiments. At the same time, it is useful to confirm similarities between protein conformations in solution and crystal structures, which may give new information on kinetic conformational changes. The demonstration that TX-MS experiments in combination with macromolecular crystallography are feasible for structural studies is important, as this approach can also be used in studies of interactions of other biosynthetic proteins and complex organizers in cells. The full-length model revealed a three-domain protein where the first two domains, the α and β domains, harbor the main enzymatic function. The third, C-terminal, domain is of unknown function. However, we have previously described that at least ~60 of the first amino acids of the domain need to be included in recombinant constructs in order to yield an enzymatically active protein. 21 The crystallographic structure reveals that the first part of the domain consists of a ~30 amino acid long alpha helix which connects and possibly stabilizes the α and β domains. The structure of the C-terminal domain is most likely unique to DS-epi1 since no similar domains could be identified, either based on the primary sequence or on the structural level. One possible function of the C-terminal domain is to enable homo- and heteromerization of the protein, which has previously been observed in FRET experiments and co-immunoprecipitations. 23
However, since no dimerization was observed in SEC experiments with the full-length protein, it might be that an additional protein is necessary for dimerization, or that the Golgi micromilieu is significantly different from the in vitro experiments we have performed.
Musculocontractural Ehlers-Danlos syndrome (mcEDS) is a rare disease caused by mutations in DS-epi1 or the IdoA-specific enzyme dermatan 4-O-sulfotransferase 1. To date, only eight mcEDS patients with mutations in the DSE gene have been described in the literature. 10 In order to investigate the potential impact of the mutations on the protein structure, we looked closer at the three patients carrying missense mutations (excluding truncating and frameshift variants). Two of the mutations, Arg267Gly 9 and Ser268Leu, 8 are located centrally in helix α9 of the (α/α)6-toroid domain, whereas the third, His588Arg, 11 is found in the β10 strand of the β-supersandwich domain, at a distance to the active site of approximately 15 and 20 Å for Arg267/Ser268 and His588, respectively (ESI Fig. 5 †). Both locations are in line with previous reports showing that a large proportion of disease-associated mutations are found buried in protein cores, where they may cause destabilization of the protein structure. 38 When looking at the electron density, it is clear that the Ns of His588 hydrogen bonds to the backbone carbonyl oxygen of Phe479. However, no clear amino acid interactions could be seen for Arg267 or Ser268. In the original reports describing the mcEDS mutations Arg267Gly and Ser268Leu, the authors describe a marked reduction of IdoA-containing disaccharides in the affected patients, but not complete absence. Based on published results from the DS-epi1 and DS-epi1/2 knockout mice, it is likely that the residual IdoA residues are the product of DS-epi2 activity. 39,40 However, it cannot be excluded that some activity still remains in the DS-epi1 mutants.
In this study, we propose a new catalytic mechanism of DS-epi1, involving the catalytic triad His205, Tyr261 and Tyr473. Both new and previous experiments, in vitro and in vivo, have confirmed one or more of the suggested catalytic amino acids to be crucial for activity, which is further supported by a deeply conserved core including the catalytic site, identified in the crystal structure. 19,41
Heparan sulfate (HS) is also modified by a GlcA-converting epimerase on the polymer level. In contrast to the two human DS-epimerases, only one human HS epimerase (Glce) exists, acting on substrates containing N-sulfated hexosamine residues. Crystal structures of both the zebrafish and the human Glce protein have been solved and the protein has been shown to exist as a dimer containing two catalytic sites. 42,43 Each monomeric unit consists of three domains: an N-terminal β-hairpin domain, a central β-sandwich domain and a C-terminal (α/α)4-barrel domain. The substrate binding pocket is situated in a positively charged (at pH 7) cleft made up mainly of the (α/α)4-barrel domain. No similarities can be identified on the primary sequence level between Glce and DS-epi1, but structurally, the C-terminal (α/α)4-barrel and central β-sandwich domains of Glce resemble truncated forms of the N-terminal (α/α)6-toroid and central β-supersandwich domains of DS-epi1. However, the N-terminal β-hairpin domain of Glce is missing in DS-epi1, the C-terminal domain of DS-epi1 is not present in Glce and, unlike in DS-epi1, the metal binding site (Ca2+) is located far away from the active site in Glce. Additionally, the double tyrosine motif is located opposite to the β-sandwich domain in the substrate-binding groove, contrary to the location in DS-epi1; the pH optima are different (~7 for Glce and ~5.5 for DS-epi1 (ref. 26)); and, although Glce is N-glycosylated, the number and distribution of its glycans are different.
Still, the overall shape of DS-epimerase 1, as well as the general organization of its active site amino acids, is similar to that of human Glce (Fig. 7A), and an analogous arrangement of two tyrosine units of Glce, i.e. Y578 and Y560, indicates a similar mechanism. The same arrangement is also found in heparinase II (PDB: 2FUQ) and alginate lyase (PDB: 3A0O). We propose that this double tyrosine motif is a general feature that may be found in other lyases and epimerases, and that minute changes in the active sites can give significant changes in enzyme activity, i.e. from a lyase to an epimerase.
In close vicinity to the active site, His452, Glu470 and Asn481 coordinate a manganese ion. Mutagenesis experiments revealed that all three amino acids are crucial for catalytic activity and/or proper protein folding (for Asn481). His452 is located on the same loop as His450, which interacts with the carboxylate moiety of GlcA. Conformational changes of this loop will likely have an impact on the catalytic activity, suggesting that manganese coordination is directly involved in the catalytic properties of the enzyme and not only in conformational integrity. Regarding substrate binding, the entire active site cleft was found to shift from a negative to a positive surface potential as the pH decreased from 7 to 6, similar to the pH optimum of DS-epi1 and the pH found in the Golgi lumen, which likely is a requirement for binding of the negatively charged polysaccharide substrate. 44 We also identified a tryptophan residue, Trp98, lined up in the active site, which could contribute to carbohydrate-aromatic interactions with the pyranose rings of the substrate, in agreement with previous reports on the interaction modes of enzymes acting on neutral polysaccharides. 45 We have previously shown that DS-epi1 works in a processive way, resulting in the generation of several adjacent IdoA-GalNAc disaccharides. 21 Further experiments are needed in order to investigate the specific mechanism underlying the stepwise epimerizations, also taking into consideration the in vivo micromilieu created by the interaction of DS-epi1 and sulfotransferases. 23
More than 50% of all human proteins are thought to be glycosylated 46 and aberrant glycosylation patterns are known to affect both folding and function of proteins. In contrast, only ~5% of all human protein structures in the PDB database contain glycans. 47 Most post-translational protein modifications introduce microheterogeneities resulting in surface variations between the molecules.
In addition, the relatively long and flexible glycan entities may substantially increase the entropy at the surface. The resulting loss of structurally homogeneous molecules is a well-known reason for severely hampered crystallization or decreased crystal quality. However, on a practical level, glycosylations are in many cases well accommodated by water channels in the crystals or constitute parts of crystal contacts, and crystallography experiments with glycosylated proteins often have success rates comparable to those with non-glycosylated proteins. 48 For DS-epi1, it was possible to achieve diffraction-quality crystals using a fully glycosylated protein. In addition, several glycans were clearly visible in the electron density. By extending the information from the crystal structure with mass spectrometric glycopeptide analysis of wild-type HEK293 protein, we generated a more complete picture of the N-glycosylation of DS-epi1 and extended the known glycosylation pattern to include a partly N-glycosylated site on Asn411. In previous experiments, we have shown that full glycosylation of the epimerase is required for activity. 19 The high-mannose glycan on Asn336 is positioned parallel to the linker peptide connecting the α and β domains, where it may play an important role for the interaction and stability of the two domains. This is also supported by the fact that only Asn 336 and 648 are conserved in DS-epi2 (ESI Fig. 5 †) and that DS-epi1 could not be expressed in GlycoDelete cells, which can only produce N-glycan stubs of sialylated trisaccharides.
Both DS-epi1 and DS-epi2 were found to be broadly distributed in Chordata. However, in simpler organisms such as Cnidaria, Echinodermata and Hemichordata, only DS-epi2 was found, suggesting that DS-epi2 is the older of the two human DS-epimerases. The consequence of the observed difference in DS-epi expression in more primitive organisms should be investigated. As for the structure of DS-epi1, it reveals a high structural similarity to proteins from several families of bacterial polysaccharide lyases (e.g. CAZy PL8, PL12 and PL21). Since the products of epimerization (and subsequent sulfation) are substrates for several of the bacterial lyases, it is tempting to believe that the ancestral gene was of eukaryotic origin. Further, GAGs are not found in more basal metazoans (e.g. Porifera), which suggests that the enzymes are the result of a horizontal gene transfer event. 49 However, even though horizontal gene transfers have been extensively described, transfers from eukaryotes to bacteria are less common than vice versa. 50 It remains to be understood whether a gene transfer event has occurred, or if all genes have evolved from a common ancestral gene.
In summary, we present the first structure of N-glycosylated human dermatan sulfate epimerase 1, expressed in mammalian cells. The unique structure of the enzyme, only found in metazoans, highlights the importance of C5-epimerization of uronic acid in higher organisms. The structure will be essential for generation of inhibitors, which may function as drugs for cancer and fibrosis. 16,51,52

Expression and purication of DS-epi1
DS-epi1 23-775 and 23-894 was cloned and expressed as previously described, with the following modications: DS-epi1 23-775 used for crystallization was expressed in the HEK293 GnTI-cell line (a kind gi from Professor Nico Callewaert, Ghent University, Belgium) which produces protein with a homogenous N-glycosylation pattern composed of high-mannose-type glycans. 21,53 Size-exclusion chromatography was performed on a Superose 200 increase 10/300 mm column (GE Healthcare Life Sciences) using a running buffer composed of HEPES (20 mM, pH 7.9), NaCl (150 mM), and MnCl 2 (2 mM). The column was operated at 0.5 ml min À1 and monomeric fractions were pooled and concentrated using 30 kDa MWCO Amicon Ultra centrifugal concentrators (Millipore). The general yield of pure protein was in the range of 1-4 mg per liter culture.

Crystallization and data collection
The puried DS-epi1 at a concentration of 7.3 mg ml À1 in buffer containing 20 mM HEPES pH 7.9, 150 mM NaCl, 2 mM MnCl 2 used for setting up drops using commercially available screens. An initial crystal hit was obtained by the sitting drop vapor diffusion method with a protein to reservoir volume ratio of 200 : 200 nl and incubated with a 45 ml reservoir at 20 C in a triple drop UV polymer plate (Molecular Dimensions, UK). A mosquito nanoliter pipetting robot (TTP Labtech, UK) was used to set up drops, which were imaged by the Minstrel HT UV imaging system (Rigaku Corporation, USA) available at the Lund Protein Production Platform (LP3), Lund University. Crystals were obtained with a reservoir containing 200 mM NH 4 CH 3 CO 2 , 100 mM MES pH 6.5 and glycerol ethoxylate (15/4 EO/OH) 30% v/v (condition #G7 of the molecular dimensions MIDAS screen). The crystals were then further optimized by hanging drop using a Nextal plate and diffraction quality crystals were obtained within 1 week from a crystallization solution with 6% xylitol as an additive. Crystals were picked up directly from the drop where glycerol ethoxylate present in the crystallization solution worked as cryoprotectant. Anomalous data were collected at 100 K using the ID29 beamline at the ESRF, 54 Grenoble, France and native data collected at BioMAX beamline 73

Structure determination and model building
The diffraction images were integrated using XDS 55 and scaled using Aimless 56 from the CCP4 package. 57 The structure was solved by anomalous phasing using the Crank2 package, 58 where 6 anomalous sites from manganese ions were found by SHELX 59 and the substructure was determined by SHELXD. 60 Density modification was done by Solomon, 61 Multicomb 62 and Parrot, 63 model extension by Buccaneer 64 and refinement by Refmac5. 65 The Crank2 package could build a partial model with 604 residues with a final R/R-free of 42.71%/46.66%. The resulting model was then used in Phaser 66 to calculate a density map with a higher resolution data set, with a TFZ score of 33.6 and LLG of 1217. The resulting map was corrected and extended using the Phenix Autobuild wizard. 67 The structure obtained was rebuilt and corrected manually in repeated cycles of Coot 68 and Buccaneer, then refined to convergence using phenix.refine. 69 The quality of the refined glycan structures was checked with Privateer. 70 Data collection and refinement statistics are found in Table 1.

Molecular dynamics simulations
Molecular dynamics simulations were performed with the OPLS3 force field in Desmond implemented in Schrödinger Release 2020-1 using default settings except for the length of the simulation and the use of light harmonic constraints (1 kcal mol⁻¹ Å⁻²) on all stranded and helix backbone atoms.

DFT calculations
QM calculations were performed with Jaguar implemented in Schrödinger Release 2020-1. Gas phase geometries were optimized at the M06-2X/6-31G** level of theory with D3 a posteriori corrected dispersion. The PBF solvation model with water solvent was then applied through single point energy calculations at the same level of theory.

Construction of plasmids for expression of DS-epi1 point mutants
Point mutations were introduced into the pCEP-Pu-DS-epi1 23-775 plasmid by PCR amplification using a Platinum SuperFi II polymerase (Thermo Fisher Scientific) and the primers in Table 2.
PCR-amplified products were phosphorylated with T4 PNK (NEB), ligated using Quick T4 DNA ligase (NEB) and then used to transform DH5-alpha competent E. coli (Thermo Fisher Scientific). Plasmids were purified using the PureLink fast low-endotoxin midi plasmid purification kit (Thermo Fisher Scientific) and finally sequenced (Eurofins Genomics).

Expression of DS-epi1 point mutants
Expi293F suspension cells (Thermo Fisher Scientific) were cultured in Expi293 expression medium (Thermo Fisher Scientific) at 37 °C in an 8% CO₂ incubator, with shaking at 130 rpm (19 mm orbit). Cells, at a density of 3 × 10⁶ cells per ml (>95% viability), were transfected with mutant plasmids using 1 µg DNA per ml culture and 3 µl polyethyleneimine "MAX 40K" (Polysciences, Inc.) per µg DNA. After 20 h, sodium valproate and anti-clumping supplement (Irvine Scientific, 91150) were added to 3 mM and 2 ml l⁻¹, respectively. The cell culture supernatant was harvested after 72 hours and clarified by centrifugation (2000 × g for 10 min) and filtration (0.45 µm PES filter unit, Thermo Fisher Scientific).

Western blot analysis
Conditioned culture medium was mixed with Laemmli buffer and proteins were separated on a 4-15% Mini-PROTEAN TGX stain-free protein gel (Bio-Rad) using a tris-glycine-SDS buffer (Bio-Rad). Proteins were then transferred to PVDF membranes, which were subsequently blocked (EveryBlot, Bio-Rad) and then probed with a home-made rabbit anti-DS-epi1 (1 µg ml⁻¹, antigen: amino acids 509-520) in blocking buffer (EveryBlot, Bio-Rad). A goat anti-rabbit IgG (H + L)-HRP conjugate (Bio-Rad, 1706515) was used together with a Clarity Western ECL substrate (Bio-Rad) to develop the blot, which was imaged using a ChemiDoc imaging system (Bio-Rad).

Epimerase activity analysis
Epimerase activity was measured as previously described, with slight modifications. 5 Conditioned culture medium was dialyzed against a buffer consisting of 20 mM MES (pH 5.5 at 37 °C), 10% glycerol, 0.5 mM EDTA, 0.1% Triton X-100 and 1 mM dithiothreitol. Culture medium from cells transfected with an empty vector was included as a negative control and medium from cells transfected with a plasmid encoding wild-type DS-epi1 23-775 was used as a positive control. The desalted medium samples (80 µl each) were mixed with a substrate cocktail (20 µl) containing 0.1 mg of bovine serum albumin, 2 mM MnCl₂, 0.5% Nonidet P-40, and 30 000 dpm [5-³H]chondroitin. After 18 h at 37 °C, the sample was boiled and subsequently centrifuged at 20 000 × g for 5 min. The supernatant was distilled and tritium release was quantified using a Hidex 600SL automatic TDCR liquid scintillation counter.

Size-exclusion chromatography and molecular weight determination
To compare the mono- or multimeric state of DS-epi1 at different protein concentrations, proteins were separated on an AdvanceBio SEC UHPLC column (300 Å, 2.7 µm, 4.6 × 300 mm, Agilent) with a mobile phase consisting of HEPES (20 mM), NaCl (150 mM) and MnCl₂ (5 mM).
For absolute molecular weight determination, multidetection SEC was performed using a Malvern Panalytical OMNISEC system (Malvern, UK) consisting of a refractive index (RI) detector, right angle and low angle light scattering (RALS/LALS) detectors and a differential viscometer. For chromatographic separation, a Superdex 200 increase 10/300 (GE Life Sciences) was used with a mobile phase consisting of HEPES (20 mM) and NaCl (150 mM). A dn/dc of 0.185 ml g⁻¹ was used to process the protein samples. All data were collected and processed using OMNISEC v10.

Protein digestion of glycopeptides for MS analysis
Full-length DS-epi1 (amino acids 23-894, without signal peptide and transmembrane regions) with native glycosylation was dissolved in 50 mM triethylammonium bicarbonate (TEAB) to give a protein concentration of 1 mg ml⁻¹. Based on protein sequence evaluation, chymotrypsin was selected as the proteolytic enzyme providing the best access to all potential N-glycosylation sites. Protein samples (20 µg) were reduced with 4.1 mM dithiothreitol (DTT) at 60 °C for 30 min and alkylated with 8.3 mM 2-iodoacetamide (IAM) for 30 min at room temperature (in the dark), followed by an additional 15 min incubation with 4.1 mM DTT to react with excess IAM prior to the chymotrypsin digest. The chymotryptic digest was performed overnight at 37 °C by addition of Pierce™ MS grade chymotrypsin (0.25 µg, Thermo Fisher Scientific). Digestion was stopped by acidification with 10% trifluoroacetic acid and samples were desalted using Pierce™ peptide desalting spin columns (Thermo Fisher Scientific) following the manufacturer's guidelines. The salt-free supernatants were dried down and reconstituted in 2% acetonitrile (ACN) in 0.1% formic acid (FA) for LC-MS analysis.

NanoLC MS analysis of glycopeptides
Digested samples were analyzed on a Q Exactive HF mass spectrometer interfaced with an Easy-nLC1200 liquid chromatography system (Thermo Fisher Scientific). Peptides were trapped on an Acclaim Pepmap 100 C18 trap column (100 µm × 2 cm, particle size 5 µm, Thermo Fisher Scientific) and separated on an in-house packed analytical column (75 µm × 300 mm, particle size 3 µm, Reprosil-Pur C18, Dr Maisch) using a gradient from 7% to 50% B over 75 min, followed by an increase to 100% B for 5 min at a flow of 300 nl min⁻¹, where solvent A was 0.2% FA and solvent B was 80% ACN in 0.2% FA. The instrument operated in data-dependent mode where the precursor ion mass spectra were acquired at a resolution of 120 000, m/z range 600-2000. The 10 most intense ions with charge states 2 to 5 were selected for fragmentation using HCD at collision energy settings of 28. The isolation window was set to 3 m/z and dynamic exclusion to 20 s. MS2 spectra were recorded at a resolution of 30 000 with maximum injection time set to 110 ms. Chymotryptic peptides with up to 6 missed cleavages were accepted. The detected peptide threshold in the software was set to a significance level of Mascot 95% by searching against a reversed database and identified proteins were grouped by sharing the same sequences to minimize redundancy. This analysis confirmed the successful isolation and sufficient proteolytic digestion of the target protein, dermatan-sulfate epimerase 1 (DS-epi1). The Byonic software (Protein Metrics) was used to identify glycopeptides from two different DS-epi1 preparations. The precursor and fragment ion tolerances were set to 10 ppm and 20 ppm, respectively. Fully specific cleavage after FWYL with up to 5 missed cleavages was accepted. The "309 mammalian N-glycans" database generated by the Consortium for Functional Glycomics (https://www.functionalglycomics.org) together with Met oxidation were allowed as variable modifications. Cysteine carbamidomethylation was set as static modification. 
Byonic's glycopeptide identications were manually evaluated prior to the nal assignment of the observed glycosylation forms for each glycopeptide. The extracted ion chromatograms (EIC) of the identied glycopeptides were used to determine site-specic glycoform distributions (microheterogeneity). For each observed glycoform, an average intensity was calculated from the EIC peak intensities of three independent injections. For each N-glycosylation site, all glycosylated and non-glycosylated peptides sharing the same chymotryptic peptide sequence were used for evaluation of the site microheterogeneity. The relative abundance of each glycoform was calculated as percent of the summed intensities of all detected glycoforms for the given glycosylation site.

Sample preparation for targeted cross-linking mass spectrometry (TX-MS)
DS-epi1 23-894 (full-length, minus transmembrane domains) was used as input material for the TX-MS experiments. Six micrograms of DS-epi1 was resuspended in HEPES buffer (20 mM, pH 7.0), supplemented with MnCl₂ (15 mM), at 37 °C, 800 rpm for 30 min. Heavy/light disuccinimidyl suberate (DSS-H12/D12, Creative Molecules Inc.) resuspended in dimethyl sulfoxide (DMSO) was added to final concentrations of 0.5 mM and incubated for an additional 30 min at 37 °C, 800 rpm. The reaction was quenched with a final concentration of 50 mM Tris (pH 7.5) at 37 °C, 800 rpm, for 15 min. Cross-linked samples were denatured in urea (6 M in 100 mM NH₄HCO₃) and reduced with a final concentration of 6.7 mM tris(2-carboxyethyl) phosphine at 37 °C, 800 rpm, for 45 min. Reduced cysteines were alkylated with iodoacetamide (final concentration 6.7 mM) at 22 °C for 30 min. Samples were diluted to 1.5 M urea in 100 mM NH₄HCO₃ and sequencing-grade lysyl endopeptidase (1.25 µg, 37 °C, 2 h) (Wako Chemicals) followed by trypsin (2 µg, 37 °C, 18 h) (Promega) was added to digest the protein into peptides. The peptide-containing samples were acidified with formic acid to pH 3.0 and the peptides were purified by C18 reversed-phase columns according to the manufacturer's protocol (Macro SpinColumns Silica C18, Harvard Apparatus). Purified peptides were dried and reconstituted in 2% acetonitrile supplemented with 0.2% formic acid.

LC-MS/MS of TX-MS samples
All peptide analyses were performed on a Q Exactive Plus mass spectrometer (Thermo Scientific) connected to an EASY-nLC 1000 ultra-high-performance liquid chromatography system (Thermo Scientific) essentially as described 20 with a few modifications. For data dependent analysis (DDA), peptides were separated on an EASY-Spray column (Thermo Scientific; ID 75 µm × 50 cm, column temperature 45 °C) operated at a constant pressure of 600 bar. A linear gradient from 3% to 35% acetonitrile in aqueous 0.1% formic acid was run for 60 min at a flow rate of 300 nl min⁻¹. One full MS scan (resolution 70 000 @ 200 m/z; mass range 400-1600 m/z) was followed by MS/MS scans (resolution 17 500 @ 200 m/z) of the 15 most abundant ion signals. The precursor ions were isolated with 2 m/z isolation width and fragmented using higher-energy collisional-induced dissociation at a normalized collision energy of 30. Charge state screening was enabled, and precursors with an unknown charge state and singly charged ions were rejected. The dynamic exclusion window was set to 10 s. The automatic gain control was set to 1 × 10⁶ for both MS and MS/MS with ion accumulation times of 100 and 60 ms, respectively. The intensity threshold for precursor ion selection was set to 1.7 × 10⁴. For high-resolution MS1 (hrMS1), peptides were separated using an EASY-Spray column (Thermo Scientific; ID 75 µm × 50 cm, column temperature 45 °C) operated at a constant pressure of 600 bar as in DDA and DIA. A linear gradient from 3% to 35% acetonitrile in aqueous 0.1% formic acid was run for 60 min at a flow rate of 300 nl min⁻¹. High-resolution MS scans (resolution 280 000; mass range 400-2000 m/z) were acquired using automatic gain control (AGC) set to 3 × 10⁶ and a fill time of 200 ms.

Structural modeling of the DS-epi1 C-terminus using the TX-MS protocol
To provide the structural inputs for the modeling, three separate steps were taken, each of which was guided by constraints derived from cross-linking MS. First, the Rosetta de novo modeling protocol was used to model the C-terminus of DS-epi1 by producing 3000 models. Next, the small transmembrane domain was modeled using the Rosetta comparative modeling protocol (RosettaCM) by producing 200 models. For the final structure, 3000 models were produced using the RosettaCM protocol by incorporating the top models from the previous steps. All structural models were filtered using the cross-linking data as experimental constraints following the TX-MS approach, yielding a final set of 226 models (ESI Table 4 †). 20 Finally, two different scoring systems were combined to select the final model. First, the Rosetta energy function was used to select the top 20 models; then, a normal distribution score based on cross-link length was applied, such that cross-links with lengths between 15 Å and 25 Å obtained a better score than shorter or longer cross-links, with respect to the main threshold of 32 Å.
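
The two-stage model selection can be sketched in Python as follows. This is an illustrative sketch only and not the TX-MS implementation: the Gaussian centre (20 Å) and width (5 Å) are assumptions chosen so that the 15-25 Å window scores best, cross-links longer than the 32 Å threshold are assumed to score zero, and the function names and example models are hypothetical.

```python
import math

MAX_LENGTH = 32.0  # Å, main cross-link length threshold stated in the text
MU = 20.0          # Å, assumed centre of the favoured 15-25 Å window
SIGMA = 5.0        # Å, assumed width of the normal distribution


def crosslink_score(length):
    """Gaussian-shaped score for one cross-link length (assumed form)."""
    if length > MAX_LENGTH:
        return 0.0  # assumption: violations of the threshold score nothing
    return math.exp(-0.5 * ((length - MU) / SIGMA) ** 2)


def model_score(lengths):
    """Mean cross-link score over all cross-links measured in one model."""
    return sum(crosslink_score(d) for d in lengths) / len(lengths)


def select_model(models, top_n=20):
    """Pick a final model from (name, rosetta_energy, lengths) tuples.

    Stage 1 keeps the `top_n` models with the lowest Rosetta energy;
    stage 2 returns the one with the best cross-link length score.
    """
    top = sorted(models, key=lambda m: m[1])[:top_n]
    return max(top, key=lambda m: model_score(m[2]))


# Hypothetical models: (name, Rosetta energy, cross-link lengths in Å).
models = [
    ("model_a", -100.0, [20.0, 22.0]),
    ("model_b", -120.0, [10.0, 31.0]),
    ("model_c", -50.0, [20.0, 20.0]),
]
print(select_model(models, top_n=2)[0])
```

With `top_n=2`, `model_c` is discarded on energy despite its ideal cross-link lengths, which shows how the order of the two scoring stages affects the outcome.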

Dali protein structure comparisons
All Rosetta models were compared to the complete PDB (155625 models, downloaded on 2/10-19) using a cluster installation of DaliLite.v5. 36 The highest scoring hit for each model is summarized in ESI Table 5. †

In silico exploration of species expressing DS-epimerases
A manually curated protein BLAST search was performed against the non-redundant protein databases (GenPept, Refseq, PDB, SwissProt, PIR and PRF). For DS-epi1, part of the DS-epi1-specific human C-terminal domain, ranging from amino acid 755-898, was used as query sequence. For DS-epi2, amino acids 731-1010, part of the predicted human DS-epi2-specific O-sulfotransferase domain, was used as query. All predicted splice forms were ignored and only one corresponding ortholog for DS-epi1 and -2 was selected for each species.
Alignments for phylogenetic analyses were prepared using Clustal Omega. 71 A circular phylogenetic tree, rooted in Homo sapiens, was constructed using Interactive Tree Of Life (iTOL) version 4. 72 Taxonomy IDs were assigned using Batch Entrez (https://www.ncbi.nlm.nih.gov/sites/batchentrez) and leaf labels were automatically added based on taxonomy ID.

Graphical representations
All graphical representations were prepared in UCSF Chimera version 1.13.1 (build 41965), unless otherwise stated.

Data availability
Coordinates and structure factors for DS-epi1 have been deposited in the Protein Data Bank under accession code 6HZN. Other data are available from the corresponding author upon request.

Author contributions
ET, AM, UE, GWT and UM designed and coordinated the study. ET, MH, HK, LH, AS and JU performed the experiments and analyzed the data. ET, MH, HK, LH, JU, JM, LM, UE and AM interpreted and reviewed the results. ET, MH, HK, UE and AM wrote the article, which was reviewed and approved by all of the authors.

Conflicts of interest
The authors declare no conicts of interest.