Characterization of Cd36_03230p, a putative vanillin dehydrogenase from Candida dubliniensis †

A coding sequence (CD36-03230) from the yeast Candida dubliniensis had been previously annotated as a vanillin dehydrogenase (VDH). The corresponding protein (CD36-03230p) was recombinantly expressed in Escherichia coli and analysed. The protein is most likely a tetramer in solution as judged by crosslinking and gel filtration experiments. CD36-03230p is an active aldehyde dehydrogenase favouring cyclic and aromatic substrates. Positive cooperativity and substrate inhibition were observed with some substrates. The redox cofactor NADP+ and substrates affected the thermal stability of the protein. Interestingly, the enzyme had no detectable activity with vanillin, suggesting that the annotation is incorrect. It has been previously hypothesized that a methionine residue at a key position in the active site of yeast aldehyde dehydrogenases sterically hinders cyclic substrates and restricts specificity to aliphatic aldehydes. Molecular modeling of CD36-03230p shows that it has an isoleucine residue (Ile-156) at this position, further strengthening this hypothesis.


Introduction
Most mammalian aldehyde dehydrogenases (ALDHs) have a broad specificity for aliphatic aldehydes, as well as some aromatic and polycyclic aldehydes, 1-3 and thus provide an important protective enzymatic function against these xenobiotics. Eubacterial ALDHs, on the other hand, exhibit relatively narrow substrate specificity depending on their natural habitat and exposure to endogenous and exogenous aldehyde reactive elements. Hence, characterization of ALDHs capable of catalyzing the oxidation of aromatic aldehydes has been well documented in bacteria. 4-13 Vanillin dehydrogenase (VDH), a sub-class of benzaldehyde dehydrogenases, 3 is a critical enzyme for the degradation of lignin-derived phenylpropanoids (such as vanillin, vanillate, caffeate, p-coumarate, cinnamate and benzaldehyde). These aromatic aldehydes, especially vanillin, are widely used as flavours and aromas in the food and cosmetic industries. It is important to elucidate the structural and functional characteristics of these enzymes given their potential role in food chemistry and biotechnology. This group of enzymes has been characterized in a number of eubacterial species, including Pseudomonas fluorescens, 6 Pseudomonas putida KT2440, 13 Corynebacterium glutamicum, 7 Rhodococcus jostii RHA1, 4 Amycolatopsis sp. strain ATCC 39116, 8 Sphingomonas paucimobilis SYK, 9 and Micrococcus sp. 10 Mammalian epithelial ALDH1A and salivary ALDH3A1 typically show activity towards a wide range of aromatic aldehyde substrates (including vanillin, benzaldehyde, cinnamaldehyde and 4-hydroxynonenal). 14 Plant ALDH2 family members have also been observed to show a broad aromatic substrate specificity range, but with no report of activity with vanillin. 15 Two ALDHs from the white rot fungus Phanerochaete chrysosporium, which are translationally up-regulated on exogenous addition of vanillin, are active as VDHs. 16
There are reports of recombinant vanillin production in metabolically engineered baker's yeast harbouring heterologous genes, 17 but no studies are available focusing on the characterization of endogenous vanillin dehydrogenase enzymes in yeast species. To date, a single protein sequence, CD36-03230p (accession number: XP_002416995), from the Candida dubliniensis genome has been provisionally annotated as a putative vanillin dehydrogenase (Cd36_03230p). The purpose of this study was to characterise the substrate specificity and oligomeric structure of recombinant Cd36_03230p, in order to validate (or otherwise) its putative role. The results were interpreted, in part, based on results from our previous study which described two ALDHs from Saccharomyces cerevisiae var. boulardii. 18

Recombinant expression and purification of Cd36_03230p
The coding sequence for Cd36_03230 (based on accession number: XM_002416950) was synthesised following optimization of the sequence for expression in Escherichia coli (GenScript, NJ, USA). The coding sequence was PCR-amplified and amplicons were inserted into the E. coli expression vector pET46 Ek/LIC (Merck-Millipore, Nottingham, UK) according to the manufacturer's instructions (note that this vector introduces bases coding for the amino acid sequence MAHHHHHHVDDDDK at the 5′ end of the coding sequence). Correct insertion into the vector was verified by PCR and by DNA sequencing (GATC, London, UK) of the insert.
The expression vector was used to transform competent E. coli Rosetta™ (DE3) cells (Merck-Millipore) and colonies resulting from this transformation were used to inoculate cultures (5 ml of Luria Bertani medium (LB) supplemented with 100 µg ml⁻¹ ampicillin and 34 µg ml⁻¹ chloramphenicol) which were grown at 37 °C overnight (17-18 h) with orbital shaking. Each culture was then diluted into 1 l of LB (supplemented with 100 µg ml⁻¹ ampicillin and 34 µg ml⁻¹ chloramphenicol) and grown (with orbital shaking) at 30 °C until $A_{600}$ reached 0.6 to 1.0 (typically 5-6 h). Expression was then induced slowly by adding 1.3 mM IPTG and incubating overnight (12-16 h) at 16 °C. These induction conditions were based on our previous experience of working with a wide variety of recombinant proteins. Cells were harvested by centrifugation (4200g for 15 min), resuspended in cell resuspension buffer (50 mM HEPES-OH, pH 7.5, 150 mM NaCl, 10% (v/v) glycerol) and stored frozen at −80 °C until the purification step.
For purification, cell suspensions were thawed, disrupted by sonication on ice (three pulses at 100 W for 30 s with 30 s gaps for cooling) and clarified by centrifugation (20 000g, 20 min, 4 °C). The supernatant was applied to a cobalt agarose column (1 ml, His-Select, Sigma, Poole, UK) which had been pre-equilibrated in buffer A (cell resuspension buffer, except 500 mM NaCl) and allowed to pass through by gravity. The column was washed with 40 ml of buffer A and the protein eluted with three 2 ml aliquots of buffer C (buffer A plus 250 mM imidazole). Protein-containing fractions were identified by SDS-PAGE and dialysed overnight at 4 °C against cell resuspension buffer supplemented with 1 mM DTT. The concentration of Cd36_03230p was determined by the method of Bradford 19 using BSA as a standard. The purified fractions were frozen at −80 °C in 20 µl aliquots.

Bioinformatics and modeling
Multiple sequence (structure-based) alignments were carried out for Cd36_03230p with known structures of aromatic aldehyde dehydrogenases (class 3) such as benzaldehyde dehydrogenase from Pseudomonas putida (PDB ID 3LV1), Corynebacterium glutamicum (PDB ID 3R64) and a salicylaldehyde dehydrogenase from Pseudomonas putida G7 (PDB ID 4JZ6) using the T-Coffee algorithm in Expresso template mode available at http://www.tcoffee.org. 20-22 Human retinal dehydrogenase 1 (PDB ID 4WB9) 23 and human liver mitochondrial dehydrogenase (PDB ID 1CW3) 24 were also incorporated into the alignment as members of class 1 and class 2 ALDHs respectively, to gain insight into their relatedness, if any, to Cd36_03230p. The sequence homology was evaluated using ESPript 3.0 available at http://espript.ibcp.fr. 25 The phylogenetic tree was constructed by the neighbour-joining method 27 using ClustalW Phylogeny (version 2.1), a web-based service available at http://www.ebi.ac.uk/Tools/msa/clustalw2/. 26 An initial molecular model of the protein was generated using Phyre2 (ref. 28) and energy minimized using YASARA. 29 A model of dimeric Cd36_03230p was generated by aligning two copies of the model to the ALDH domains of Geobacter sulfurreducens PutA (PDB 4NMB 30 ) and saving the two monomers into a single protein structure (pdb) file. This initial, dimeric model was then subjected to a second round of energy minimization using YASARA. A tetrameric model was generated in the same way using the tetrameric structure of sheep liver class 1 aldehyde dehydrogenase (1BXS 31 ) as the template. These models are available as ESI† to this paper.

Cross-linking
Crosslinking with bis(sulfosuccinimidyl)suberate (BS3; 50-800 µM) was carried out with 18 µM protein (diluted as required in 100 mM sodium phosphate buffer pH 7.4) in a total volume of 10 µl. Reaction mixtures were incubated at 30 °C for 30 min before addition of the crosslinker and then incubated at the same temperature for a further 35 min. Reactions were stopped by addition of an equal volume of SDS-loading buffer (120 mM Tris-HCl, pH 6.8, 4% (w/v) SDS, 20% (v/v) glycerol, 5% (w/v) bromophenol blue, 1% (w/v) DTT) and analysed by 10% SDS-PAGE.

Analytical gel filtration
Cd36_03230p (200 µl of a 60 µM purified protein aliquot) was chromatographed on a Sephacryl S-300 (Pharmacia) column (total volume, $V_t$ = 65.2 ml; void volume, $V_0$ = 15.1 ml) at a flow rate of 1 ml min⁻¹. The column was equilibrated and developed in buffer G (50 mM Tris-HCl, 17 mM Tris base, 150 mM sodium chloride, pH 7.4). 32-34 Fractions (1 ml) were collected and analysed for protein content by measuring the absorbance at 280 nm. Standard proteins (thyroglobulin, 669 kDa; albumin, 67 kDa and chymotrypsinogen, 25 kDa) were used to calibrate the column. Their elution volumes ($V_e$) were used to calculate $K_{av}$ according to the equation:
$$K_{av} = \frac{V_e - V_0}{V_t - V_0}$$
The Stokes radius ($R_s$) was estimated from the inverse correlation of this parameter with $K_{av}$ and the sedimentation coefficient ($S_{20,w}$) was estimated from the molecular models using WinHydroPRO 1.00. 35 The sub-unit stoichiometry ($n$) was then estimated using the equation:
$$n = \frac{6 \pi \eta N_A R_s S_{20,w}}{M (1 - \bar{v}_2 \rho)}$$
where $M$ is the molecular mass of a monomer (52 100 Da), $N_A$ is Avogadro's number (6.023 × 10²³ mol⁻¹), $\eta$ is the viscosity of the solvent (0.01 g cm⁻¹ s⁻¹), $\bar{v}_2$ is the partial specific volume (0.73 cm³ g⁻¹) 36 and $\rho$ is the density of the solvent (1.0 g cm⁻³). Values obtained with both the dimeric and tetrameric models of Cd36_03230p were compared in order to see which fitted better to the experimental data.
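The two gel filtration calculations can be illustrated with a short Python sketch. It assumes the conventional partition-coefficient definition of K_av and a stoichiometry relation combining the Stokes-Einstein and Svedberg equations, with the constants listed in this section; the function names and the 40 ml elution volume are hypothetical and serve only as illustration. The Stokes radius (5.0 nm) and model sedimentation coefficients (6.3 S and 10.0 S) are the values reported in the Results.

```python
import math


def k_av(v_e, v_0=15.1, v_t=65.2):
    """Partition coefficient K_av = (V_e - V_0) / (V_t - V_0); volumes in ml."""
    return (v_e - v_0) / (v_t - v_0)


def subunit_stoichiometry(r_s_nm, s20w_svedberg, monomer_mass=52100.0,
                          eta=0.01, v_bar=0.73, rho=1.0, n_a=6.023e23):
    """Estimate the number of subunits n from R_s and S_20,w (CGS units)."""
    r_s_cm = r_s_nm * 1e-7          # nm -> cm
    s_sec = s20w_svedberg * 1e-13   # Svedberg units -> seconds
    # Native molecular mass (g/mol) from Stokes radius and sedimentation coefficient
    native_mass = (6 * math.pi * eta * n_a * r_s_cm * s_sec) / (1 - v_bar * rho)
    return native_mass / monomer_mass


if __name__ == "__main__":
    # Hypothetical elution volume, for illustration only
    print(f"K_av at V_e = 40 ml: {k_av(40.0):.2f}")
    # R_s and S_20,w values as reported in the Results section
    for label, s in (("dimeric model", 6.3), ("tetrameric model", 10.0)):
        print(f"{label}: n = {subunit_stoichiometry(5.0, s):.1f}")
```

With these inputs the sketch gives n of about 2.5 for the dimeric model and about 4.0 for the tetrameric model. Only the tetrameric model returns a stoichiometry equal to its own assumed subunit number, which is the sense in which it fits the data better.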

ALDH activity measurements and enzyme kinetic analysis
The enzyme assay was performed as described previously. 7,37-39 Aldehyde dehydrogenase enzyme activity was monitored at 30 °C using a ThermoScientific Multiskan™ Microplate spectrophotometer. The reactions contained 100 mM potassium phosphate buffer (pH 7.3), 0.5 mM NADP+, varied concentrations of substrates (10-1200 µM) and 0.6 µM enzyme. The long-chain (C8-C13) and phenolic aldehydes were dissolved in DMSO (1.7% (v/v), final concentration) as a solvent carrier.
Steady-state kinetic data were obtained in triplicate in 96-well plates with readings taken every 5 s. The initial, linear portion of the progress curve was identified by visual inspection and fitted by linear regression to give the initial rates (v) of change in absorbance at 340 nm. These rates were converted to molar units using the extinction coefficient of NADPH (6.22 mM⁻¹ cm⁻¹) 40 to give rates of reaction in micromolar concentration of NADPH formed per second.
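The conversion from absorbance slope to reaction rate described above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis script: the function names are hypothetical, the example readings are synthetic, and the optical path length (1 cm by default) is an assumption because the text does not state the path length of the microplate wells.

```python
def initial_slope(times_s, absorbances):
    """Least-squares slope of A340 versus time (absorbance units per second)."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_a = sum(absorbances) / n
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_s, absorbances))
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    return sxy / sxx


def rate_um_per_s(slope_per_s, epsilon_mm=6.22, path_cm=1.0):
    """Convert dA340/dt to micromolar NADPH per second via the Beer-Lambert law.

    epsilon_mm is the NADPH extinction coefficient in mM^-1 cm^-1;
    path_cm is an assumed optical path length (not given in the text).
    """
    return slope_per_s / (epsilon_mm * path_cm) * 1000.0  # mM/s -> uM/s


if __name__ == "__main__":
    # Synthetic readings taken every 5 s over the initial linear phase
    times = [0, 5, 10, 15, 20]
    a340 = [0.100 + 0.000622 * t for t in times]
    print(f"rate = {rate_um_per_s(initial_slope(times, a340)):.3f} uM/s")
```

Because the rate scales inversely with path length, a shorter path in a partially filled well would give proportionally higher molar rates for the same absorbance slope.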
The kinetic parameters ($k_{cat}$, $K_{0.5}$ and Hill coefficient, $h$) were obtained by plotting the rates of reaction against substrate concentration and fitting the data to the equation below using non-linear regression as implemented in GraphPad Prism 6.0 (GraphPad Software Inc, CA). All points were weighted equally.
$$v = \frac{k_{cat} [E] [S]^h}{K_{0.5}^h + [S]^h}$$
where $k_{cat}$ is the turnover number, $[E]$ is the enzyme concentration, $[S]$ is the concentration of substrate, $K_{0.5}$ is the concentration of substrate that produces a half-maximal enzyme velocity (analogous to the Michaelis constant, $K_m$, in non-cooperative enzymes) and $h$ is the Hill coefficient. 41,42

Differential scanning fluorimetry (DSF)
Enzyme aliquots were diluted in 50 mM HEPES, pH 7.3, to a concentration of 5-7 µM in a final volume of 20 µl. Sypro Orange (10×; manufacturer's concentration definition) was used as previously described. 43,44 Cofactor (NADP+) and substrates were added as appropriate. Where required, substrates were initially dissolved in 100% DMSO and diluted in buffer R (50 mM HEPES-OH, pH 7.5, 150 mM NaCl, 10% v/v glycerol). The concentration of DMSO never exceeded 1% (v/v).

Results and discussion
Cd36_03230p is an unusual aldehyde dehydrogenase-like protein
Structure-based multiple sequence alignment showed conserved residues at both the NAD(P)+ binding and catalytic domains. However, the Cd36_03230p sequence has gaps and substitutions at otherwise conserved residues when aligned to class 1, class 2 and class 3 ALDHs, which suggests structural and functional disparity (ESI Fig. S1a†). It also forms an independent cluster on the phylogenetic tree and exhibits clear evolutionary distance from the known structures of class 3 benzaldehyde and salicylaldehyde dehydrogenases with verified VDH activities, 7,13 as well as from the class 1 and 2 ALDHs (non-VDH) (ESI Fig. S1b†). Therefore, the Cd36_03230p sequence is not sufficiently similar to the salicylaldehyde dehydrogenases or benzaldehyde dehydrogenases to be classified as a member of the class 3 vanillin dehydrogenase family. 45 However, molecular modelling of Cd36_03230p predicted that it has a similar fold to other aldehyde dehydrogenases (Fig. 1a). The highest ranked template used in Phyre2 was PutA from G. sulfurreducens (PDB 4NMB 30 ) with a root mean squared deviation (rmsd) of 1.0 Å over 1922 equivalent atoms. The fold is also predicted to be similar to mammalian aldehyde dehydrogenases (for example, the rmsd when compared to sheep liver aldehyde dehydrogenase is 0.7 Å over 1957 equivalent atoms). The predicted structure is largely α-helical with a protruding β-sheet region at the C-terminus which is, by comparison to oligomeric aldehyde dehydrogenase structures, likely to be involved in homo-oligomer assembly. Since there are both dimeric and tetrameric aldehyde dehydrogenases known, we built both oligomeric versions in order to assist with the interpretation of gel filtration experiments (see below). Comparison with the structure of sheep liver aldehyde dehydrogenase (which was solved with NAD+ bound) enabled the prediction of the cofactor binding site.
This lies near the surface of each subunit, adjacent to a cleft which is the likely aldehyde substrate binding site (Fig. 1b). Previously, we have suggested that a bulky amino acid residue (Met-177 in S. cerevisiae var. boulardii Ald6p) is partly responsible for restricting access to bulkier, cyclic aldehydes. 18 In Cd36_03230p, the structurally equivalent residue is Ile-156. Therefore, we hypothesized that this smaller residue may enable Cd36_03230p to accommodate cyclic aldehydes.

Expression, purification and oligomeric structure of Cd36_03230p
Cd36_03230p could be expressed in, and purified from, E. coli Rosetta™ (DE3) cells. Typical yields were approximately 1.5 mg l⁻¹ of bacterial cell culture (Fig. 2a). Unlike other yeast ALDHs, 18 this protein did not show multiple bands resulting from oligomerisation on 10% SDS-PAGE, suggesting that any oligomeric form(s) are less resistant to heat and SDS denaturation.
The enzyme was able to form dimers and tetramers as demonstrated by chemical crosslinking with BS3. Resolution of the crosslinked products by 10% SDS-PAGE revealed bands corresponding primarily to a homotetramer (~210 kDa), with some higher order oligomers (Fig. 2b). The intensity of these bands increased with increasing concentrations of BS3. Dimeric ALDHs have previously been reported; in human ALDH3, for example, an extended C-terminal tail prevents tetramerisation. 46 However, our predicted structure of Cd36_03230p suggests that there is no such tail in this protein. Gel filtration chromatography was used to estimate the native molecular mass in solution and, thus, the subunit stoichiometry. To allow for effects due to the shape of the protein, sedimentation coefficients of the dimeric and tetrameric models of Cd36_03230p were computationally estimated as 6.3 and 10.0 S respectively. The Stokes radius was estimated from the gel filtration data as 5.0 nm. This yielded an estimated subunit composition of 2.5 using the dimeric model and 4.0 using the tetrameric model. The tetrameric model is clearly a better fit to the data, suggesting that Cd36_03230p exists predominantly as a tetramer in solution. However, given that higher molecular mass species were detected by crosslinking, higher oligomeric forms may also be present.

Cd36_03230p exhibits unusual kinetic patterns with cyclic and aromatic substrates
Cd36_03230p demonstrated activity with cyclohexanecarboxaldehyde (a cyclic aliphatic aldehyde), and with benzaldehyde and 4-hydroxybenzaldehyde (aromatic aldehydes), showing highest activity towards 4-hydroxybenzaldehyde as judged by its $k_{cat}$ to $K_{0.5}$ ratio (Table 1 and Fig. 3a). Interestingly, Cd36_03230p kinetics also showed substantial substrate inhibition by 4-hydroxybenzaldehyde at substrate concentrations higher than ~10 µM. Despite attempts to fit these data to various kinetic models, it was not possible to obtain a good fit with corresponding estimates of kinetic constants for this substrate. Substrate inhibition has previously been reported for benzaldehyde dehydrogenases from Acinetobacter calcoaceticus (by benzaldehyde) and for betaine aldehyde dehydrogenase from Staphylococcus aureus (by betaine aldehyde). 47,48 Apart from this, Cd36_03230p failed to show any activity towards aliphatic (short-chain and long-chain) and most aromatic aldehydes used in this study. These included acetaldehyde, propionaldehyde, butyraldehyde, isobutyraldehyde, valeraldehyde, hexanaldehyde, heptanaldehyde, octanaldehyde, nonanaldehyde, decyl aldehyde, undecyl aldehyde, dodecyl aldehyde, tridecyl aldehyde, crotonaldehyde, DL-glyceraldehyde and, notably, 4-hydroxy-3-methoxybenzaldehyde (vanillin).
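To make the kinetic parameters concrete, the following Python sketch evaluates a Hill-type rate law of the kind named in the kinetic analysis, in its conventional form v = k_cat[E][S]^h / (K_0.5^h + [S]^h). The function names and all numerical values are hypothetical and chosen only for illustration; they are not the fitted constants of Table 1. The sketch also does not model substrate inhibition, which this rate law cannot describe.

```python
def hill_rate(s, k_cat, e_total, k_half, h):
    """Rate v = k_cat*[E]*[S]^h / (K_0.5^h + [S]^h).

    s, e_total and k_half share one concentration unit (e.g. uM);
    k_cat is in s^-1, so v is returned in that unit per second.
    """
    return k_cat * e_total * s ** h / (k_half ** h + s ** h)


def specificity_constant(k_cat, k_half):
    """Ratio k_cat / K_0.5, used to rank substrates."""
    return k_cat / k_half


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only
    k_cat, e_total, k_half = 2.0, 0.6, 50.0
    for h in (1.0, 1.8):
        rates = [hill_rate(s, k_cat, e_total, k_half, h) for s in (10, 50, 250)]
        print(f"h = {h}: " + ", ".join(f"{v:.3f}" for v in rates))
    print(f"k_cat/K_0.5 = {specificity_constant(k_cat, k_half):.3f}")
```

At [S] = K_0.5 the rate is half of k_cat[E] whatever the value of h, and with h = 1 the expression reduces to the Michaelis-Menten equation. Values of h above 1 give the sigmoidal curve characteristic of positive cooperativity.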

Substrates and cofactor affect the thermal stability of Cd36_03230p
Addition of NADP+ (1.5 mM) resulted in a significant increase in the "melting temperature", $T_m$, of Cd36_03230p as estimated by DSF (Table 2 and Fig. 3b). This suggests that this compound binds to, and stabilizes, the protein. Long-chain aliphatic and aromatic aldehyde substrates (2 mM concentration) generally reduced the thermal stability of the enzyme-NADP+ complex by ~4 °C (Table 2). This may indicate a slightly lower stability of the ternary enzyme-NADP+-aldehyde complex, perhaps resulting from increased overall flexibility in the protein. However, given that many of the compounds which affect the thermal stability are not substrates of the enzyme (and may, therefore, not interact with the protein), it is also possible that they cause a small destabilization through their general hydrophobic or chaotropic properties rather than through interaction at a specific site. 49

Conclusions
This study demonstrated that, despite being annotated as such, this enzyme has no detectable vanillin dehydrogenase activity. The data do show that the enzyme functions as an aldehyde dehydrogenase, with a strong preference for some cyclic and aromatic substrates. Our previous work suggested that a key residue in the active site influences the substrate specificity of yeast aldehyde dehydrogenases. In the case of S. cerevisiae Ald4p and Ald6p, the former has some activity towards cyclic aldehyde substrates, but the latter does not. 18,50,51 Ald6p has a bulky methionine residue (Met-177) which we hypothesized might sterically hinder the binding of cyclic substrates. In contrast, the structurally equivalent residue in Ald4p is Leu-196, and alteration of Met-177 in Ald6p to valine conferred some activity with cyclic aldehydes on this enzyme. 18 In Cd36_03230p, the structurally equivalent residue is Ile-156. This provides further support for our hypothesis that a smaller hydrophobic residue at this position facilitates the binding of cyclic substrates. However, the lack of activity of Cd36_03230p with some cyclic substrates and all aliphatic substrates tested (in contrast with S. cerevisiae Ald4p and Ald6p which both act on aliphatic aldehydes) suggests that there are additional determinants of substrate specificity in these enzymes. Further studies are required to elucidate these. The lack of activity with vanillin suggests that the putative annotation of Cd36_03230p as a vanillin dehydrogenase is incorrect and should be changed. We suggest that cyclic/aromatic aldehyde dehydrogenase would be a more appropriate annotation.