Damien Baud‡a, Jack W. E. Jeffries‡b, Thomas S. Moodyc, John M. Ward*b and Helen C. Hailes*a
aDepartment of Chemistry, UCL, 20 Gordon Street, London, WC1H 0AJ, UK. E-mail: h.c.hailes@ucl.ac.uk
bDepartment of Biochemical Engineering, UCL, Bernard Katz Building, London, WC1E 0AH, UK. E-mail: j.ward@ucl.ac.uk
cAlmac, Department of Biocatalysis & Isotope Chemistry, 20 Seagoe Industrial Estate, Craigavon, BT63 5QD, N. Ireland, UK
First published on 11th January 2017
Transaminase enzymes have significant potential for the sustainable synthesis of amines under mild aqueous reaction conditions. Here a metagenomics mining strategy has been used for new transaminase enzyme discovery. Starting from oral cavity microbiome samples, DNA sequencing and bioinformatics analyses were performed. Subsequent in silico mining of a library of contiguous reads built from the sequencing data identified 11 putative Class III transaminases, which were cloned and overexpressed. Several screening protocols were used and three enzymes were selected as being of interest owing to their activities towards a structurally diverse range of substrates. Transamination of functionalized cinnamaldehydes was then investigated for the production of valuable amine building blocks.
Currently transaminases (TAms) have significant potential for the synthesis of an increasing number of amine products. Although a number of TAms have been reported and are now available commercially, there is a need to identify new TAms with variation in the primary sequence for future applications, including enzyme engineering projects: a unique starting point has the potential to ensure that more diverse enzymes are available. In addition, new TAms not encumbered by the current patent space would be very valuable. Here our approach has been to use a metagenomics mining strategy for new TAm discovery.
Metagenomics is the concept of processing and analysing DNA extracted from an environment as if it were a single large genome.6 A frequently quoted statistic suggests that on average only 0.1–1% of bacteria are culturable from any given niche, depending on the complexity and the understanding of said niche.7,8 Attempts have been made to address this shortfall between cultivatable bacteria and the diversity known to be present. Extracted DNA can be fragmented and ligated into large Bacterial Artificial Chromosome (BAC) clone libraries. Once the environmental DNA is in a suitable host such as E. coli the BACs can be screened for desired activity using plate screens or amplified to a level compatible with Sanger sequencing methods of analysis. Such BAC and Fosmid libraries are currently used to interrogate metagenomic samples to discover new enzyme activities or new examples of enzymes with existing activities, although this is a strategy that requires suitable high-throughput screens.9,10
With the advent of high throughput sequencing technologies, raw DNA extracted from the environment can be used for sequence based analysis without the need for amplification in BAC libraries. Most of the research using such sequence based analysis has been aimed towards using microbial 16S based taxonomy to understand the range and diversity in a microbiome. More recently it has been used to observe how the microbiome changes with different inputs or host alterations, interactions between organisms and between organisms and the host, with protein annotations being used primarily to infer nutrient use and nutrient flow through the niche.11–13
Sequence data is used less frequently as an aid to enzyme discovery, most often by providing consensus sequences for primer design.14,15 The strategy applied here uses a recently reported sequence-directed enzyme retrieval method involving the identification of individual open reading frames (ORFs), with subsequent specific primer design for each non-redundant example of a desired enzyme class.16 From amino acid sequence alignment there are six groups of TAms.1 The Class III TAms are characterized by broad substrate acceptance and high regio- and stereoselectivities and have been used in a wide range of applications for the synthesis of single isomer chiral amines.1,2 Here, the interrogation of an in silico metagenomic library identified 15 full-length non-redundant Class III transaminases, of which 11 were successfully expressed and used in screens to identify substrate profiles.
The contig library was analysed with the GeneMark software to identify and mark ORFs. These putative protein-coding stretches of DNA were then scanned by the Pfam standalone tool. This process marked the ORFs with a Pfam ID linked to a specific protein family or enzyme class. As a result, all ORFs of a desired enzyme class within the contig library could be extracted, provided they had been assigned a Pfam ID.
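The extraction step described above can be sketched as a simple filter over annotated ORF records. This is a minimal illustration, not the authors' pipeline: the `Orf` record, its field names and the function `extract_orfs_by_pfam` are hypothetical, and the accession `PF00202` (the Pfam aminotransferase class-III family) is an assumption, since the text does not state which identifier was used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    """Hypothetical record for one predicted ORF on a contig."""
    orf_id: str
    contig_id: str
    start: int            # 0-based start on the contig
    end: int              # end-exclusive position on the contig
    pfam_ids: frozenset   # Pfam accessions assigned to this ORF (may be empty)


def extract_orfs_by_pfam(orfs, pfam_id):
    """Return every ORF annotated with the given Pfam accession.

    ORFs without any Pfam annotation can never match, mirroring the
    caveat that only ORFs carrying a Pfam ID can be retrieved.
    """
    return [orf for orf in orfs if pfam_id in orf.pfam_ids]


# Assumed accession for Class III TAms; not stated in the text.
CLASS_III_TAM_PFAM = "PF00202"

example_orfs = [
    Orf("orf1", "contig_1", 10, 1300, frozenset({"PF00202"})),
    Orf("orf2", "contig_1", 1400, 2000, frozenset({"PF00155"})),
    Orf("orf3", "contig_2", 1, 900, frozenset()),
]
hits = extract_orfs_by_pfam(example_orfs, CLASS_III_TAM_PFAM)
print([orf.orf_id for orf in hits])
```

Keeping the accession as a parameter means the same filter could be reused for any other enzyme class that has a Pfam family.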
The Pfam ID for Class III TAms was used to extract all marked ORFs from the contig library. 53 ORFs were identified, of which 15 were full length and non-redundant. Enzymes identified from the metagenome were to be cloned using conventional restriction cloning techniques. To this end the sequences for retrieval were scanned for internal restriction sites. Primer pairs were designed for the enzymes, with an NdeI restriction site in the forward primer and various restriction enzyme sites in the reverse primer depending on any endogenous restriction sites. 11 out of 15 sequences were successfully retrieved from the metagenome. Blunt-end PCR product was ligated into the PCR capture vector PCR-Blunt and, through a series of restrictions and ligations, was cloned into the pET29a expression vector. The pET29a vector containing the enzyme DNA sequences was transformed into the BL21*DE3 pLysS strain of E. coli for expression. The 11 putative enzymes were named with pQR numbers (see ESI†) and analysed by SDS-PAGE. All of the enzymes had a good level of expression except pQR1108, which had a lower level of total expression of soluble and insoluble protein (Fig. 1, B), and pQR1117 (Fig. 1, K), which had very little total expression. pQR1112 was completely insoluble (Fig. 1, F1 (total protein), F2 (soluble fraction)) and pQR1114 was partially insoluble (Fig. 1, H1 (total protein), H2 (soluble fraction)).
Three of the 11 TAm enzymes, pQR1108, pQR1113 and pQR1114, were found to accept a wide range of structurally diverse aldehydes and ketones 7–25 in the (S)-MBA screening, and selected results are shown in Fig. 3. pQR1108 showed the highest activity towards both linear and cyclic substrates. For the cyclic substrates, the best conversion yields were obtained with benzaldehyde 3 (61%) and cinnamaldehyde 15 (44%). Modification or absence of the aromatic ring led to significantly lower conversions, for example with 4-hydroxybenzaldehyde 13 and cyclohexanecarboxaldehyde 10 (22% and 24% conversion yields, respectively). More hindered aromatic substrates like 1-indanone 14 resulted in lower conversion yields (8%). Cyclohexanone 7 was also well accepted by pQR1108 (27%), and modification of the six-membered ring led to a lower activity for 2-methylcyclohexanone 8 (4%) or no activity for the enone cyclohex-2-en-1-one 9. When using linear substrates, the length of the carbon chain seemed to be important: a 42% conversion yield was obtained with pQR1108 and butanal 25 but only 11% with octanal 24. Functionalized aldehydes or ketones were also accepted, for example L-erythrulose 20 (9% yield) and glycolaldehyde 21 (9% yield) (Fig. 3). Linear ketones such as 2-heptanone 22 and 2-butanone 23 were not accepted. In order to confirm that the activity was due to the overexpressed TAm, an assay for each (positive) substrate was performed in triplicate with E. coli cells containing an empty vector: no activity was observed. The benzaldehyde screening with five classical amine donors ((S)-MBA 1a, (R)-MBA 1b, (S)-alanine 6a, (R)-alanine 6b and isopropylamine 26) did not give rise to any new activities: benzylamine formation was only observed when using (S)-MBA. This highlighted that all three enzymes exhibited (S)-stereoselectivity.
For the copper sulfate screening, with sodium pyruvate as amine acceptor and nine different amine donors 4, 27–34, no activity was observed, consistent with the (S)-MBA screen where pyruvate 5 was not accepted as a substrate. The remaining 8 enzymes showed negligible activity against the substrates screened. This reflected poor enzyme expression for pQR1117, low levels of soluble protein for pQR1112 and, for pQR1109, pQR1110, pQR1111, pQR1115, pQR1116 and pQR1118, most likely that the preferred donors/acceptors were not screened. Using a combination of the Bradford assay and SDS-PAGE densitometry to calculate protein loading, specific activities for pQR1108, pQR1113 and pQR1114 were investigated for two substrates, 3 and 10.
Fig. 3 Screening results of three of the TAms against 12 selected substrates. Screening conditions: 5 mM aldehyde or ketone, 25 mM (S)-MBA 1a, 1 mM PLP, KPi (pH 7.5, 100 mM), enzyme as crude cell lysate 0.2–0.4 mg mL−1, 18 h, 30 °C, 500 rpm. Conversion yields (Scheme 1A assay via detection of 2) were obtained from three independent experiments and varied by <±4%.
pQR1108 had the highest specific activities of 1.7 μmol min−1 mg−1, and 0.4 μmol min−1 mg−1 for 3 and 10 respectively. pQR1113 had an activity of 0.1 μmol min−1 mg−1 for 10. It was not possible to calculate specific activities for pQR1113 with 3 and pQR1114 with 3 and 10 due to low activity over the course of the reaction.
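For readers unfamiliar with the unit, a specific activity in μmol min−1 mg−1 follows from an initial rate of product formation and the protein loading. The sketch below is illustrative only: the function name and the input values are hypothetical and are not data from this work. Because 1 mM equals 1 μmol mL−1, the reaction volume cancels when the protein loading is given in mg mL−1.

```python
def specific_activity(delta_product_mM, delta_t_min, protein_mg_per_mL):
    """Specific activity in umol min^-1 mg^-1.

    delta_product_mM   increase in product concentration over the linear
                       (initial-rate) part of the reaction, in mM
    delta_t_min        duration of that interval, in minutes
    protein_mg_per_mL  enzyme loading in the reaction, in mg per mL

    1 mM = 1 umol/mL, so (umol/mL) / (min * mg/mL) = umol min^-1 mg^-1.
    """
    if delta_t_min <= 0 or protein_mg_per_mL <= 0:
        raise ValueError("time interval and protein loading must be positive")
    return delta_product_mM / (delta_t_min * protein_mg_per_mL)


# Hypothetical inputs chosen only to show the arithmetic.
activity = specific_activity(delta_product_mM=0.6,
                             delta_t_min=2.0,
                             protein_mg_per_mL=0.3)
print(f"{activity:.2f} umol min^-1 mg^-1")
```

The calculation is only meaningful when product formation is measurable over the sampled interval, which is consistent with the note above that some enzyme and substrate pairs were too slow for a value to be calculated.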
Sequence homology of the 11 potential TAms was compared18,19 against 4 reported TAms from Vibrio fluvialis (Vf-TAm),20 Chromobacterium violaceum DSM30191 (Cv-TAm),21 Pseudomonas putida KT2440 (Pp-TAm)22 and Klebsiella pneumoniae JS2F (Kp-TAm).23 Interestingly, pQR1113 and pQR1114 were found to have high sequence homology (93% identity), in agreement with the screening results. pQR1108 was found to have reasonably high sequence homology with Kp-TAm (55% identity plus 23% similarity) (Fig. 4). The three enzymes had less than 30% identity to Cv-TAm and Vf-TAm. The remaining 8 enzymes, which had shown negligible activity, had less than 30% identity with the 4 reported TAms (see Table 1 ESI† for sequence homology comparison to Cv-TAm). pQR1109, pQR1110, pQR1111, pQR1115, pQR1116 and pQR1118 had between 70–95% homology with each other and it is likely that these TAms require a different amine donor. Similarly pQR1113, pQR1114 and pQR1117 form a cluster of TAms that are closely related to each other and are likely to have other donor specificity.
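As an aid to reading the identity values quoted above, the following sketch shows one common way to compute percent identity from two rows of a pairwise alignment. It is an assumption-laden illustration rather than the method used by the cited tools: the function name is hypothetical, and identity is defined here over columns in which neither sequence has a gap, whereas other conventions divide by the full alignment length or by the shorter sequence length.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns containing a gap ('-') in either sequence are ignored, so the
    denominator is the number of ungapped aligned columns (one convention
    among several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment: 4 identical residues out of 5 ungapped columns.
print(percent_identity("MKT-LV", "MKSALV"))  # 80.0
```

Because the choice of denominator changes the result, identity values from different tools are only comparable when the same convention is used.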
Fig. 4 Amino acid sequences of the metagenomics TAms and 4 TAms from identified species (Kp-TAm, Pp-TAm, Cv-TAm and Vf-TAm) were aligned and used to generate a maximum likelihood phylogenetic tree (bootstrap values calculated from 1000 replicates).19,24
The important active site residues of pQR1108, pQR1113 and pQR1114 were also compared with the same 4 reported TAms. Nine residues are considered as very important in the Cv-TAm dimer and are coloured in red for the key residues of the first monomer and in green for the key residues of the second monomer (Fig. 5).25 Most residues were conserved in the 3 new enzymes. For example, Lys288 (using numbering of amino acids based on the sequence of Cv-TAm) is involved in the formation of a covalent bond with the PLP. Asp259, Ser121 and Tyr153 which are known to be important for the coordination of the PLP in the active site are also all fully conserved in our 3 enzymes. The residue Asp259 is involved in hydrogen bonding with the pyridine ring of the PLP and Ser121 and Tyr153 have roles in the phosphate group coordination of the PLP.
Fig. 5 Multiple alignment of the active metagenomics enzymes with Kp-TAm, Pp-TAm, Cv-TAm and Vf-TAm. Residues important for substrate and PLP binding in the Cv-TAm dimer are shown in red for the key residues in the first monomer and in green for the key residues in the second monomer.26 Lysine 288, crucial for Schiff base formation, is marked with a red star.
Fig. 6 Effect of various parameters on the transamination of cyclohexanecarboxaldehyde 10. Conversion yields (Scheme 1A assay via detection of 2) were obtained from experiments in triplicate and varied by <±4%.
Using these preferred conditions it was established that the 3 enzymes were not adversely affected by addition of a co-solvent (10% v/v). Interestingly, the conversion with pQR1108 increased with addition of the co-solvents DMSO or MeOH (30% and 29% yields compared to 24% without co-solvent), perhaps reflecting increased substrate solubility. The best reaction temperature observed was 30 °C for the 3 enzymes, and a PLP concentration in the range of 1–1.5 mM was required (data not shown). Under the following improved conditions, (S)-MBA 1a 25 mM, amine acceptor 10 5 mM, PLP 1 mM, KPi buffer 50 mM at pH 8, DMSO as co-solvent (10% v/v) and pQR1108 as cell lysate (0.3 mg mL−1) for 18 hours, a 41% conversion yield was obtained.
Transaminases have been reported for the synthesis of a wide range of amines; however, little has been described on the acceptance of conjugated aldehydes/ketones other than the use of cinnamaldehyde 15 as an amine acceptor with Cv-TAm and Vf-TAm.21,30,31 Since pQR1108 demonstrated good activity towards cinnamaldehyde 15 in initial screens, it was used with a range of conjugated aromatic aldehydes 35–41, with 3-phenylpropanal 42 to investigate the acceptance of a non-conjugated analogue, and with several conjugated aliphatic analogues 43–47 to generate allylic amines (Table 1). The previously optimized reaction conditions were used with (S)-MBA 1a as the amine donor. Cinnamaldehyde 15 was most readily accepted, in conversion yields comparable to those reported with Cv-TAm.21 The other cinnamaldehyde analogues were accepted, with the exception of (E)-3-(4-(dimethylamino)phenyl)acrylaldehyde 38, probably due to the electron-donating dimethylamino group reducing the electrophilicity of the aldehyde. Modification of the aromatic ring with other electron-donating groups including ortho- and para-methoxy and para-methyl moieties, 35, 39 and 37 respectively, gave rise to lower yields (39%–58%) than with cinnamaldehyde, again reflecting the lower electrophilicity of the aldehyde. (E)-3-(4-Bromophenyl)acrylaldehyde 36 was also readily accepted in 48% conversion yield. Increasing the steric bulk of the amine acceptor by addition of a methyl group or bromine at the alkene C-2 position led to a slight decrease in conversion yield compared to cinnamaldehyde: (2E)-2-methyl-3-phenylpropenal 40 and (2Z)-2-bromo-3-phenylpropenal 41 were accepted in 57% and 54% yield. The non-conjugated aldehyde 42 gave a conversion yield similar to the substituted cinnamaldehydes (56%). For the linear conjugated aldehydes 43–47, acrolein 43 was accepted, although the use of crotonaldehyde 44 gave rise to higher yields.
3-Methylcrotonaldehyde 45, 2-pentenal 46 and 2-methyl-2-pentenal 47 were also accepted but in slightly lower conversion yields than for crotonaldehyde 44. The ease of generating such allylic amines using transaminases is extremely useful and has not been reported to date: it provides a sustainable synthetic approach avoiding the use of metal catalysts or toxic reducing agents.
Aldehyde | Conversion a |
---|---|
15 | 72%; 75% b |
35 | 39% |
36 | 48% |
37 | 49% |
38 | 0% |
39 | 58%; 57% b; 84% c of 48 |
40 | 57% |
41 | 54% |
42 | 56%; 58% b |
43 | 35% |
44 | 49% |
45 | 45% |
46 | 37% |
47 | 37% |

Reaction conditions: (S)-MBA 25 mM, amine acceptor 5 mM, PLP 1 mM, KPi buffer 50 mM at pH 8, DMSO as co-solvent (10% v/v) and pQR1108 cell lysate (1.2 mg mL−1), 500 rpm for 18 hours. a Conversions (Scheme 1A assay via detection of 2) were obtained from three independent experiments and varied by <±4%. b Quantified using the amine product and HPLC. c Isolated yield after scale-up.
To demonstrate the applicability of pQR1108 further, a preparative scale reaction was carried out with the optimised reaction conditions and 2-methoxycinnamaldehyde 39. After 18 hours a quantitative conversion based on acetophenone formation was observed, possibly due to an improved reaction when stirring on the larger scale. After purification, amine 48 was isolated in 84% yield.
Contigs that contained ORFs marked with the Pfam identifier for Class III TAms were collected and visualised using the Artemis DNA viewer. ORFs that were truncated by being at the start or end of the contig were discarded. DNA corresponding to full-length ORFs was excised. Due to inaccuracies in the in silico sequences stemming from the sequencing technology, ORFs comprising the whole sequence were often fragmented across the three reading frames. For this reason, DNA sequence downstream of the suspected stop codon was also collected to ensure that the full length of the coding sequence was taken forward for analysis. Collected DNA for the ORFs was aligned and visualised using the MEGA software. Multiple sequences were highly similar at the amino acid level; for these matching sequences one example was chosen to take forward for primer design.
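The two rules described above (discard ORFs that touch a contig edge, and collect extra sequence downstream of the suspected stop codon) can be sketched as follows. This is a simplified, hypothetical illustration: the function name, the 0-based end-exclusive coordinates, the forward-strand-only handling and the 150 bp default margin are all assumptions, as the text does not specify how much downstream sequence was taken.

```python
def excise_with_margin(contig, start, end, downstream=150):
    """Excise a forward-strand ORF plus downstream flanking sequence.

    contig      contig sequence as a string
    start, end  0-based, end-exclusive ORF coordinates on the contig
    downstream  bases to keep after the suspected stop codon
                (hypothetical default; not taken from the text)

    Returns None when the ORF touches either contig edge, since such an
    ORF may be truncated and is discarded.
    """
    if start <= 0 or end >= len(contig):
        return None
    # Clamp the margin so the slice never runs past the contig end.
    return contig[start:min(end + downstream, len(contig))]


# Toy contig: 2 bp leader, a 9 bp ORF (ATG AAA TAA), 5 bp trailer.
toy_contig = "AA" + "ATGAAATAA" + "CCCCC"
print(excise_with_margin(toy_contig, 2, 11, downstream=3))  # ATGAAATAACCC
print(excise_with_margin(toy_contig, 0, 9))                 # None (touches the contig start)
```

A reverse-strand ORF would need the margin taken upstream on the contig and the excised sequence reverse-complemented, which is omitted here for brevity.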
PCR primers were designed for the 15 identified enzymes. Due to inconsistencies in the in silico DNA sequences, terminal primers were designed beyond the suspected 3′ end of the enzyme. Primers were designed to contain an NdeI restriction site in the forward primer and an NcoI, XhoI or BglII site in the terminal primer, depending on restriction sites within the coding sequence. PCR was carried out on the metagenomics DNA sample using the NEB Phusion PCR kit. All reactions were carried out following the standard protocol given in the kit with 52 ng of metagenomic DNA per reaction and a primer concentration of 5 pmol per reaction. Successful PCR was followed by gel electrophoresis, with PCR products cut out of the gel and DNA recovered using Qiagen gel extraction columns. Blunt-ended PCR products were ligated into a PCR-Blunt capture vector and transformed into Top10 cloning cells. Top10 cells harbouring the plasmid were grown overnight in 5 mL of Terrific Broth with kanamycin 50 μg mL−1. Plasmids were extracted using Qiagen miniprep kits and sent for sequencing to confirm the 3′ end of the coding sequence of the enzymes with confidence. A second round of terminal primers was designed to remove the stop codon and introduce a restriction site. PCR was carried out using these primers and the PCR capture vector containing the PCR product from the first round of PCR. The same cloning process as described above was carried out on the second round of PCR products. Plasmid carrying the PCR product from this second round of cloning was purified and digested with NdeI and the requisite restriction enzymes to generate cohesive ends. The digested plasmid was run on a gel and the cohesive-ended insert cut out and extracted from the gel. The digestion fragments were ligated into pET29a, digested to generate matching cohesive ends, and transformed into BL21* DE3 pLysS cells.
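The choice of restriction site for the terminal primer depends on which recognition sequences already occur inside the coding sequence. A minimal sketch of such a scan is given below; the function names and the toy sequence are hypothetical, while the recognition sequences are the standard ones for these enzymes. All four sites are palindromic, so scanning one strand is sufficient.

```python
# Standard recognition sequences (5'->3') for the enzymes named in the text.
RECOGNITION_SITES = {
    "NdeI": "CATATG",
    "NcoI": "CCATGG",
    "XhoI": "CTCGAG",
    "BglII": "AGATCT",
}


def internal_sites(coding_seq):
    """Map each enzyme to the 0-based positions of its site in the sequence."""
    seq = coding_seq.upper()
    return {
        enzyme: [i for i in range(len(seq) - len(site) + 1)
                 if seq.startswith(site, i)]
        for enzyme, site in RECOGNITION_SITES.items()
    }


def usable_reverse_enzymes(coding_seq, candidates=("NcoI", "XhoI", "BglII")):
    """Return candidate enzymes with no internal site in the sequence."""
    hits = internal_sites(coding_seq)
    return [enzyme for enzyme in candidates if not hits[enzyme]]


# Toy sequence containing an internal XhoI site and an internal NcoI site.
toy_cds = "ATGGCTCTCGAGAAACCATGGTAA"
print(internal_sites(toy_cds))
print(usable_reverse_enzymes(toy_cds))  # ['BglII']
```

An internal NdeI site would also need to be checked, since NdeI is used in the forward primer; `internal_sites` reports it alongside the reverse-primer candidates.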
Footnotes
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c6gc02769e
‡ Joint first authors.
This journal is © The Royal Society of Chemistry 2017