 Open Access Article
      
        
          
Yan Ma *ab, Yang Cao a, Xiaocui Song a, Weichen Xu c, Zichen Luo c, Jinjun Shan c and Jingjie Zhou d
      
a National Institute of Biological Sciences, Beijing, Beijing 102206, China. E-mail: mayan@nibs.ac.cn

b Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China

c Institute of Pediatrics, Jiangsu Key Laboratory of Pediatric Respiratory Disease, Medical Metabolomics Center, Nanjing University of Chinese Medicine, Nanjing 210023, China

d The Affiliated Jiangyin Hospital of Nanjing University of Chinese Medicine, Jiangyin 214400, China
    
First published on 13th September 2023
Recently, amino acids other than glycine and taurine were found to be conjugated to bile acids by the gut microbiome in mice and humans. Although these metabolites are potential diagnostic markers for inflammatory bowel disease and potential farnesoid X receptor agonists, their physiological effects and mechanisms remain to be elucidated. A tool for the rapid and comprehensive annotation of such new metabolites is therefore required. Here, we developed a semi-empirical MS/MS library for bile acids conjugated with 18 common amino acids: alanine, arginine, asparagine, aspartate, glutamine, glutamate, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, and valine. To investigate their fragmentation rules, these amino acids were chemically conjugated with lithocholic acid, deoxycholic acid, and cholic acid, and the accurate-mass MS/MS spectra of the products were acquired. The common fragmentation patterns of the amino acid moieties were combined with 10 general bile acid skeletons to generate a semi-empirical MS/MS library of 180 structures. Software named BAFinder 2.0 was developed to combine the semi-empirical library in negative mode with the characteristic fragments in positive mode for the automatic identification of unknowns. As a proof of concept, this workflow was applied to the LC-MS/MS analysis of human, beagle dog, and rat feces. In total, 171 bile acids conjugated with common amino acids were annotated, and 105 of them were confirmed by the retention times of synthesized compounds. To explore other potential bile acid conjugates, user-defined small molecules were conjugated with bile acids in silico and searched for in the fecal dataset. Four novel bile acid conjugates were discovered, involving D-Ala-D-Ala, Lys(iso)-Gly, L-2-aminobutyric acid, and ornithine.
Bile acids have long been known to be conjugated with glycine and taurine by liver enzymes.6 Recently, other amino acids, such as leucine, phenylalanine, and tyrosine, were reported for the first time to be conjugated to bile acids by the gut microbiome in humans and mice.7 These novel metabolites agonized the farnesoid X receptor in vitro and were enriched in patients with inflammatory bowel disease.7 Since then, more bile acid conjugations have been discovered, covering almost all common amino acids.8–10 Fig. 1 shows the simplified structure of an amino acid-conjugated bile acid. The structural diversity of both the steroid core and the amino acid moiety poses a challenge for the identification of unknowns.
Liquid chromatography-mass spectrometry (LC-MS) has been widely applied to the analysis of amino acid-conjugated bile acids. For example, Zhu et al.11 developed a method that combined chemical derivatization with alternating dual-collision energy scanning mass spectrometry, and identified 17 bile acids conjugated with alanine, proline, leucine, and phenylalanine in mouse intestinal contents and feces. Wang et al.12 established a polarity-switching multiple reaction monitoring (MRM) mass spectrometry method and screened 118 amino acid-conjugated bile acids, which, to our knowledge, is the most comprehensive profiling of such metabolites to date. However, most of these methods target only certain amino acid conjugations. In addition, manual spectral interpretation often requires additional expertise and time.
To facilitate the automatic annotation of amino acid-conjugated bile acids in biological samples, an MS/MS library is needed. However, because reference standards are lacking, a comprehensive experimental library is not feasible. In silico MS/MS libraries generated from fragmentation rules and computational methods have been applied to the annotation of various compound classes, such as lipids13 and acyl-CoA.14 We therefore set out to expand the coverage of an experimental MS/MS library with spectra predicted from the negative-mode ESI fragmentation rules of bile acids conjugated with 18 common amino acids. This semi-empirical library can be used with any generic spectral search engine (e.g., NIST MS Search) or with BAFinder, a software tool dedicated to the identification of bile acids.15 The updated BAFinder 2.0 software combines the characteristic fragments in positive mode with the semi-empirical library in negative mode to further increase annotation confidence. Moreover, novel bile acid conjugates involving molecules other than the common amino acids can be annotated using a similar strategy.
… : t-BuOH (3 : 1, v/v). Next, a solution of 125 mM amino acid and 100 mM NaOH was prepared in 10 μL water. For cysteine and tyrosine, which are less soluble in water, the solvent volume was increased to 100 μL. The two solutions were mixed and shaken at 80 °C for 2 h. When the reaction was complete, the pH of the resulting solution was adjusted to 7 with 100 mM HCl. The mixture was then diluted 100-fold with 50% MeOH, and 1 μL was injected into the LC-MS system for analysis.
      
      
… : 60 mobile phase A (7.5 mM ammonium acetate in water, adjusted to pH 4.35 with acetic acid) : mobile phase B (acetonitrile). The flow rate was 0.3 mL min−1. MS spectra were acquired using the following ESI source settings: spray voltage 3.5 kV (positive mode) or 2.5 kV (negative mode), aux gas heater temperature 380 °C, capillary temperature 320 °C, sheath gas flow rate 30 units, and aux gas flow rate 10 units. MS1 scan parameters included a resolution of 60 000, an AGC target of 3e6, and a maximum injection time of 200 ms. MS/MS spectra were acquired in negative ESI mode with a normalized collision energy of 60 and in positive ESI mode with a normalized collision energy of 45. Data-dependent MS2 (dd-MS2) acquisition employed a resolution of 30 000, an AGC target of 2e5, and a maximum injection time of 100 ms.
Raw data were converted to mgf format using ProteoWizard MSConvert17 software. MS/MS fragmentation patterns were then manually investigated with NIST MS Search software. For each amino acid conjugation, the characteristic product ions common to the conjugated LCA, DCA, and CA were selected. The formulas of these fragments were annotated according to their m/z and chemical structures, and their theoretical m/z values were calculated using Molecular Weight Calculator (Tables S1 and S2†). For the negative mode data, the average relative intensities were also calculated and rounded to the nearest 5% for the semi-empirical library development (Table S1†).
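The theoretical m/z values and rounded relative intensities described above can be reproduced in a few lines. The sketch below is an illustration only, not the Molecular Weight Calculator used here: it assumes singly charged [M − H]− and [M + H]+ ions of the free amino acid, standard monoisotopic element masses, and rounding half-up to the nearest 5%. The helper names (`mono_mass`, `mz_neg`, `mz_pos`, `round_to_5pct`) and the example intensities are hypothetical.

```python
import math
import re
from statistics import mean

# Monoisotopic element masses (Da) and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}
PROTON = 1.00727646688


def mono_mass(formula: str) -> float:
    """Monoisotopic neutral mass of a plain formula such as 'C3H7NO2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONO[element] * (int(count) if count else 1)
    return mass


def mz_neg(formula: str) -> float:
    """m/z of the singly charged [M - H]- ion."""
    return mono_mass(formula) - PROTON


def mz_pos(formula: str) -> float:
    """m/z of the singly charged [M + H]+ ion."""
    return mono_mass(formula) + PROTON


def round_to_5pct(values) -> int:
    """Average relative intensities (%) and round half-up to the nearest 5 %."""
    return 5 * math.floor(mean(values) / 5 + 0.5)


# Alanine fragment ion and an averaged intensity over three hypothetical
# LCA/DCA/CA conjugate spectra.
print(f"{mz_neg('C3H7NO2'):.4f}")    # 88.0404
print(round_to_5pct([62, 71, 58]))  # 65
```

The nominal values in the fragment table (e.g., 88 for Ala and 145 for Lys) correspond to these accurate masses truncated to integers.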
BAFinder is a software tool for bile acid annotation from LC-MS/MS data.15 The previously released version 1.0 covered free bile acids and common conjugations, such as glycine, taurine, sulfate, and glucuronide. The software was updated to version 2.0 with an expanded scope for amino acid-conjugated bile acids other than glycine and taurine. The overall workflow is shown in Fig. 3. First, the unknown MS/MS spectra in negative mode were searched against the semi-empirical library, and a normalized dot product was calculated using eqn (1):
(1)
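The exact form of eqn (1) is not reproduced in this text. As a hedged illustration only, the sketch below implements a plain cosine similarity between two centroided spectra, scaled to 0–1000 on the assumption that this is the scale of the 500 threshold and the average score of 834 reported here. Pairing each query peak with the closest unused reference peak within a 0.01 Da tolerance is also an assumption; published dot-product variants often add m/z or intensity weighting, which this sketch omits.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def normalized_dot_product(query: List[Peak], reference: List[Peak],
                           tol: float = 0.01) -> float:
    """Cosine similarity of two spectra, scaled to 0-1000."""
    used = set()
    dot = 0.0
    for mz_q, int_q in query:
        # Greedily pair with the closest unused reference peak within tol.
        best, best_err = None, tol
        for j, (mz_r, _) in enumerate(reference):
            err = abs(mz_q - mz_r)
            if j not in used and err <= best_err:
                best, best_err = j, err
        if best is not None:
            used.add(best)
            dot += int_q * reference[best][1]
    # Unmatched peaks still contribute to the norms and lower the score.
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_r = math.sqrt(sum(i * i for _, i in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return 1000.0 * dot / (norm_q * norm_r)


# One shared peak out of two reference peaks with equal intensities.
print(round(normalized_dot_product([(88.04, 100.0)],
                                   [(88.04, 100.0), (145.10, 100.0)]), 1))
```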
For other amino acid conjugations not included in the semi-empirical library, a new feature was added to BAFinder 2.0 to allow the annotation of user-defined conjugations. The annotation of novel conjugations was based on the observation that most amino acid-conjugated bile acids generated the amino acid [M − H]− and [M + H]+ fragments in negative and positive mode, respectively. Therefore, three screening criteria were used: a match to the calculated precursor m/z, the presence of the amino acid [M − H]− product ion in negative mode, and the presence of the amino acid [M + H]+ ion and bile acid fragments in positive mode.
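The three screening criteria can be expressed as a single predicate over one candidate feature. This is a minimal sketch, assuming a 0.005 Da tolerance for both precursor and fragments (the m/z tolerance used for the fecal data), that "bile acid fragments" means at least one ion from a caller-supplied list, and that a missing positive-mode spectrum fails the third criterion. The function names and the example bile acid fragment m/z are hypothetical and not taken from BAFinder 2.0.

```python
from typing import Optional, Sequence


def has_peak(peaks: Sequence[float], target: float, tol: float) -> bool:
    """True if any fragment m/z lies within `tol` Da of `target`."""
    return any(abs(mz - target) <= tol for mz in peaks)


def screen_conjugate(obs_precursor: float, calc_precursor: float,
                     neg_fragments: Sequence[float],
                     pos_fragments: Optional[Sequence[float]],
                     aa_neg: float, aa_pos: float,
                     ba_pos_fragments: Sequence[float],
                     tol: float = 0.005) -> bool:
    """Apply the three screening criteria to one candidate feature."""
    # 1. Observed precursor matches the calculated conjugate m/z.
    if abs(obs_precursor - calc_precursor) > tol:
        return False
    # 2. Amino acid [M - H]- product ion in the negative-mode MS/MS.
    if not has_peak(neg_fragments, aa_neg, tol):
        return False
    # 3. Amino acid [M + H]+ and a bile acid fragment in positive mode.
    if not pos_fragments:
        return False
    return (has_peak(pos_fragments, aa_pos, tol)
            and any(has_peak(pos_fragments, f, tol) for f in ba_pos_fragments))


# Alanine conjugate of a 1OH bile acid; 359.29 is a made-up bile acid ion.
print(screen_conjugate(446.3280, 446.3276, [88.0405], [90.0549, 359.29],
                       aa_neg=88.0404, aa_pos=90.0550,
                       ba_pos_fragments=[359.29]))
```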
… 000g at 4 °C for 15 min. The supernatant was transferred to a new tube, dried in a SpeedVac, and stored at −20 °C. On the day of the experiment, the fecal extracts were resuspended in 100 μL of 50% methanol, centrifuged, and filtered through a 0.45 μm filter.
LC-MS/MS analysis of the fecal samples was performed with the same instrumental setup and mobile phases as above. An Acquity BEH C18 column (2.1 mm × 100 mm, 1.7 μm) (Waters, Milford, MA, USA) was used for the separation.19,20 The column temperature was 45 °C and the flow rate was 0.45 mL min−1. The gradient program was 0.0–0.5 min (5% B), 0.5–1.0 min (5%–20% B), 1.0–2.0 min (20%–25% B), 2.0–5.5 min (25% B), 5.5–6.0 min (25%–30% B), 6.0–8.0 min (30% B), 8.0–9.0 min (30%–35% B), 9.0–17.0 min (35%–65% B), 17.0–18.0 min (65%–100% B), 18.0–19.0 min (100% B), 19.0–19.5 min (100%–5% B), and 19.5–24.0 min (5% B). The injection volume was 5 μL.
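The gradient program above can be encoded as a list of breakpoints, which makes it easy to check the mobile phase composition at any retention time (e.g., when comparing elution order). This is a sketch under the assumption that every segment is a linear ramp between the listed breakpoints; `GRADIENT` and `percent_b` are hypothetical names.

```python
from bisect import bisect_right

# (time in min, %B) breakpoints of the fecal-sample gradient.
GRADIENT = [(0.0, 5), (0.5, 5), (1.0, 20), (2.0, 25), (5.5, 25), (6.0, 30),
            (8.0, 30), (9.0, 35), (17.0, 65), (18.0, 100), (19.0, 100),
            (19.5, 5), (24.0, 5)]


def percent_b(t: float) -> float:
    """%B at time t (min), assuming linear ramps between breakpoints."""
    times = [time for time, _ in GRADIENT]
    if not times[0] <= t <= times[-1]:
        raise ValueError(f"t={t} min is outside the 0-24 min program")
    i = bisect_right(times, t)
    if i == len(times):  # exactly at the final breakpoint
        return float(GRADIENT[-1][1])
    (t0, b0), (t1, b1) = GRADIENT[i - 1], GRADIENT[i]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


for t in (0.75, 3.0, 13.0, 19.25):
    print(f"{t:5.2f} min -> {percent_b(t):.1f}% B")
```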
Raw data were converted to mzML format using ProteoWizard MSConvert17 and then processed by XCMS.21 MS/MS spectra were converted to mgf files using RawConverter.22 The resulting peak table and alignment files, together with the mgf files, were imported into BAFinder 2.0. The m/z and RT tolerances were set to 0.005 Da and 0.05 min, respectively. The semi-empirical library of amino acid-conjugated bile acids in negative mode was added to the existing experimental library of bile acids. Spectral matches with dot-product scores higher than 500 were reported. If the corresponding positive-mode spectra matched the characteristic fragments of both the amino acid and the bile acid, a higher confidence level was assigned to the annotation.
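Linking each MS/MS spectrum to a feature in the peak table is a tolerance search in two dimensions. The sketch below uses the 0.005 Da and 0.05 min tolerances stated above; choosing the closest feature by a tolerance-normalized distance is an assumption, and `Feature` and `link_spectrum` are hypothetical names rather than BAFinder 2.0 internals. The example feature values are taken from the D-Ala-D-Ala rows of Table 2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    mz: float  # feature m/z from the peak table
    rt: float  # retention time (min)


def link_spectrum(precursor_mz: float, rt: float, features: List[Feature],
                  mz_tol: float = 0.005, rt_tol: float = 0.05) -> Optional[int]:
    """Index of the closest feature within both tolerances, or None."""
    best, best_dist = None, float("inf")
    for i, f in enumerate(features):
        d_mz, d_rt = abs(f.mz - precursor_mz), abs(f.rt - rt)
        if d_mz <= mz_tol and d_rt <= rt_tol:
            # Tolerance-normalized distance so both axes count equally.
            dist = d_mz / mz_tol + d_rt / rt_tol
            if dist < best_dist:
                best, best_dist = i, dist
    return best


feats = [Feature(517.3647, 13.39), Feature(533.3596, 11.60),
         Feature(533.3597, 9.16)]
print(link_spectrum(533.3600, 9.18, feats))
```

The RT tolerance is what separates the isobaric DCA and HDCA conjugates here, since their m/z values differ by far less than 0.005 Da.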
To explore novel bile acid conjugates, small molecules containing both amino and carboxyl groups were extracted from the HMDB database23 using Instant JChem (ChemAxon, https://www.chemaxon.com). The following filters were applied to the unique formulas: (1) containing only the elements C, H, O, N, P, and S; (2) molecular weight less than 500 Da. Glycine, taurine, and the other common amino acids in the semi-empirical library were excluded. The accurate masses of the remaining metabolites were imported into BAFinder 2.0, and bile acids conjugated with these molecules were screened in the fecal dataset.
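The formula filters above are straightforward to script once the HMDB structures have been exported as formulas. This is a minimal sketch, assuming plain Hill-style formula strings and using the monoisotopic mass as a stand-in for "molecular weight"; the exclusion list contains glycine, taurine, and the 18 library amino acids, and all function names are hypothetical. Because exclusion is by formula, any isomer sharing a formula with an excluded amino acid is dropped as well.

```python
import re

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}

# Glycine, taurine and the 18 common amino acids of the library
# (leucine and isoleucine share C6H13NO2).
EXCLUDED = ["C2H5NO2", "C2H7NO3S", "C3H7NO2", "C6H14N4O2", "C4H8N2O3",
            "C4H7NO4", "C5H10N2O3", "C5H9NO4", "C6H9N3O2", "C6H13NO2",
            "C6H14N2O2", "C5H11NO2S", "C9H11NO2", "C5H9NO2", "C3H7NO3",
            "C4H9NO3", "C11H12N2O2", "C9H11NO3", "C5H11NO2"]


def parse(formula: str) -> dict:
    """Element counts of a plain formula, e.g. 'C4H9NO2' -> {'C': 4, ...}."""
    counts = {}
    for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[el] = counts.get(el, 0) + (int(n) if n else 1)
    return counts


def key(formula: str) -> tuple:
    """Order-independent key for comparing formulas."""
    return tuple(sorted(parse(formula).items()))


EXCLUDED_KEYS = {key(f) for f in EXCLUDED}


def candidate_formulas(formulas, max_mass=500.0):
    """Unique C/H/O/N/P/S formulas below `max_mass`, minus the excluded ones."""
    seen, kept = set(), []
    for f in formulas:
        counts = parse(f)
        if not counts or not set(counts).issubset(MONO):
            continue  # other elements (e.g. Cl) are rejected
        k = key(f)
        if k in seen or k in EXCLUDED_KEYS:
            continue
        if sum(MONO[el] * n for el, n in counts.items()) >= max_mass:
            continue
        seen.add(k)
        kept.append(f)
    return kept


print(candidate_formulas(["C4H9NO2", "C3H7NO2", "C5H12N2O2", "C4H9NO2",
                          "C10H16ClN5O", "C30H50N10O10"]))
```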
To validate the annotation results and distinguish isomers, some candidate structures were synthesized, and their retention times were compared with those of the features in biological samples. Specifically, to determine the isomeric position of bile acids conjugated with C8H17N3O3, Boc-Lys-OH was reacted with glycocholic acid (GCA) and glycochenodeoxycholic acid (GCDCA) using the method described above. To remove the Boc protecting groups, 30 μL dichloromethane and 10 μL TFA, pre-cooled to 0 °C, were added to the dried conjugation products. After the reaction was complete (∼45 min), the mixture was evaporated and 200 μL saturated aqueous NaHCO3 solution was added. The synthesized Lys(iso)-GCA and Lys(iso)-GCDCA were loaded onto Sep-Pak C18 cartridges (1 cc, 100 mg) and eluted with methanol for clean-up.
| Conjugate | AA | AA-NH3 | AA-CO2 | AA-(NH3 + CO2) | AA-H2O | AA-C3H5NO2 | Others |
|---|---|---|---|---|---|---|---|
| Ala | 88 | | | | | | |
| Leu/Ile | 130 | | | | | | |
| Lys | 145 | | | | | | |
| Met | 148 | | | | | | |
| Pro | 114 | | | | | | |
| Val | 116 | | | | | | |
| Arg | | | | | | | 131 (AA-CH2N2) |
| Thr | | | | | | | 74 (AA-C2H4O) |
| Ser | 104 | | | | | | 74 (AA-CH2O) |
| Phe | 164 | 147 | | | | | 72 (AA-C7H8) |
| Asp | | 115 | 88 | 71 | | | |
| Glu | 146 | | 102 | | 128 | | 84 (AA-CH2O3) |
| His | 154 | 137 | 110 | 93 | | | |
| Asn | 131 | 114 | | 70 | 113 | | 96 (AA-(H2O + NH3)) |
| Gln | 145 | 128 | 101 | 84 | 127 | | 109 (AA-H4O2) |
| Trp | 203 | | 159 | 142 | | 116 | 74 (AA-C9H7N), 72 (AA-C9H9N) |
| Tyr | 180 | 163 | | 119 | | 93 | 107 (AA-C2H3NO2), 74 (AA-C7H6O), 72 (AA-C7H8O) |
To expand the structural space of the experimental MS/MS library, a semi-empirical library was developed. Bile acids with various numbers of hydroxyl groups (1OH-BA, 2OH-BA, 3OH-BA, 4OH-BA) and their oxidized forms (1O-BA, 1O1OH-BA, 2O-BA, 1O2OH-BA, 2O1OH-BA, 3O-BA) were combined with 18 amino acids to generate 180 (10 × 18) conjugated structures. Positional isomers and stereoisomers of the bile acids were not considered, since they cannot be distinguished with current MS/MS information. The theoretical m/z values of the [M − H]− precursors were combined with their corresponding characteristic product ions to generate a series of semi-empirical MS/MS spectra (Fig. S3†). The resulting library was exported in MSP format, which can be used directly (e.g., with BAFinder15 or MS-DIAL25) or converted to other library formats, such as the NIST MS library format, using the Lib2NIST software.
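The combination of 10 skeletons with per-amino-acid fragment lists can be sketched as a small generator of MSP-style records. The sketch assumes C24 skeletons (each hydroxyl adds one O and each oxo group adds one O and removes two H relative to cholanoic acid, C24H40O2), an amide bond that releases one H2O on conjugation, and a minimal record with `Name`, `PrecursorMZ`, and `Num Peaks` fields; the fragment intensities are hypothetical placeholders for the averaged values of Table S1, and all function names are illustrative.

```python
import re

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}
PROTON = 1.00727646688
WATER = 2 * MONO["H"] + MONO["O"]

# Assumed C24 formulas of the 10 general bile acid skeletons.
SKELETONS = {
    "1OH-BA": "C24H40O3", "2OH-BA": "C24H40O4", "3OH-BA": "C24H40O5",
    "4OH-BA": "C24H40O6", "1O-BA": "C24H38O3", "1O1OH-BA": "C24H38O4",
    "2O-BA": "C24H36O4", "1O2OH-BA": "C24H38O5", "2O1OH-BA": "C24H36O5",
    "3O-BA": "C24H34O5",
}


def mono_mass(formula: str) -> float:
    """Monoisotopic neutral mass of a plain formula such as 'C24H40O3'."""
    return sum(MONO[el] * (int(n) if n else 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


def msp_record(skeleton, aa_name, aa_formula, fragments):
    """One MSP-style entry; `fragments` is a list of (m/z, relative intensity)."""
    # Amide bond formation between the bile acid carboxyl and the amino
    # group releases one water molecule; [M - H]- then loses a proton.
    precursor = (mono_mass(SKELETONS[skeleton]) + mono_mass(aa_formula)
                 - WATER - PROTON)
    lines = [f"Name: {aa_name}-{skeleton}",
             f"PrecursorMZ: {precursor:.4f}",
             f"Num Peaks: {len(fragments)}"]
    lines += [f"{mz:.4f}\t{inten:g}" for mz, inten in fragments]
    return "\n".join(lines)


def build_library(amino_acids):
    """amino_acids: {name: (formula, [(m/z, intensity), ...])} -> MSP records."""
    return [msp_record(sk, name, formula, frags)
            for name, (formula, frags) in amino_acids.items()
            for sk in SKELETONS]


# Hypothetical fragment lists for two amino acids.
records = build_library({
    "Ala": ("C3H7NO2", [(88.0404, 100)]),
    "Lys": ("C6H14N2O2", [(145.0983, 100)]),
})
print(len(records))  # 20 (2 amino acids x 10 skeletons)
print(records[0])
```

Joining the records with blank lines (`"\n\n".join(records)`) gives a single text library; with all 18 amino acids the same loop yields the 180 entries described above.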
To evaluate the performance of the semi-empirical library generated in this way, cross-validation was performed using the synthesized compounds. Specifically, a semi-empirical library was created from the experimental spectra of amino acid-conjugated DCA and CA, and the experimental spectra of amino acid-conjugated LCA were searched against this library. The same process was repeated to annotate the amino acid-conjugated DCA and CA. As shown in Table S3,† all the synthesized compounds were correctly identified using the semi-empirical library built with the other two bile acid classes, with an average dot-product score of 834.
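The cross-validation scheme is a leave-one-bile-acid-out loop: rules learned from two bile acid classes are used to identify the conjugates of the third. The sketch below is a simplified illustration, assuming that spectra are dictionaries keyed by exact fragment m/z, that the "rules" are the fragments shared by all training bile acids with averaged intensities, and that scoring is a plain cosine scaled to 0–1000. The toy intensities are invented for the example and do not reproduce Table S3.

```python
import math
from statistics import mean
from typing import Dict

Spectrum = Dict[float, float]  # fragment m/z -> relative intensity


def common_fragments(spectra, train_bas, aa) -> Spectrum:
    """Fragments shared by the `aa` conjugates of every training bile acid,
    with intensities averaged across those bile acids."""
    shared = set.intersection(*(set(spectra[ba][aa]) for ba in train_bas))
    return {mz: mean(spectra[ba][aa][mz] for ba in train_bas) for mz in shared}


def cosine_1000(a: Spectrum, b: Spectrum) -> float:
    """Cosine similarity scaled to 0-1000 (exact m/z keys are matched)."""
    dot = sum(a[mz] * b[mz] for mz in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1000.0 * dot / (na * nb) if na and nb else 0.0


def leave_one_ba_out(spectra):
    """spectra: {bile_acid: {amino_acid: Spectrum}}.
    Returns {(held_out_ba, true_aa): (best_hit_aa, score)}."""
    results = {}
    for held in spectra:
        train = [ba for ba in spectra if ba != held]
        library = {aa: common_fragments(spectra, train, aa)
                   for aa in spectra[held]}
        for aa, query in spectra[held].items():
            score, hit = max((cosine_1000(query, ref), lib_aa)
                             for lib_aa, ref in library.items())
            results[(held, aa)] = (hit, score)
    return results


toy = {  # hypothetical intensities, for illustration only
    "LCA": {"Ala": {88.040: 100, 62.0: 10}, "Lys": {145.098: 100, 128.072: 40}},
    "DCA": {"Ala": {88.040: 100, 70.0: 5}, "Lys": {145.098: 100, 128.072: 30}},
    "CA": {"Ala": {88.040: 90}, "Lys": {145.098: 100, 128.072: 50}},
}
res = leave_one_ba_out(toy)
print(all(hit == aa for (_, aa), (hit, _) in res.items()))
```

Fragments that occur in only one training bile acid (such as the bile acid-specific ions in the toy data) are dropped by the intersection, which mirrors the idea of keeping only amino acid-derived ions in the semi-empirical spectra.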
To improve the confidence of compound identification, an automatic workflow was developed using data from both negative and positive mode (Fig. 3). Unknown features were first filtered and grouped according to the calculated precursor m/z of the amino acid-conjugated bile acids. Next, the negative-mode MS/MS spectra were searched against the semi-empirical library, and hits with dot-product scores higher than 500 were reported. In addition, characteristic product ions of amino acids were searched for in the positive-mode MS/MS spectra (if available). If these amino acid-related fragments were found, the remaining fragments in the spectrum, presumed to originate from the bile acid moiety, were searched against the experimental MS/MS library of free bile acids.15 A higher confidence level was assigned to the annotation result if hits were found in both positive and negative mode. This workflow was integrated into a software tool named BAFinder 2.0, an updated version of the BAFinder 1.0 software for common bile acid annotation.15
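The decision logic of this two-mode workflow can be summarized as a small function. This sketch assumes the strict "higher than 500" threshold, a 0.005 Da fragment tolerance, and two illustrative confidence labels (`"standard"` and `"higher"`); `search_free_ba` is a placeholder callback standing in for the search against the free bile acid library, and none of these names come from BAFinder 2.0 itself.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def assign_confidence(neg_score: float,
                      pos_peaks: Optional[Sequence[Peak]],
                      aa_pos_ions: Sequence[float],
                      search_free_ba: Callable[[List[Peak]], bool],
                      tol: float = 0.005,
                      threshold: float = 500.0) -> Optional[str]:
    """Combine negative- and positive-mode evidence for one feature.

    Returns None if the negative-mode hit is not reported, 'higher' when
    positive-mode evidence supports it, and 'standard' otherwise.
    """
    if neg_score <= threshold:  # only scores higher than 500 are reported
        return None
    if not pos_peaks:  # no positive-mode MS/MS available
        return "standard"
    is_aa = [any(abs(mz - ion) <= tol for ion in aa_pos_ions)
             for mz, _ in pos_peaks]
    if not any(is_aa):
        return "standard"
    # Remaining fragments are presumed to come from the bile acid moiety.
    remaining = [p for p, aa in zip(pos_peaks, is_aa) if not aa]
    return "higher" if search_free_ba(remaining) else "standard"


# Placeholder library search: accepts any spectrum containing m/z 359.29.
def demo_search(peaks: List[Peak]) -> bool:
    return any(abs(mz - 359.29) < 0.01 for mz, _ in peaks)


print(assign_confidence(800, [(90.055, 100.0), (359.29, 40.0)],
                        [90.055], demo_search))
```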
Fig. 4A shows the distribution of amino acid conjugates in the feces of the different species. In terms of the number of bile acids annotated, the five most frequently observed conjugates were β-alanine/alanine, leucine/isoleucine, lysine, phenylalanine, and glutamic acid. Notably, more β-alanine-conjugated than alanine-conjugated bile acids were identified. Proline conjugates were not detected in this study, probably because of their wider peak shape and the resulting lower peak height compared with other conjugates under the current LC method. The bile acids conjugated with amino acids also varied among species (Fig. 4B). Bile acids with two hydroxyl groups (2OH-BA, e.g., DCA) contributed the most in human fecal samples, while 3OH-BA (e.g., CA and β-MCA) were the major component in rat feces. These patterns were consistent with the bile acid profiles of the two species.29 Compared with the other two species, dog feces contained more amino acid-conjugated oxidized bile acids, including 1O1OH-BA and 1O2OH-BA.
Fig. 4 Distribution of (A) amino acids and (B) bile acids in amino acid-conjugated bile acids in human, dog, and rat feces. Statistics were based on the numbers of bile acids annotated.
So far, the approach was limited to bile acids conjugated with the 18 common amino acids. To discover potential new bile acid conjugates, we assumed that other broadly defined amino acids (i.e., small molecules with amino and carboxyl groups) might also be conjugated with bile acids, and that their MS/MS fragmentation patterns would be similar to those of the common amino acid conjugates. Based on these assumptions, a new feature for the annotation of unknown conjugated bile acids was added to BAFinder 2.0. When a list of custom amino acid masses was imported, BAFinder 2.0 conjugated them in silico with the 10 general bile acid skeletons and calculated their theoretical precursor m/z in negative and positive mode. Unknown MS/MS spectra passing the precursor filter were screened for the characteristic [M − H]− and [M + H]+ product ions of the amino acids in negative and positive mode, respectively. In addition, the positive-mode spectra were searched against the library of free bile acids, as for the common amino acid conjugates. To showcase this feature, 7326 structures with both amino and carboxyl groups were extracted from the HMDB database,23 yielding 855 unique formulas that contained only common organic elements (C, H, O, N, P, S) and had a molecular weight below 500 Da. Their exact masses were imported into BAFinder 2.0, and the corresponding conjugated bile acids were searched for in the fecal dataset. Two new conjugates, C6H12N2O3 and C8H17N3O3, were found conjugated with 1OH-, 2OH-, and 3OH-BA in the human/rat fecal samples. Their MS/MS spectra (Fig. 5A and B) revealed fragment ions of alanine (m/z 88) and lysine (m/z 145), respectively. They were therefore tentatively annotated as Ala-Ala and Gly-Lys (or Lys-Gly).
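The in silico conjugation step amounts to adding each user-supplied neutral mass to each skeleton mass, subtracting one water for the amide bond, and deriving the [M − H]− and [M + H]+ m/z. The sketch below assumes the same C24 skeleton formulas (each hydroxyl adds one O; each oxo group adds one O and removes two H relative to C24H40O2) and is not BAFinder 2.0's actual code; `conjugate_precursors` is a hypothetical name. With the exact masses of the two formulas reported above, the computed [M − H]− values for the 1OH and 2OH skeletons agree with the LCA and CDCA/DCA entries of Table 2 to within 0.001.

```python
import re

MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.00727646688
WATER = 2 * MONO["H"] + MONO["O"]

# Assumed C24 formulas of the 10 general bile acid skeletons.
SKELETONS = {
    "1OH-BA": "C24H40O3", "2OH-BA": "C24H40O4", "3OH-BA": "C24H40O5",
    "4OH-BA": "C24H40O6", "1O-BA": "C24H38O3", "1O1OH-BA": "C24H38O4",
    "2O-BA": "C24H36O4", "1O2OH-BA": "C24H38O5", "2O1OH-BA": "C24H36O5",
    "3O-BA": "C24H34O5",
}


def mono_mass(formula: str) -> float:
    """Monoisotopic neutral mass of a C/H/O formula such as 'C24H40O4'."""
    return sum(MONO[el] * (int(n) if n else 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


def conjugate_precursors(custom_masses):
    """In silico conjugation of user-supplied neutral masses (Da).

    Returns {(name, skeleton): ([M - H]- m/z, [M + H]+ m/z)}.
    """
    table = {}
    for name, mass in custom_masses.items():
        for skeleton, formula in SKELETONS.items():
            neutral = mono_mass(formula) + mass - WATER  # amide bond: -H2O
            table[(name, skeleton)] = (neutral - PROTON, neutral + PROTON)
    return table


# Exact masses of the two formulas reported above.
hits = conjugate_precursors({"C6H12N2O3": 160.08479, "C8H17N3O3": 203.12699})
neg, pos = hits[("C6H12N2O3", "1OH-BA")]
print(f"{neg:.4f} {pos:.4f}")
```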
After comparison with the retention times of the synthesized compounds, they were finally identified as D-Ala-D-Ala-LCA, D-Ala-D-Ala-DCA, D-Ala-D-Ala-HDCA, Lys(iso)-Gly-CDCA, and Lys(iso)-Gly-CA (Table 2). Here, "iso" indicates that lysine is connected to the glyco-BA via an isopeptide bond; this isomer eluted later than its counterpart with a conventional peptide bond. These dipeptide-conjugated bile acids are reported here for the first time in biological samples. Two other novel conjugates, L-2-aminobutyric acid and ornithine, were annotated in a similar way (Fig. 5C and D). Like lysine, ornithine contains two amino groups and can therefore form two conjugate isomers. The isomer with the earlier retention time was detected in dog and rat feces and was tentatively identified as the peptide-bond form, based on the elution order observed for the lysine conjugates.
| Conjugate | Bile acid | [M − H]− | RT (min) | Human | Dog | Rat | 
|---|---|---|---|---|---|---|
| D-Ala-D-Ala | LCA | 517.365 | 13.39 | ✓ | ||
| D-Ala-D-Ala | DCA | 533.360 | 11.60 | ✓ | ||
| D-Ala-D-Ala | HDCA | 533.361 | 9.16 | ✓ | ||
| Lys(iso)-Gly | CDCA | 576.402 | 10.56 | ✓ | ||
| Lys(iso)-Gly | CA | 592.397 | 8.13 | ✓ | ||
| L-2-Aminobutyric acid | 12o-DCA | 474.323 | 11.00 | ✓ | ||
| L-2-Aminobutyric acid | DCA | 476.338 | 12.30 | ✓ | ✓ | |
| L-2-Aminobutyric acid | ω-MCA | 492.333 | 7.02 | ✓ | ||
| L-2-Aminobutyric acid | CA | 492.333 | 10.10 | ✓ | ||
| Ornithine | DCA | 505.366 | 10.20 | ✓ | ||
| Ornithine | CA | 521.360 | 7.20 | ✓ | ✓ | |
| Ornithine | α-MCA | 521.360 | 3.74 | ✓ | ||
| Ornithine | β-MCA | 521.361 | 3.66 | ✓ | ||
| Ornithine | ω-MCA | 521.362 | 3.52 | ✓ | 
The sources and functions of these novel bile acid conjugates remain to be determined. A microbial origin is likely, by analogy with the common amino acid conjugates. A previous study found that 27 species of gut bacteria, mostly in the Bifidobacteriaceae, Lachnospiraceae, and Bacteroidaceae families, were capable of conjugating one or more of 16 common amino acids to CDCA, DCA, or CA.30 Similar screening experiments could identify the species responsible for producing the new metabolites discovered in this study. Judging from their retention times, conjugation with amino acids such as D-Ala, Lys, and Gly increased the hydrophilicity of bile acids, and the dipeptide conjugates (D-Ala-D-Ala and Lys-Gly) were even more hydrophilic. This may in turn affect their emulsifying, signaling, and antimicrobial properties.3 Some amino acid-conjugated bile acids, namely Phe-CA, Tyr-CA, and Leu-CA, were found at higher levels in patients with inflammatory bowel disease and cystic fibrosis.7 Further investigations are required to reveal the associations between these conjugated bile acids and health or disease states.
Footnote

† Electronic supplementary information (ESI) available: Characteristic product ions of amino acid-conjugated bile acids in negative and positive ESI mode (Tables S1 and S2), identification of synthesized compounds using the semi-empirical library (Table S3), amino acid-conjugated bile acids detected in human, dog and rat fecal samples (Table S4), comparison of MS intensities of amino acid-conjugated LCA, DCA, and CA in the negative and positive mode (Fig. S1), MS/MS spectra of cysteine-conjugated LCA, DCA, and CA in negative mode (Fig. S2), workflow of library development (Fig. S3), MS/MS spectra of amino acid-conjugated CA in the positive mode (Fig. S4). See DOI: https://doi.org/10.1039/d3an01237a
This journal is © The Royal Society of Chemistry 2023