AminoacidDB: a liquid chromatography-tandem mass spectrometry-based toolkit for the untargeted analysis of non-protein amino acids
Abstract
Non-protein amino acids (npAAs) are produced by microbes, plants and humans, with previous estimates suggesting that there are ≈1000 such metabolites. Most npAAs were discovered as human toxins, intermediates in metabolism or byproducts of organic and pharmaceutical synthesis. We used a text-mining approach to identify chemicals with the NHx-R-COOH moiety in PubChem and cross-checked these for classification against amino acid databases including Web of Science, LOTUS and HMDB to generate a dataset of compounds, which was cleaned and curated, resulting in a library of 332,154 amino acids. We established a standard set of 41 npAAs, selected to cover a wide range of structural and isomeric space, to train a machine learning model that predicts chromatographic elution using the Retip tool. Derivatization added a 6-aminoquinoline (6-AMQ) tag to the N[H] group, thus selecting amine-carrying compounds from the sample extract. These compounds can be identified by cleaving the 6-AMQ carbonyl, which produces the common product ion of 171.0555 m/z in positive ionization mode and allows amino acids to be selectively targeted in unknown datasets. AminoacidDB (https://www.aminoacidDB.ca) annotates amino acids by matching accurate mass and retention time features from untargeted mass spectrometry datasets against the AminoacidDB library. In a proof-of-concept experiment, we putatively annotated 103 amino acids and their derivatives in Arabidopsis thaliana and Cannabis sativa leaf tissues. Our data suggest a wider distribution of npAAs and peptides in plants than was previously known and indicate the need for more research to understand the prevalence and metabolism of npAAs.
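The annotation logic described above (keep only features whose fragmentation spectrum contains the 171.0555 m/z product ion, then match accurate mass and retention time against a library) can be illustrated with a minimal Python sketch. This is not the AminoacidDB implementation. The `Feature` and `LibraryEntry` structures, the function names and the tolerance values (10 ppm for the product ion, 5 ppm for the precursor, 0.5 min for retention time) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Common 6-AMQ product ion reported in the abstract (positive ionization mode).
DIAGNOSTIC_ION_MZ = 171.0555


@dataclass
class Feature:
    """Hypothetical untargeted LC-MS/MS feature."""
    precursor_mz: float
    rt_min: float
    fragment_mzs: List[float]


@dataclass
class LibraryEntry:
    """Hypothetical library record: derivatized ion m/z and predicted retention time."""
    name: str
    mz: float
    predicted_rt_min: float


def ppm_error(observed: float, reference: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - reference) / reference * 1e6


def has_diagnostic_ion(feature: Feature, tol_ppm: float = 10.0) -> bool:
    """True if any fragment lies within tol_ppm of the 171.0555 m/z product ion."""
    return any(
        abs(ppm_error(mz, DIAGNOSTIC_ION_MZ)) <= tol_ppm
        for mz in feature.fragment_mzs
    )


def annotate(
    features: List[Feature],
    library: List[LibraryEntry],
    mz_tol_ppm: float = 5.0,   # assumed tolerance, not taken from the paper
    rt_tol_min: float = 0.5,   # assumed tolerance, not taken from the paper
) -> List[Tuple[Feature, LibraryEntry]]:
    """Pair tagged features with library entries that agree in mass and retention time."""
    hits = []
    for feature in features:
        # Step 1: discard features lacking the common 6-AMQ product ion.
        if not has_diagnostic_ion(feature):
            continue
        # Step 2: require agreement in both accurate mass and retention time.
        for entry in library:
            mass_ok = abs(ppm_error(feature.precursor_mz, entry.mz)) <= mz_tol_ppm
            rt_ok = abs(feature.rt_min - entry.predicted_rt_min) <= rt_tol_min
            if mass_ok and rt_ok:
                hits.append((feature, entry))
    return hits
```

In this sketch a match is only a putative annotation, in line with the wording of the abstract: isomers that share an accurate mass are separated solely by the retention time window, so the choice of `rt_tol_min` directly controls how many candidates remain per feature.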
- This article is part of the themed collection: 150th Anniversary Collection: Metabolomics
