Creating taxonomically-informed metabolome libraries for any species using the pubchem.bio R package
Abstract
Annotation remains a significant challenge in metabolomics, in large part due to the enormous structural diversity of small molecules. PubChem is one of the largest curated chemical structure databases, with more than 122,000,000 structures, supplemented by extensive biological metadata from numerous external sources. While many of these structures are relevant to metabolomics, the majority are unlikely to be measured in a typical metabolomics experiment. This article describes the R package pubchem.bio, which enables users to: (1) download a metabolomics-centric subset of PubChem to their local computer, (2) build a metabolomics structure library of the biological compounds in PubChem, (3) develop custom metabolite structure libraries for any species or collection of species, using selected or all available taxonomic data in PubChem, and (4) define a core biological metabolome, comprising metabolites plausibly found in any species. Species-specific metabolomes are enabled by a lowest-common-ancestor chemotaxonomy approach, implemented by mapping PubChem compound identifiers (CIDs) onto the NCBI Taxonomy database hierarchy, so that a compound's taxonomic range can be extrapolated beyond the species in which it was reported. The package is available via CRAN and can be used to simplify annotation and to embed biological metadata in the annotation process.
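The lowest-common-ancestor idea can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the pubchem.bio API; the abstract does not describe how the package implements this step. The taxonomy below is a hand-written, heavily simplified toy tree that skips intermediate ranks, and the function names `lineage`, `lowest_common_ancestor` and `plausible_in` are hypothetical. The assumption is that a compound reported in several taxa is treated as plausible in any species descending from the deepest node those taxa share.

```python
# Toy child -> parent map. Assumption: simplified by hand for illustration,
# not extracted from the NCBI Taxonomy database.
PARENT = {
    "Arabidopsis thaliana": "Brassicaceae",
    "Brassica napus": "Brassicaceae",
    "Brassicaceae": "Viridiplantae",
    "Oryza sativa": "Poaceae",
    "Poaceae": "Viridiplantae",
    "Viridiplantae": "Eukaryota",
    "Homo sapiens": "Metazoa",
    "Metazoa": "Eukaryota",
    "Eukaryota": "root",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, starting with the taxon."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all given taxa."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("at least one taxon is required")
    shared = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon))
    # Walking up from the first taxon, the first shared node is the deepest.
    for node in lineage(taxa[0]):
        if node in shared:
            return node
    raise ValueError("taxa share no ancestor in this tree")


def plausible_in(species, reported_taxa):
    """True if the species descends from the LCA of the reported taxa."""
    return lowest_common_ancestor(reported_taxa) in lineage(species)


# Hypothetical compound reported in one brassica and one grass.
reported = ["Arabidopsis thaliana", "Oryza sativa"]
print(lowest_common_ancestor(reported))
print(plausible_in("Brassica napus", reported))
print(plausible_in("Homo sapiens", reported))
```

In this toy tree, a compound reported in only one species has that species as its own lowest common ancestor, so its range is not extrapolated at all; the range widens only as reports accumulate across more distant taxa.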
- This article is part of the themed collection: 150th Anniversary Collection: Metabolomics