Creating taxonomically-informed metabolome libraries for any species using the pubchem.bio R package

Abstract

Annotation remains a significant challenge in metabolomics, in large part due to the enormous structural diversity of small molecules. PubChem is one of the largest curated chemical structure databases, with more than 122,000,000 structures, supplemented by extensive biological metadata provided by numerous external sources. While many of these structures are relevant to metabolomics, a majority are unlikely to be measured in a typical metabolomics experiment. This article describes the R package pubchem.bio, which enables users to: (1) download a metabolomics-centric subset of PubChem onto their local computer, (2) build a metabolomic structure library of biological compounds in PubChem, (3) develop custom metabolite structure libraries for any species or collection of species using selected or all available taxonomic data in PubChem, and (4) define a core biological metabolome, comprising metabolites plausibly found in any species. Species-specific metabolomes are enabled through a lowest-common-ancestor chemotaxonomy approach, implemented by mapping PubChem CIDs onto the NCBI Taxonomy database hierarchy, which allows a compound's taxonomic range to be extrapolated beyond the species in which it was reported. The package is available via CRAN and can be used to simplify annotation and to embed biological metadata into the annotation process.
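The lowest-common-ancestor idea can be illustrated with a minimal sketch. This is not the pubchem.bio implementation or its R API; it is a generic Python illustration in which the parent map `PARENT` is a small hypothetical stand-in for the NCBI Taxonomy hierarchy, and the function names `lineage`, `lowest_common_ancestor` and `plausibly_in` are assumptions introduced here. A compound reported in several taxa is assigned to their lowest common ancestor, and it is then treated as plausible for any species descending from that ancestor.

```python
# Hypothetical toy taxonomy (child -> parent); a stand-in for the NCBI
# Taxonomy hierarchy, not real data pulled from PubChem or NCBI.
PARENT = {
    "Arabidopsis thaliana": "Brassicaceae",
    "Brassica napus": "Brassicaceae",
    "Brassicaceae": "Viridiplantae",
    "Oryza sativa": "Poaceae",
    "Poaceae": "Viridiplantae",
    "Viridiplantae": "Eukaryota",
    "Homo sapiens": "Metazoa",
    "Metazoa": "Eukaryota",
    "Eukaryota": "root",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, starting with the taxon."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(taxa):
    """Return the deepest taxon shared by the lineages of all given taxa."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("at least one taxon is required")
    shared = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon))
    # Walking the first lineage from leaf to root, the first shared node
    # is the deepest common ancestor.
    for node in lineage(taxa[0]):
        if node in shared:
            return node
    raise ValueError("taxa do not share a common ancestor")


def plausibly_in(target, reported_taxa):
    """True if the target taxon descends from the LCA of the reported taxa."""
    return lowest_common_ancestor(reported_taxa) in lineage(target)
```

In this toy tree, a compound reported in Arabidopsis thaliana and Oryza sativa is assigned to Viridiplantae, so `plausibly_in("Brassica napus", ["Arabidopsis thaliana", "Oryza sativa"])` is true even though the compound was never reported in Brassica napus, while the same call for Homo sapiens is false.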

Article information

Article type: Paper
Submitted: 26 Aug 2025
Accepted: 15 Dec 2025
First published: 15 Dec 2025
This article is Open Access (Creative Commons BY license)

Analyst, 2026, Accepted Manuscript

C. Broeckling, Analyst, 2026, Accepted Manuscript, DOI: 10.1039/D5AN00914F

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
