Open Access Article
Mirko Treccani *a, Lucia Ghiretti a, Nicole Tosi a, Cristiana Mignogna a, Anne-Marie Minihane bc, Valeria Barili d, Daniele Del Rio a, Davide Martorana ef and Pedro Mena a
aHuman Nutrition Unit, Department of Food and Drug, University of Parma, 43125, Parma, Italy. E-mail: mirko.treccani@unipr.it
bNutrition and Preventive Medicine, Norwich Medical School, University of East Anglia, Norwich, UK
cNorwich Institute of Healthy Ageing, University of East Anglia, Norwich, UK
dMedical Genetics, Department of Medicine and Surgery, University of Parma, 43126, Parma, Italy
eMedical Genetics Unit, Department of Onco-Hematology, University Hospital of Parma, 43126, Parma, Italy
fCoreLab Unit, Research Center, University Hospital of Parma, 43126 Parma, Italy
First published on 19th November 2025
(Poly)phenols are bioactive compounds found in plant-based food. They have been associated with numerous health-promoting features. Several research studies are currently investigating their contribution to human health and disease by applying different omics techniques. However, a standard reference for human genomics investigation is missing, which limits the current understanding of the impact of (poly)phenols on humans when using omics approaches. In this study, we present a computational functional investigation of 121 candidate human genes involved in (poly)phenol absorption, distribution, metabolism, and excretion to propose a standard reference for genomics analyses in personalised nutrition and health research. Starting from genomics information, this reference framework includes gene networks, exploring their functional consequences and favouring the understanding of protein–protein interactions, thus paving the way for future multi-omics approaches.
Several human genes and their enzymatic and non-enzymatic protein products participate in the absorption, distribution, metabolism, and excretion (ADME) of (poly)phenols, influencing their bioavailability and bioactivity. After ingestion, dietary (poly)phenols reach the small intestine, where only a small fraction is absorbed. The bulk of (poly)phenols is catabolised by the gut microbiota and then widely absorbed at the colonic level. Once absorbed, phenolic compounds mostly undergo phase II enzymatic metabolism, which increases their hydrophilicity and their excretion through the urine; in these processes, they can be conjugated with one or more moieties of glucuronic acid, sulphate and methyl groups.5,6 Absorption involves several actors, such as the lactase phlorizin hydrolase (LPH) encoded by the lactase (LCT) gene, the cytosolic β-glucosidase (CBG) encoded by the glucosylceramidase beta 3 (GBA3) gene, and sodium-dependent glucose transporter 1 (SGLT1) encoded by the solute carrier family 5 member 1 (SLC5A1) gene.7,8 Conjugation with other molecules is catalysed by uridine-5′-diphosphate glucuronosyltransferases (UGTs), sulfotransferases (SULTs), and catechol-O-methyltransferases (COMT).4 The action of other catalysers, such as the cytochrome P450 (CYP) gene family,8 and transporters, like the adenosine triphosphate (ATP)-binding cassette (ABC)9 and the solute carriers (SLC) transporter families,7,8 may also determine (poly)phenol bioavailability and bioactivity.
An increasing number of studies highlight that the effect of (poly)phenols on the risk of chronic disease, and the size of that effect, depend on inter-individual variability in (poly)phenol bioavailability.10,11 In recent years, various studies have employed omics approaches to elucidate the roles of genetics, epigenetics, microbiome, and environmental factors in contributing to inter-individual variability of different (poly)phenols,12–16 with the aim of proposing novel strategies for personalised health approaches.17–19 However, despite these efforts, a reference for human genomics research in (poly)phenols is missing. To address this gap, we propose a computational investigation of human genes related to (poly)phenol ADME, using gene networks and computational functional analyses. In this study, we aim to provide an overview of the current resources available from public repositories, answering the question of what to expect when performing human genomics analysis in (poly)phenol research studies, and hence proposing a reference for future omics investigations. These findings will help support multi-genomics and multi-omics investigations, advancing (poly)phenol-related personalised health approaches and potentially dietary guidance.
Network analysis was performed using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING)21 database version 12.0 (accessed on 15th January 2025), considering the following parameters: network type set to “full STRING network”; all the interaction sources were enabled (“Textmining”, “Experiments”, “Databases”, “Co-expression”, “Neighborhood”, “Gene Fusion”, and “Co-occurrence”); minimum interaction score set to “medium confidence (0.400)”. Clustering was performed using the STRING built-in k-means clustering algorithm. Functional enrichment analysis was performed using the following references: Gene Ontology (GO),22,23 Kyoto Encyclopedia of Genes and Genomes (KEGG),24–26 REACTOME,27 WikiPathways,28,29 MONARCH,30 DISEASES,31 TISSUES,32 and COMPARTMENTS33 databases. Significant enrichment was assessed by the false discovery rate (FDR) method.
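The network settings listed above can be written down as a request to the STRING REST interface. The following is a minimal sketch that only builds the request URL and sends nothing; the versioned host name, the parameter names as understood from the STRING API help pages, the `caller_identity` label and the four example gene symbols are assumptions made for illustration, not details reported by the study.

```python
from urllib.parse import urlencode

# Assumption: versioned STRING host for release 12.0.
STRING_BASE = "https://version-12-0.string-db.org/api"


def build_network_url(genes, required_score=400, species=9606):
    """Return a STRING 'network' URL for a list of gene symbols (no request is sent)."""
    params = {
        "identifiers": "\r".join(genes),   # carriage-return separated identifiers
        "species": species,                # NCBI taxon ID for Homo sapiens
        "required_score": required_score,  # 400 = "medium confidence (0.400)"
        "network_type": "functional",      # the "full STRING network" option
        "caller_identity": "polyphenol_adme_example",  # hypothetical client label
    }
    return f"{STRING_BASE}/tsv/network?{urlencode(params)}"


# Four genes named in the text, used here only as an example input.
example_url = build_network_url(["UGT1A1", "SULT1A1", "COMT", "LCT"])
print(example_url)
```

Keeping the URL construction separate from the network call makes the chosen parameters easy to record alongside the access date.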
Computational functional analysis was performed using the Functional Mapping and Annotation of Genome-Wide Association Studies (FUMAGWAS)34 platform version 1.5.2 and its module GENE2FUNC, considering the following sources: Ensembl35 version 110, the Molecular Signature Database (MSigDB)36 version 7.0, WikiPathways28,29 version 20191010, the GWAS Catalog37 version e0_r2022-11-29, and the Genotype-Tissue Expression (GTEx)38 version 8. Differentially expressed gene set and gene set enrichment analyses were performed using a hypergeometric test adjusted by Bonferroni correction.
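For readers less familiar with this kind of test, the sketch below shows an upper-tail hypergeometric test followed by a Bonferroni adjustment using only the Python standard library. It is an illustration under stated assumptions rather than the FUMA implementation: the function names are hypothetical and the numbers are toy values, not data from this study.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the gene set."""
    top = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return favourable / comb(N, n)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy numbers (assumption): 20 background genes, 5 of them in the gene set,
# 4 input genes, 3 of which fall in the gene set.
p = hypergeom_upper_tail(k=3, K=5, n=4, N=20)

# Adjust the toy p-value together with two other made-up p-values.
adjusted = bonferroni([p, 0.5, 0.04])
print(p, adjusted)
```

In a real analysis, `N` would be the size of the background gene list and `n` the number of input genes that map to it.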
868 human protein-coding and the 42 160 human non-coding genes, 121 genes were selected because of their reported involvement in (poly)phenol metabolism. These genes were located on 21 chromosomes, comprising all the autosomes except chromosomes 18 and 21, plus the sex chromosome X. The three chromosomes carrying the most genes were 4 (18 genes), 2 (15 genes), and 11 (14 genes). Genes were distributed across both the plus (71 genes) and minus (50 genes) strands of the human genome. All the genes were reported in Ensembl, except for mucin 3B (MUC3B). Finally, genes were categorised depending on their role in (poly)phenol ADME, with “Phase 2 metabolism” (31 genes), “Colonic metabolism and absorption” (25 genes), “Absorption, Distribution, Metabolism, and Elimination” (17 genes), and “Phase 1 metabolism” (16 genes) being the most represented. Of note, 16 genes fell into the “Additional” category. An overview is provided in Fig. 1.
Across all the tested databases, functional enrichment was observed at several levels; the complete results are reported in Table S1. Gene Ontology terms were enriched in biological processes related to different types of metabolism, such as xenobiotics (GO:0006805, FDR = 2.12 × 10−48) and organic acids (GO:0006082, FDR = 3.62 × 10−31), and metabolic reactions, such as cellular glucuronidation (GO:0052695, FDR = 8.29 × 10−28); in molecular functions such as glucuronosyltransferase (GO:0015020, FDR = 2.93 × 10−24), UDP-glycosyltransferase (GO:0008194, FDR = 1.08 × 10−17), and monooxygenase (GO:0004497, FDR = 6.27 × 10−16) activities; and in cellular components such as the endomembrane system (GO:0012505, FDR = 8.37 × 10−14), the Golgi lumen (GO:0005796, FDR = 1.21 × 10−15), and different membrane compartments, such as the membrane (GO:0016020, FDR = 8.37 × 10−14) and its intrinsic (GO:0031224, FDR = 8.37 × 10−14) and integral (GO:0016021, FDR = 1.43 × 10−13) components. In summary, significant terms comprised the metabolism of different phenolic compounds, such as xenobiotics, and associated molecular functions, mostly related but not limited to glucuronidation. An overview of the most significant results from GO is provided in Table 1.
| GO-term | Description | Genes in network | Strength | Signal | FDR |
|---|---|---|---|---|---|
| GO:0006805 | Xenobiotic metabolic process | 38/115 | 1.75 | 9.17 | 2.12 × 10−48 |
| GO:0071466 | Cellular response to xenobiotic stimulus | 39/181 | 1.56 | 7.33 | 6.74 × 10−44 |
| GO:0009410 | Response to xenobiotic stimulus | 44/422 | 1.25 | 4.77 | 6.74 × 10−38 |
| GO:0006082 | Organic acid metabolic process | 49/868 | 0.98 | 3.04 | 3.62 × 10−31 |
| GO:0052695 | Cellular glucuronidation | 18/19 | 2.21 | 7.14 | 8.29 × 10−28 |
| GO:0015020 | Glucuronosyltransferase activity | 18/34 | 1.95 | 5.9 | 2.93 × 10−24 |
| GO:0008194 | UDP-glycosyltransferase activity | 20/143 | 1.38 | 3.45 | 1.08 × 10−17 |
| GO:0004497 | Monooxygenase activity | 17/103 | 1.45 | 3.33 | 6.27 × 10−16 |
| GO:0070330 | Aromatase activity | 12/24 | 1.93 | 3.91 | 1.24 × 10−15 |
| GO:0016758 | Hexosyltransferase activity | 19/194 | 1.22 | 2.66 | 2.01 × 10−14 |
| GO:0005796 | Golgi lumen | 17/106 | 1.44 | 3.26 | 1.21 × 10−15 |
| GO:0012505 | Endomembrane system | 70/4721 | 0.4 | 0.92 | 8.37 × 10−14 |
| GO:0031224 | Intrinsic component of membrane | 78/5828 | 0.36 | 0.85 | 8.37 × 10−14 |
| GO:0016020 | Membrane | 99/9523 | 0.25 | 0.70 | 8.37 × 10−14 |
| GO:0016021 | Integral component of membrane | 76/5670 | 0.36 | 0.85 | 1.43 × 10−13 |
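The "Strength" column can be read as the base-10 logarithm of the ratio between the observed and the expected fraction of annotated proteins, which is how STRING describes this measure. The sketch below recomputes it for two rows of Table 1. Two inputs are assumptions that the text does not state: a network of 116 mapped proteins (the 111 clustered genes plus the 5 unconnected ones reported in the clustering results) and a hypothetical genome-wide background of 19 699 human proteins.

```python
from math import log10


def string_strength(hits, term_size, network_size, background_size):
    """log10 of the observed over the expected fraction of annotated proteins."""
    observed = hits / network_size
    expected = term_size / background_size
    return log10(observed / expected)


# Assumptions, not values reported in the article.
NETWORK_SIZE = 116
BACKGROUND_SIZE = 19_699

# "Genes in network" counts taken from Table 1.
xenobiotic = string_strength(38, 115, NETWORK_SIZE, BACKGROUND_SIZE)
glucuronidation = string_strength(18, 19, NETWORK_SIZE, BACKGROUND_SIZE)
print(round(xenobiotic, 2), round(glucuronidation, 2))
```

Under these assumptions the values fall close to the tabulated strengths of 1.75 and 2.21, which also explains why broad terms such as "Membrane" show low strength despite very small FDR values.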
Looking at pathways, enrichment was observed at several levels. The metabolism of different compounds was significant, such as for xenobiotics (hsa00980, FDR = 6.63 × 10−37; HSA-211981, FDR = 2.54 × 10−18), retinol (hsa00830, FDR = 7.54 × 10−36), and others. Moreover, several metabolic processes were enriched, such as pentose and glucuronate interconversion (hsa00040, FDR = 2.54 × 10−28), phase I and phase II metabolism (HSA-211945, FDR = 6.85 × 10−17; HSA-156580, FDR = 1.17 × 10−33; WP702, FDR = 2.14 × 10−53), glucuronidation (HSA-156588, FDR = 9.90 × 10−26; WP698, FDR = 2.30 × 10−27), and sulfation (WP692, FDR = 7.90 × 10−13). Enrichment was also observed in disease-related pathways, spanning from general “chemical carcinogenesis” (hsa05204, FDR = 4.13 × 10−48) to familial hyperphosphatemic tumoral calcinosis (HFTC; HSA-5083625, FDR = 4.27 × 10−25) and colorectal cancer (FDR = 4.27 × 10−25). Moreover, drug-related pathways showed significance at several levels, such as in drug metabolism (hsa00982, FDR = 7.30 × 10−43; hsa00983, FDR = 2.81 × 10−26) and ADME (HSA-9748784, FDR = 3.54 × 10−47), and for specific drugs like aspirin (HSA-9749641, FDR = 1.01 × 10−36), paracetamol (HSA-9753281, FDR = 3.31 × 10−23), and codeine and morphine (WP1604, FDR = 9.08 × 10−19). Overall, enriched terms were observed in different metabolic processes, such as phase I and phase II metabolism, for various phenolic metabolites, such as xenobiotics and retinol, and for drugs, mostly of phenolic origin like aspirin and paracetamol; significant terms were also observed in cancer-related diseases. An overview of the most enriched terms is reported in Table 2.
| Pathway | Description | Genes in network | Strength | Signal | FDR |
|---|---|---|---|---|---|
| hsa05204 | Chemical carcinogenesis | 34/76 | 1.88 | 10.01 | 4.13 × 10−48 |
| hsa00982 | Drug metabolism – cytochrome P450 | 30/64 | 1.90 | 9.28 | 7.30 × 10−43 |
| hsa00140 | Steroid hormone biosynthesis | 28/60 | 1.90 | 8.78 | 5.70 × 10−40 |
| hsa00980 | Metabolism of xenobiotics by cytochrome P450 | 27/69 | 1.82 | 7.92 | 6.63 × 10−37 |
| hsa04976 | Bile secretion | 28/88 | 1.73 | 7.41 | 2.49 × 10−36 |
| HSA-211859 | Biological oxidations | 45/215 | 1.55 | 7.94 | 1.33 × 10−51 |
| HSA-9748784 | Drug ADME | 36/105 | 1.77 | 9.12 | 3.54 × 10−47 |
| HSA-9749641 | Aspirin ADME | 25/42 | 2.00 | 8.61 | 1.01 × 10−36 |
| HSA-156580 | Phase II – conjugation of compounds | 28/104 | 1.66 | 6.67 | 1.17 × 10−33 |
| HSA-1430728 | Metabolism | 63/2092 | 0.71 | 1.89 | 4.66 × 10−28 |
| WP702 | Metapathway biotransformation Phase I and II | 44/180 | 1.62 | 8.68 | 2.14 × 10−53 |
| WP691 | Tamoxifen metabolism | 19/21 | 2.19 | 7.66 | 3.76 × 10−30 |
| WP698 | Glucuronidation | 18/25 | 2.09 | 6.84 | 2.30 × 10−27 |
| WP2882 | Nuclear receptors meta-pathway | 29/312 | 1.2 | 3.54 | 1.92 × 10−23 |
| WP5276 | Estrogen metabolism | 14/24 | 2.0 | 5.06 | 3.32 × 10−20 |
Enrichment in genes associated with diseases and human phenotypes was also investigated. Significant signals were observed in metabolic disorders (DOID:0014667, FDR = 4.47 × 10−8; DOID:655, FDR = 1.64 × 10−7), mostly related to bilirubin, bile and gallbladder (DOID:2741, FDR = 1.57 × 10−13; DOID:2739, FDR = 2.79 × 10−9; DOID:10211, FDR = 6.70 × 10−8). Moreover, metabolite (EFO:0005664, FDR = 7.53 × 10−14; EFO:0005653, FDR = 7.53 × 10−14) and vitamin measurements (EFO:0004729, FDR = 4.33 × 10−7; EFO:0004631, FDR = 4.21 × 10−6) and related abnormalities (HP:0001939, FDR = 1.64 × 10−5; HP:0100508, FDR = 6.56 × 10−5) were enriched. In summary, enriched terms for diseases pointed to metabolic conditions and abnormalities involving bilirubin, metabolite and vitamin levels. An overview of the most significant findings for diseases and human phenotypes is provided in Tables 3 and 4, respectively.
| Disease | Description | Genes in network | Strength | Signal | FDR |
|---|---|---|---|---|---|
| DOID:2741 | Bilirubin metabolic disorder | 10/13 | 2.12 | 3.51 | 1.57 × 10−13 |
| DOID:2739 | Gilbert syndrome | 7/8 | 2.17 | 2.4 | 2.79 × 10−9 |
| DOID:2382 | Kernicterus | 6/6 | 2.23 | 2.08 | 4.47 × 10−8 |
| DOID:3803 | Crigler-Najjar syndrome | 6/6 | 2.23 | 2.08 | 4.47 × 10−8 |
| DOID:0014667 | Disease of metabolism | 28/1076 | 0.65 | 1.07 | 4.47 × 10−8 |
| Phenotype | Description | Genes in network | Strength | Signal | FDR |
|---|---|---|---|---|---|
| EFO:0005664 | Blood metabolite measurement | 19/181 | 1.12 | 2.64 | 7.53 × 10−14 |
| EFO:0005653 | Serum metabolite measurement | 22/287 | 1.11 | 2.37 | 7.53 × 10−14 |
| EFO:0004739 | Circulating cell free DNA measurement | 9/9 | 2.23 | 3.33 | 1.11 × 10−12 |
| EFO:0010551 | Xanthurenate measurement | 9/11 | 2.14 | 3.19 | 2.86 × 10−12 |
| EFO:0004725 | Metabolite measurement | 33/1033 | 0.73 | 1.47 | 2.86 × 10−12 |
Finally, enrichment in human tissues was also examined. Significant signals were observed in the visceral body cavity (BTO:0001491, FDR = 2.52 × 10−20), in organs related to ADME such as the liver (BTO:0000759, FDR = 2.47 × 10−17), the gastrointestinal tract (BTO:0000511, FDR = 4.25 × 10−5), and the kidneys (BTO:0000671, FDR = 1.35 × 10−5), and in several glands, such as digestive, endocrine, and excretory (BTO:0000522, FDR = 2.22 × 10−14; BTO:0000345, FDR = 1.88 × 10−15; BTO:0001488, FDR = 3.92 × 10−7; BTO:0000431, FDR = 1.12 × 10−5). Overall, the organs and tissues most involved in (poly)phenol ADME were enriched. The most significant results are reported in Table 5.
| Tissue | Description | Genes in network | Strength | Signal | FDR |
|---|---|---|---|---|---|
| BTO:0001491 | Viscus | 84/5378 | 0.42 | 1.05 | 2.52 × 10−20 |
| BTO:0000759 | Liver | 52/2125 | 0.62 | 1.42 | 2.47 × 10−17 |
| BTO:0000345 | Digestive gland | 57/2881 | 0.53 | 1.18 | 1.88 × 10−15 |
| BTO:0000522 | Gland | 86/7004 | 0.32 | 0.8 | 2.22 × 10−14 |
| BTO:0001488 | Endocrine gland | 70/6403 | 0.27 | 0.61 | 3.92 × 10−7 |
Network nodes were clustered using the k-means clustering algorithm. To define the most appropriate number of clusters, an iterative approach was applied, starting from a minimum of 2 clusters and avoiding clusters comprising only one gene. The 2-means clustering identified two clusters, one of 109 genes and one of 2 genes, with 5 genes not belonging to any cluster. As the number of clusters increased, the 9-means clustering was the first to report a cluster comprising only one gene. Therefore, the 8-means clustering was considered the most informative. The 8-cluster model comprised 57, 19, 14, 7, 5, 4, 3, and 2 genes. The smallest cluster was consistent across all the iterations. Moreover, it was possible to summarise each cluster except cluster 8 using representative keywords: Cluster 1 was related to “Cellular glucuronidation”; Cluster 2 to “Defective GALNT3 causes HFTC”; Cluster 3 to “Xenobiotic transport” and “Recycling of bile acids and salt”; Cluster 4 to “Positive regulation of nitric-oxide synthase biosynthesis”; Cluster 5 to “Sulfation” and “Aryl sulfotransferase activity”; Cluster 6 to “Carbohydrate digestion and absorption”; and Cluster 7 to “Proton-coupled monocarboxylate transport” and “Lactate transmembrane transporter activity” (Fig. 3). The clustered interaction network is reported in Fig. 4 and Table S2.
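The stopping rule described above (increase k from 2 and keep the last k before a single-gene cluster appears) can be sketched as a small helper. This is an illustration of the selection rule only, not of the STRING k-means algorithm itself: the function name is hypothetical, and the cluster sizes for k = 3 to 7 and k = 9 are invented placeholders, since the text reports sizes only for k = 2 and k = 8.

```python
def choose_cluster_count(cluster_sizes_for, k_min=2, k_max=50):
    """Return the largest k reached before any cluster holds a single gene."""
    best = None
    for k in range(k_min, k_max + 1):
        sizes = cluster_sizes_for(k)
        if any(size == 1 for size in sizes):
            break  # first singleton cluster: stop and keep the previous k
        best = k
    return best


# Cluster sizes per k. Only the k = 2 and k = 8 rows come from the text;
# every other row is a hypothetical placeholder that sums to 111 genes.
toy_sizes = {
    2: [109, 2],
    3: [90, 19, 2],
    4: [76, 19, 14, 2],
    5: [69, 19, 14, 7, 2],
    6: [64, 19, 14, 7, 5, 2],
    7: [60, 19, 14, 7, 5, 4, 2],
    8: [57, 19, 14, 7, 5, 4, 3, 2],
    9: [56, 19, 14, 7, 5, 4, 3, 2, 1],
}

chosen_k = choose_cluster_count(toy_sizes.__getitem__, k_max=9)
print(chosen_k)
```

In practice `cluster_sizes_for` would wrap whatever clustering call is used, so the same rule can be applied to a different algorithm or confidence threshold.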
241 human coding and non-coding background genes.
Gene expression analysis was performed on both general and specific tissue types. Significant results (Bonferroni-corrected p-value < 0.05) were observed in up-regulated differentially expressed gene sets, whereas no evidence of association was found in down-regulated differentially expressed gene sets. As a consequence, two-sided differentially expressed gene sets were also significant. Among the general tissue types (n = 30), liver (p = 1.34 × 10−25), small intestine (p = 3.46 × 10−13), kidneys (p = 8.97 × 10−13), and colon (p = 4.06 × 10−6) showed the most significant results. This was reflected in the specific tissue types (n = 54), where the liver (p = 2.41 × 10−25), the small intestine terminal ileum (p = 6.23 × 10−13), the kidney cortex (p = 1.11 × 10−12), and the colon transverse (p = 1.70 × 10−11) were significant. In summary, these genes were significantly differentially expressed in tissues closely related to (poly)phenol metabolism. A complete overview is reported in Fig. 4.
An additional round of gene set enrichment analysis was performed. Significantly enriched terms from the Gene Ontology, KEGG, Reactome, and WikiPathways were concordant with previous findings from STRING. Further, functional analyses were performed on the gene sets from the Molecular Signature Database. Among the hallmark gene sets, significant results were observed in 3 gene sets related to metabolism, namely xenobiotics (p = 2.70 × 10−5), peroxisome (p = 3.20 × 10−3), and fatty acids (p = 1.68 × 10−2). Additionally, the inflammatory response was found to be significant (p = 3.84 × 10−2). Enrichment was also observed in 4 positional gene sets on cytogenetic bands located on chromosomes 4 (chr4q13, p = 7.82 × 10−14), 2 (chr2q37, p = 1.04 × 10−4), 7 (chr7q22, p = 8.11 × 10−3), and 1 (chr1q24, p = 4.21 × 10−2). Overall, gene sets already computed for different dietary components were found to be significant together with specific genomic locations in human chromosomes. A complete overview is reported in Table 6.
| Gene set | Number of genes | P-Value | Adjusted P-value | Genes |
|---|---|---|---|---|
| Xenobiotic metabolism | 9/196 | 5.41 × 10−7 | 2.70 × 10−5 | FMO3, FMO1, CYP2C18, ABCC2, CYP2E1, CYP1A1, CYP1A2, ABCC3, COMT |
| Peroxisome | 5/99 | 1.28 × 10−4 | 3.20 × 10−3 | ABCD3, ABCC5, UGT2B17, ABCB1, SULT2B1 |
| Fatty acid metabolism | 5/155 | 1.01 × 10−3 | 1.68 × 10−2 | FMO1, SLC22A5, LDHA, CYP1A1, PDHA1 |
| Inflammatory response | 5/200 | 3.07 × 10−3 | 3.84 × 10−2 | TLR2, AHR, HIF1A, NOD2, CCL2 |
| chr4q13 | 13/85 | 2.61 × 10−16 | 7.82 × 10−14 | UGT2B17, UGT2B15, UGT2B10, UGT2A3, UGT2B7, UGT2B11, UGT2B28, UGT2B4, UGT2A1, UGT2A2, SULT1B1, SULT1E1, MUC7 |
| chr2q37 | 8/148 | 6.93 × 10−7 | 1.04 × 10−4 | UGT1A8, UGT1A10, UGT1A9, UGT1A7, UGT1A6, UGT1A5, UGT1A4, UGT1A3, UGT1A1 |
| chr7q22 | 6/145 | 8.11 × 10−5 | 8.11 × 10−3 | CYP3A5, CYP3A7, CYP3A4, CYP3A43, MUC12, MUC17, RN7SKP54 |
| chr1q24 | 4/77 | 5.61 × 10−4 | 4.21 × 10−2 | FMO3, FMO2, FMO1, FMO4 |
Pathways from previously unexplored sources such as Biocarta were investigated. A total of 5 significantly enriched terms were observed in lipid metabolism (M16393, p = 9.53 × 10−10), in hypoxia in the cardiovascular system (M13324, p = 4.86 × 10−3; M5202, p = 8.96 × 10−3), and in the activity and resistance to drugs (M22078, p = 3.31 × 10−4; M22024, p = 1.38 × 10−2).
Cancer-related gene sets were also explored. Enrichment was observed for 17 gene sets. Among the tested sets, cytochrome P450 (M5645, p = 5.46 × 10−14) and heme biosynthesis (M2738, p = 2.11 × 10−5), oxidoreductase activity (M10134, p = 9.29 × 10−9), and transport of metabolites, such as carboxylic and organic acids (M13595, p = 6.68 × 10−8), and of peptide and amino acids (M12507, p = 6.21 × 10−5), were enriched. Moreover, xenobiotic metabolism, either general or in specific organs such as the liver, was significant (M57, p = 1.69 × 10−2; M1872, p = 2.11 × 10−5; M14358, p = 2.76 × 10−4). Notably, several biological functions and enzymatic activities associated with phenolic metabolites, and the tissues where they occur, were significant. A complete overview is provided in Table 7.
| Systematic name | Gene set (module) | Number of genes | P-Value | Adjusted P-value | Genes |
|---|---|---|---|---|---|
| M5646 | M_106 | 8/12 | 1.27 × 10−16 | 5.46 × 10−14 | CYP1B1, CYP3A4, CYP2C18, CYP2C19, CYP2C9, CYP2C8, CYP2E1, CYP1A1 |
| M8592 | M_135 | 9/21 | 3.33 × 10−16 | 7.17 × 10−14 | FMO3, FMO4, CYP1B1, CYP3A4, CYP2C18, CYP2C19, CYP2C8, CYP2E1, CYP1A1 |
| M10134 | M_93 | 12/175 | 6.47 × 10−11 | 9.29 × 10−9 | AKR7A2, FMO5, FMO3, FMO4, CYP1B1, CYP3A5, CYP3A4, CYP2C19, CYP2C8, LDHA, CYP1A1, COMT |
| M13595 | M_71 | 6/21 | 6.20 × 10−10 | 6.68 × 10−8 | SLC16A4, SLC16A1, ABCC3, SLC16A5, SLC16A8, SLC16A2 |
| M2027 | M_212 | 13/311 | 4.46 × 10−9 | 3.84 × 10−7 | ABCD3, FMO3, FMO4, UGT2B15, UGT2B7, SULT1E1, CYP3A4, CYP2C8, LDHA, CYP1A1, ABCC1, ABCC3, COMT |
| M9415 | M_186 | 5/18 | 2.11 × 10−8 | 1.51 × 10−6 | SLC16A4, SLC16A1, ABCC3, SLC16A8, SLC16A2 |
| M2738 | M_505 | 4/13 | 3.78 × 10−7 | 2.11 × 10−5 | UGT2B15, UGT2B7, UGT2B4, SULT1E1 |
| M5974 | M_43 | 7/94 | 4.07 × 10−7 | 2.11 × 10−5 | FMO3, FMO4, CYP3A4, CYP2C19, CYP2C8, LDHA, CYP1A1 |
| M1872 | M_23 | 14/542 | 4.42 × 10−7 | 2.11 × 10−5 | FMO5, MUC1, FMO3, FMO4, UGT2B15, UGT2B7, UGT2B4, CYP3A4, CYP2C19, CYP2C9, CYP2C8, CYP2E1, CCL2, ABCC3 |
| M586 | M_227 | 4/15 | 7.15 × 10−7 | 3.08 × 10−5 | UGT1A6, UGT2B15, UGT2B7, UGT2B4 |
| M12507 | M_368 | 4/18 | 1.59 × 10−6 | 6.21 × 10−5 | SLC16A4, SLC16A6, SLC16A8, SLC16A2 |
| M17558 | M_55 | 16/800 | 1.87 × 10−6 | 6.63 × 10−5 | SLC16A4, FMO5, MUC1, FMO3, FMO4, ITGA6, UGT2B15, UGT2B7, UGT2B4, CYP2C19, CYP2C8, CYP2E1, MUC5B, SLC22A6, CCL2, ABCC3 |
| M16754 | M_218 | 4/19 | 2.00 × 10−6 | 6.63 × 10−5 | SLC16A4, SLC16A7, SLC16A8, SLC16A2 |
| M14358 | M_88 | 15/802 | 8.98 × 10−6 | 2.76 × 10−4 | SLC16A4, FMO5, MUC1, FMO3, FMO4, ITGA6, UGT2B15, UGT2B7, CYP2C19, CYP2C8, CYP2E1, MUC5B, SLC22A6, CCL2, ABCC3 |
| M8950 | M_117 | 13/697 | 3.81 × 10−5 | 1.10 × 10−3 | SULT1C2, UGT1A4, SLC2A2, UGT2B17, UGT2B7, UGT2B4, UGT2A1, SULT1B1, UGT8, CYP3A4, CYP2C8, CYP1A1, SLC16A2 |
| M10538 | M_162 | 3/19 | 1.05 × 10−4 | 2.82 × 10−3 | SLC16A4, SLC16A7, SLC16A2 |
| M57 | M_247 | 3/35 | 6.68 × 10−4 | 1.69 × 10−2 | FMO4, SULT1C2, UGT2B15 |
Different biological signatures were also explored. Oncogenic signatures showed enrichment in cellular pathways dysregulated in cancer and associated with NFE2L2 (M2870, p = 3.20 × 10−5) and KRAS genes (M2880, p = 3.32 × 10−2).
Moreover, cell type signatures resulting from single-cell studies in different human tissues proved to be significant. In total, 34 terms were enriched, particularly in intestinal (M40028, p = 1.11 × 10−14), hepatic (M39115, p = 3.37 × 10−9), and pancreatic (p = 6.44 × 10−5) cell types.
Potential targets of regulation were also investigated, considering both microRNAs (miRNAs) and transcription factor (TF) targets. Six microRNAs (hsa-miR-6736-3p, p = 2.55 × 10−5; hsa-miR-6507-5p, p = 7.93 × 10−5; hsa-miR-4786-3p, p = 7.93 × 10−5; hsa-miR-185-3p, p = 4.28 × 10−4; hsa-miR-6839-3p, p = 4.41 × 10−4; hsa-miR-4496, p = 5.90 × 10−4) and one TF target (M133 RYTAAWNNNTGAY, p = 1.27 × 10−2) were significantly enriched, mostly driven by the UGT1A complex together with additional genes, as reported in Table 8.
| Gene set | Number of genes | P-Value | Adjusted P-value | Genes |
|---|---|---|---|---|
| MIR6736_3P | 9/123 | 9.82 × 10−9 | 2.55 × 10−5 | UGT1A8, UGT1A10, UGT1A9, UGT1A7, UGT1A6, UGT1A5, UGT1A4, UGT1A3, UGT1A1, LDHA |
| MIR6507_3P | 12/329 | 8.07 × 10−8 | 7.93 × 10−5 | UGT1A8, UGT1A10, UGT1A9, UGT1A7, UGT1A6, UGT1A5, UGT1A4, UGT1A3, UGT1A1, ABCG2, AHR, HIF1A, CYP1A2 |
| MIR4786_3P | 9/159 | 9.16 × 10−8 | 7.93 × 10−5 | UGT1A8, UGT1A10, UGT1A9, UGT1A7, UGT1A6, UGT1A5, UGT1A4, UGT1A3, UGT1A1, UGT2B17 |
| MIR185_3P | 8/147 | 6.59 × 10−7 | 4.28 × 10−4 | UGT1A8, UGT1A10, UGT1A9, UGT1A7, UGT1A6, UGT1A5, UGT1A4, UGT1A3, UGT1A1 |
| MIR6839_3P | 8/152 | 8.49 × 10−7 | 4.41 × 10−4 | UGT1A8, UGT1A10, UGT1A9, UGT1A7, UGT1A6, UGT1A5, UGT1A4, UGT1A3, UGT1A1 |
| MIR4496 | 10/283 | 1.36 × 10−6 | 5.90 × 10−4 | UGT1A8, UGT1A10, UGT1A9, UGT1A7, UGT1A6, UGT1A5, UGT1A4, UGT1A3, UGT1A1, AHR, CDKN1B |
| RYTAAWNNNTGAY_UNKNOWN | 5/60 | 1.14 × 10−5 | 1.27 × 10−2 | UGT1A1, SLC22A6, SLC22A8, SLC22A12, SULF2 |
The GWAS Catalog was examined to retrieve associated phenotypes and diseases, resulting in a total of 42 enriched terms. Among them, different metabolites were significant, either general serum (p = 1.00 × 10−21) and urinary (p = 5.67 × 10−10) metabolites or specific compounds, such as bilirubin (p = 5.40 × 10−12) and low-density lipoprotein cholesterol (p = 8.18 × 10−5) levels. Of note, terms related to bilirubin stood out in both significance and number. Additionally, some diseases, such as liver disease (p = 6.24 × 10−14) and gout (p = 7.44 × 10−4), as well as some drugs, such as acenocoumarol (p = 2.09 × 10−2), were enriched. Significant dietary habits were also identified in terms of coffee (p = 2.61 × 10−3), tea (p = 2.16 × 10−4), and bitter beverage (p = 2.69 × 10−2) consumption, with caffeine metabolism also being significant (p = 3.80 × 10−2). Significant genetic associations previously reported comprised several urinary and plasma metabolites, different diseases impacting metabolism and its tissues, and the consumption of (poly)phenol-rich foods. A representation of these results is reported in Fig. 5.
All the results from the enrichment analysis are reported in Table S3.
Our analyses confirmed the central role of these genes in biological processes strictly related to (poly)phenol metabolism and bioavailability. Numerous sources showed strong significant associations with different metabolites, like xenobiotics, xanthurenate, and organic acids, and related processes, spanning from transport to general metabolism, highlighting the crucial role of the selected genes. This was further supported by specific molecular functions characterising (poly)phenol metabolism, such as transport, glucuronidation and sulfation,4,7 key processes in phase II metabolic reactions, which were also observed for specific phenolic-like drugs, like aspirin and paracetamol. Additionally, significant terms were observed for the consumption of specific (poly)phenol-rich foods like tea and coffee. Supporting evidence was also identified at the tissue level, where enrichment and differential expression patterns were reported in the small intestine, the liver and the colon, core biological sites of (poly)phenol bioavailability and metabolism. The relevance of these genes was reinforced by the unsupervised k-means clustering, which confirmed their intricate biological relationships and highlighted the main biological processes and molecular functions4,7 in (poly)phenol metabolism and bioavailability: these terms were the key representative functions (i.e., the labels) of the eight clusters. Furthermore, this approach provided refined insights into the contribution of specific gene families. Some clusters were mostly identified by specific gene families, as in the case of mucins (MUCs); other clusters comprised complementary gene families, like the transport role carried out by ABC transporters (ABCs) and solute carriers (SLCs), and depicted the multifaceted interplay between biologically distant genes, such as UDP-glucuronosyltransferases (UGTs), cytochromes P450 (CYPs), sulfatases (SULFs) and sulfotransferases (SULTs).
This high level of interconnection makes it impossible to define clear-cut clusters, underscoring the strong interplay and biological complexity among these genes and their functions.
Beyond consolidating known associations, our findings revealed novel mechanistic insights into (poly)phenol metabolism and previously unreported associations. Among the 121 curated genes that were investigated, the gene network and k-means clustering analyses revealed a lack of connections for 5 genes: UGT8, SULF1, SLC16A6, ABCD3, and Aldo-Keto Reductase Family 7 Member A2 (AKR7A2). These genes are involved in key metabolic processes, such as glucuronidation (UGT8), sulfation (SULF1), and transport (ABCD3 and SLC16A6), with AKR7A2 mostly involved in detoxification. Nevertheless, these genes belong to large and biologically relevant gene families, suggesting that their apparent isolation may reflect under-representation and slow curation of databases rather than a lack of functional relevance, thus opening the way to new research studies aimed at further confirming their involvement in and contribution to (poly)phenol bioavailability.
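Genes without connections are simply the nodes that never appear in the network's edge list. The sketch below illustrates that check with a hypothetical helper; the three edges are invented for the example and do not reproduce the STRING network of this study.

```python
def isolated_nodes(nodes, edges):
    """Return, in sorted order, the nodes that take part in no edge."""
    connected = set()
    for source, target in edges:
        connected.add(source)
        connected.add(target)
    return sorted(node for node in nodes if node not in connected)


# Gene symbols from the text; the edges are hypothetical placeholders.
genes = ["UGT1A1", "UGT1A9", "SULT1A1", "COMT", "UGT8", "AKR7A2"]
toy_edges = [
    ("UGT1A1", "UGT1A9"),
    ("UGT1A1", "SULT1A1"),
    ("SULT1A1", "COMT"),
]

unconnected = isolated_nodes(genes, toy_edges)
print(unconnected)
```

Because the result depends on the minimum interaction score, rerunning the same check at a lower threshold is a quick way to see whether an isolated gene has any weaker reported links.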
Significant and enriched gene sets not only recovered already known associations, but also suggested novel mechanisms and biological players. Among the metabolites, the most represented significant associations were observed for bilirubin and bile acids. The preponderant enrichment of bilirubin may point to as yet unreported evidence of bilirubin-mediated disorders. An example is Gilbert's syndrome, a benign liver condition caused by a variant in the promoter of the UGT1A1 gene and associated with high bilirubin levels and low risk of cardiovascular disease.39–41 Other bilirubin-dependent conditions like kernicterus and Crigler-Najjar syndrome were found to be associated, more likely due to their relationship with reported traits rather than to (poly)phenol metabolism and effects. In general, these findings suggest that variability in circulating phenolic concentrations may modulate the penetrance of metabolic syndrome, thereby prompting consideration of how differential phenolic exposure may contribute to the development of phenotypically related syndromes. Moreover, bile acids and salts were among the most represented metabolites, with regard to their recycling process. Their significance is notable since many (poly)phenols may be subjected to extended enterohepatic recirculation, hence highlighting novel mechanisms worth investigating.4,7 Furthermore, associations of metabolites also comprised different drugs having phenolic-like motifs (benzene), like aspirin, paracetamol, morphine, tamoxifen, and codeine.
These drugs were highlighted for their interactions with genes from the UGT, SULT, MUC and CYP families, key contributors to core functions of (poly)phenol ADME, suggesting these genes as possible candidate pharmacogenes.42,43 Hence, these results suggest novel and previously unreported mechanisms, paving the way for individualised pharmacological approaches mediated by food and nutrients, and contributing to a better understanding of the bioavailability of dietary secondary compounds and other xenobiotics.
The resulting associations with human diseases reinforced the role of (poly)phenols and their interaction with the human genome in modulating health. In relation to cancer, we found that NFE2L2, normally involved in absorption and metabolism, was associated with enriched oncogenic signatures. Additionally, some genes belonging to the UGT2 and CYP2C families seemed to drive associations with the KRAS oncogene; these findings may suggest pleiotropic effects between metabolism and oncogenic signalling, opening the way to investigations of the therapeutic effects of (poly)phenols in cancer, as recently observed in breast and colorectal cancers.44,45 In parallel, several enriched terms supported the relationship between (poly)phenols and cardiometabolic health, identifying associations with genes involved in inflammation and inflammatory response, like TLR2 and CCL2,46 and acting on risk factors like low-density lipoprotein, as reported for genes belonging to the UGT1A family.47,48 These findings implicate (poly)phenols as mediators of cardiometabolic risk, as investigated by ongoing research studies,17 suggesting their modulatory effects in systemic metabolic and inflammatory conditions.
Finally, these novel findings point to genetic regulation as a key mechanism shaping the biological effects of (poly)phenols. Several positional gene sets were significant, particularly at four locations on chromosomes 1, 2, 4, and 7. Of note, these genomic regions were characterised by the presence of several members of key gene families together with apparently unrelated genes: chromosome 4 contained the UGT2A and UGT2B families and the SULT1B1, SULT1E1 and MUC7 genes; chromosome 2 contained the UGT1A family; chromosome 7 contained members of the CYP3A family together with the MUC12 and MUC17 genes; and chromosome 1 contained members of the FMO family. Although genomic position alone may be uninformative and only reflect the location of the input genes, the enrichment of these gene sets hints at unexplored biological features and additive effects, such as gene regulation, dosage compensation, epigenetic silencing and other regional effects.49 These possible regulatory mechanisms were further supported by the enrichment for miRNA and TF targets, highlighting the importance of regional effects and proposing novel candidates worth investigating.50
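As an illustration of why positional enrichment can arise simply from the clustering of input genes, the following minimal sketch computes a one-sided hypergeometric over-representation p-value, a test commonly used for gene set enrichment. It is not the pipeline used in this study: the background size, the number of genes in the positional set, and the overlap are hypothetical values chosen for illustration, and only the input size of 121 genes comes from the text.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for a hypergeometric variable X.

    N: background (population) size
    K: number of genes in the positional gene set
    n: number of input genes drawn
    k: observed overlap between input genes and the gene set
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the exact probabilities of every overlap from k up to the maximum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical numbers, for illustration only.
background = 20000  # assumed number of annotated background genes
band_size = 50      # hypothetical number of genes in one positional set
input_size = 121    # candidate genes, as stated in the text
overlap = 10        # hypothetical number of input genes in that set

p_value = hypergeom_sf(overlap, background, band_size, input_size)
print(f"one-sided enrichment p-value: {p_value:.3e}")
```

Under these assumed numbers the expected overlap is below one gene, so a tandem cluster of family members in a single region is enough to yield a very small p-value. This is consistent with the caution above that positional significance may partly reflect where the input genes sit rather than shared regulation.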
Nevertheless, this study presents several limitations that need to be considered to contextualise its findings. Despite the curation of the 121 genes and the proposed reference, functional studies are needed to validate the role of these genes and their contribution to (poly)phenol metabolism and bioavailability. The limited resources on (poly)phenols in publicly available databases for human genomics reduced the power and resolution of this study, which likely did not fully capture the biological connections and relevance of these genes or their associations with human traits and diseases. In addition, the generalisability of our findings remains limited, emphasising the need for exploratory investigations and functional validation studies to support or refute the current findings.
Looking forward, this study and the proposed reference lend themselves to targeted applications in the context of human diseases and traits. The significant enrichment for bilirubin makes Gilbert's syndrome a promising case study. The role of UGT1A1 in the onset of this benign syndrome and its association with low glucuronidation ability is well recognised. However, the possible role and implications of other metabolic processes have been poorly investigated: evidence on sulfation or other functions is almost absent, making them worth investigating. At the same time, our investigation suggested the importance of genes related to (poly)phenol metabolism at the regulatory level. Functional validation will be essential to move beyond static genetic associations, expanding our current knowledge of the biological determinants shaping the inter-individual response to (poly)phenols and gaining a complete mechanistic understanding. Finally, the proposed reference has clear applications to health. The observed associations and enrichment with numerous traits and diseases, spanning from cancer to complex disorders, underline the need to explore the intersection between nutrition, metabolism, genetics and disease risk and progression, opening the way for precision nutrition approaches targeting a wide range of disorders. Overall, we might argue that these genes play a role not only in (poly)phenol ADME but also in the health status of the individual, highlighting the need for dedicated genomics and multi-omics investigations able to confirm previous associations and validate novel hypotheses. Genetic polymorphisms in genes interacting with or related to (poly)phenol ADME, and their impact on health status, may be key for future precision nutrition approaches involving these dietary bioactive compounds.13,19
This journal is © The Royal Society of Chemistry 2025