Xuexin Yu,‡a Baofeng Lian,‡abc Lihong Wang,d Yan Zhang,a Enyu Dai,a Fanlin Meng,a Dianming Liu,a Shuyuan Wang,a Xinyi Liu,a Jing Wang,a Xia Li*a and Wei Jiang*a
aCollege of Bioinformatics Science and Technology, Harbin Medical University, China. E-mail: jiangwei@hrbmu.edu.cn; lixia@hrbmu.edu.cn
bSchool of Life Science and Biotechnology, Shanghai Jiao Tong University, China
cShanghai Center for Bioinformation Technology (SCBIT), China
dInstitute of Cancer Prevention and Treatment, Harbin Medical University, China
First published on 24th June 2014
Although several studies have investigated the essential roles of inflammation in tumor progression, not many have systematically analyzed gene expression patterns across diverse cancers in the context of inflammation. In this study, in order to better understand the inflammatory scenario, we initially constructed the inflammatory timeline (IT) based on two gene expression profiles during inflammatory progression (inflammatory bowel disease and Helicobacter pylori infection). Then, we separately identified the differentially expressed genes (DEGs) from 25 cancer-related microarray data. By comparing the distributions of DEGs in the IT, we identified three novel pan-cancer gene expression patterns. In the first pattern, the up-regulated genes in cancers were over-expressed in the early phase of inflammation, while the down-regulated genes were over-expressed in the late phase of inflammation. The second pattern was the opposite of the first one. The third pattern appeared to be transitional between the first and second patterns. We found that some cancers with different tissue origins have similar gene expression patterns. Finally, we identified two sets of tissue-independent inflammatory signatures that were over-expressed in early and late phases of inflammation, respectively. The dominant biological processes of early inflammatory signatures were cell proliferation, DNA replication, and DNA repair, whereas the late inflammatory signatures were reflective of innate immune response, neutrophil migration, and antigen processing. These inflammatory signatures may be useful to predict gene expression patterns in human cancers. Therefore, the pan-cancer analysis of gene expression patterns in the context of inflammation provides a novel insight into cancers and an unprecedented opportunity to develop new therapies.
Importantly, recent clinical studies have demonstrated an association between inflammation and cancer. For example, individuals with inflammatory bowel disease were found to have a 10-fold higher risk of developing colorectal cancer than those without, and anti-inflammatory therapy4,5 greatly reduced the incidence of colon cancer.6,7 Furthermore, inflammation caused by bacterial and viral infection also increases cancer risk. In the gastrointestinal tract, Helicobacter pylori infection is a leading cause of adenocarcinoma and mucosa-associated lymphoid tissue (MALT) lymphoma.8,9 In the hepatic system, carriers of hepatitis B virus (HBV) and hepatitis C virus (HCV) are predisposed to hepatocellular carcinoma (HCC); HCV-positive men have a 20-fold higher risk of developing HCC than HCV-negative subjects.10–12
Several transcription factors and inflammatory cytokines play important roles in cancer-related inflammation, such as nuclear factor-kappa B (NF-κB), tumor necrosis factor-α (TNF-α), and interleukin-6 (IL-6). However, current research has mainly focused on explaining the mechanism of a single type of inflammation-mediated carcinogenesis at a time. Thus far, there has been no systematic analysis of the relationship between various inflammatory conditions and cancers. Recently, The Cancer Genome Atlas (TCGA) Pan-Cancer analysis project analyzed the shared molecular features and relevant functional roles across cancers of disparate organs, which has helped clinicians to extrapolate therapy from one tumor type to others with a similar genomic profile.13,14 The Pan-Cancer project revealed that different tumors share several features, such as somatic copy number alterations, mutations, and epigenomic alterations.14–16 Interestingly, Isaac et al. analyzed the links between ten distinct developmental processes and a series of human cancers. They classified all cancers de novo based on gene expression signatures in the context of various developmental processes and found similar gene expression patterns in different tumor types, which depicted the tumor landscape in a novel and comprehensive manner.17 Against this background, we aimed to describe the overall relationship between inflammation and tumors based on gene expression.
In the present study, in order to summarize the inflammatory landscape, we used principal component analysis (PCA) to construct inflammatory timelines (ITs) that describe the progression of inflammatory bowel disease (IBD) and H. pylori infection (Hp). As the IBD time course data were obtained from Mus musculus, we mapped the mouse genes to their human orthologs through NCBI and obtained the equivalent gene symbol for each human/mouse ortholog pair. By generalizing the distribution of differentially expressed genes (DEGs) in the IT across all cancers, we identified three gene expression patterns with tissue-independent features and consequently obtained three corresponding cancer groups. By comparing the functions of the DEGs of each cancer group with those of the two inflammatory signatures, we found that the functions of up-regulated DEGs in the first cancer group were similar to those of the early-phase inflammatory signature, while the functions of up-regulated DEGs in the second group were similar to those of the late-phase inflammatory signature. Thus, our study provides a novel insight into cancers in the context of inflammatory progression.
Fig. 1 The number distribution of DEGs for all cancer data sets. The red bar shows the number of up-regulated genes and the blue bar describes the number of down-regulated genes.
Fig. 2 (A) Clustering result of probability distribution slope values. The heatmap represents the three cancer groups and the color of each cell describes the slope value of the regression line in a specific inflammatory context for each cancer. (B) Probability distributions and frequency plots for 3 representative cases of most tumors: (a) malignant pleural mesothelioma; (b) renal cell carcinoma; (c) clear cell ovarian cancer. (C) Clustering result after inflammation-related gene subtraction. Abbreviations and colors are the same as in Fig. 2A. (D) Comparison of frequency plots after inflammation-related gene subtraction: (a) malignant pleural mesothelioma; (b) renal cell carcinoma; (c) clear cell ovarian cancer.
Group 1 contained tumors with an “early” gene expression pattern, meaning that up-regulated genes were preferentially expressed in the early part of the IT, while down-regulated genes were activated in the late part. For example, the frequency plot in Fig. 2B shows an early peak for the up-regulated genes of malignant pleural mesothelioma, followed by a decline towards the late phase of the IT, which meant that these up-regulated genes were preferentially expressed in the early intestinal inflammation process. Down-regulated genes presented the inverse pattern: they were preferentially activated in the late inflammatory stage. This observation was confirmed by the probability distribution: the slope of upE was larger than the slope of upL, while the slope of downL was larger than that of downE. A similar expression pattern was evident in Hp infection. This group encompassed 60% of all cancer data sets and contained tumors of varied tissues, including smoldering myeloma, oligodendroglioma, breast cancer, and hepatocellular carcinoma. Group 2 contained three independent data sets of renal carcinoma, two independent data sets of papillary thyroid cancer, and a T-cell lymphoma. The gene expression pattern of Group 2 was the reverse of the Group 1 pattern, namely, a “late” gene expression pattern. For instance, the up- and down-regulated genes of renal cell carcinoma were preferentially active in the late and early IT, respectively (Fig. 2B). In addition to the above two expression patterns, the tumors in Group 3, which included mucinous ovarian cancer, clear cell ovarian cancer, endometrioid ovarian cancer, and serous ovarian cancer, displayed an ambiguous relationship between cancer and inflammation. In the case of clear cell ovarian cancer (Fig. 2B), the gene expression pattern was not satisfactorily explained by the inflammatory gradient, and hence, we surmised that it may be a transitional pattern between those of Groups 1 and 2.
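The slope comparison described above can be written as a simple rule of thumb. The following Python sketch is only an illustration: the function name `expression_pattern` and the strict inequalities are assumptions, and the groups in this study were obtained by clustering slope vectors rather than by a fixed rule like this one.

```python
def expression_pattern(up_e, up_l, down_e, down_l):
    """Label a cancer data set from its four regression-line slopes.

    Hypothetical helper: the thresholds are an assumption made for
    illustration, not the clustering procedure used in the study.
    """
    # "Early" pattern: up-regulated genes accumulate faster in the early
    # part of the IT, down-regulated genes faster in the late part.
    if up_e > up_l and down_l > down_e:
        return "early"
    # "Late" pattern: the reverse of the early pattern.
    if up_l > up_e and down_e > down_l:
        return "late"
    # Anything else does not follow the inflammatory gradient cleanly.
    return "transitional"


# Toy slope values, not taken from any data set in the study.
label_a = expression_pattern(up_e=0.25, up_l=0.0, down_e=0.0, down_l=0.25)
label_b = expression_pattern(up_e=0.0, up_l=0.25, down_e=0.25, down_l=0.0)
label_c = expression_pattern(up_e=0.2, up_l=0.1, down_e=0.2, down_l=0.1)
print(label_a, label_b, label_c)
```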
Apparently, the identified gene expression patterns were tissue-independent, because each of them included many cancers of different tissue origins. The “early” gene expression pattern was a widespread feature across most of the cancers. In addition, by projecting the DEGs onto different ITs, we obtained similar frequency plots and probability distributions, which indicated that the pan-cancer gene expression patterns were stable in different inflammatory backgrounds.
| | BP | | BP |
|---|---|---|---|
| eIN450 | Cell cycle | lIN450 | Response to wounding |
| | Cell division | | Inflammatory response |
| | Cell proliferation | | Wound healing |
| | Nuclear division | | Response to endogenous stimulus |
| | DNA repair | | Response to hormone stimulus |
| | Chromatin modification | | Cell adhesion |
| | | | Cell migration |
| Group 1 (15) | | | |
| Up | Cell cycle (15) | Down | Cell migration (15) |
| | Nuclear division (15) | | Response to hormone stimulus (15) |
| | Cell division (14) | | Response to endogenous stimulus (15) |
| | DNA replication (14) | | Cell adhesion (14) |
| Group 2 (6) | | | |
| Up | Response to wounding (6) | Down | Regulation of cell cycle (2) |
| | Cell adhesion (6) | | Positive regulation of transcription, DNA-dependent (2) |
| | Inflammatory response (5) | | Cell cycle (1) |
| | Defense response (5) | | |
| Group 3 (4) | | | |
| Up | Response to endogenous stimulus (4) | Down | Positive regulation of transcription, DNA-dependent (2) |
| | Cell–cell adhesion (4) | | Positive regulation of gene expression (2) |
| | Response to wounding (3) | | Positive regulation of transcription, DNA-dependent (2) |
| | Positive regulation of cell communication (2) | | |

Note: the significant biological process (BP) terms for eIN450, lIN450 and the up- as well as down-regulated genes of the group 1, 2 and 3 cancer data sets. The number in parentheses is the number of data sets in which the term is enriched. For example, cell division is enriched in the up-regulated genes of 14 out of 15 data sets belonging to group 1.
In order to explore how inflammation promotes cancer, we identified key genes from the inflammatory signatures, defined as genes that both occur with high frequency in one cancer group and lie at the left or right end of the IT. Detailed information on frequency of occurrence and rank for inflammatory signature genes is provided in Table S2 (ESI†). For early inflammatory signatures, the key genes have a high frequency of occurrence and a low rank in the ITs, and their functions are related to DNA repair (NUDT1), DNA replication (TYMS, HMGB3), apoptotic death (BCL2), stem cell proliferation (H2AFX), and the nuclear matrix (CENPF). Previous studies have shown that inflammation can enhance tumor initiation and progression by producing growth factors and cytokines, which could confer a stem cell-like ability upon tumor progenitors or stimulate stem cell expansion, thereby enlarging the cell pool.20,21 The up-regulated genes of Group 1 were preferentially expressed in the early phase of inflammation; most tumors within this group, such as glioblastoma, squamous cell lung carcinoma, and malignant pleural mesothelioma, were much more aggressive than those in the other groups. As for the late inflammatory signatures, the key genes have a high frequency of occurrence and a high rank in the ITs; these genes are involved in neutrophil migration (HCK), secretory processes (RAC2), TNF-receptor superfamily signaling (FAS), T cell development (LCP2), and antigen processing (TAP1). It has been reported that, after inflammatory responses, the tumor microenvironment contains both innate and adaptive immune cells, including neutrophils and T lymphocytes, respectively.22 These disparate cells could either communicate with each other directly or promote cytokine and chemokine production and act in an autocrine and/or paracrine manner to control and shape tumor growth.23 The up-regulated genes of Group 2 displayed a late gene expression pattern. The cancers in this group were much more indolent than those in Group 1; for example, renal cancer and thyroid cancer grow slowly. Group 3 contained four subtypes of ovarian cancer, and their gene expression pattern was transitional between the other two patterns. Tumors like renal cancer and ovarian cancer have a poor outcome because they often metastasize, despite their slow growth rate.
Finally, these results suggested that the unique relationship between inflammation and tumors is a common feature across different cancer data sets.
Interestingly, we detected three pan-cancer gene expression patterns and two inflammatory signatures. As we selected two inflammatory time courses and 25 cancer data sets, the results reflected the tissue-independence of the above-mentioned patterns and signatures. Furthermore, from a functional perspective, the biological process enrichment analysis of the inflammatory signatures and of the genes of each cancer group explains the relationship between the three distinct cancer groups and inflammation.
Our results suggest that there is potential for a deeper understanding of cancer through an inflammation-based perspective. In the present study, we analyzed gene expression data only; other types of data could be integrated as well. Increasing evidence describes the link between inflammation and cancer from the standpoint of genomic instability;26 for example, inflammatory cells and mediators can destabilize the cancer cell genome by a variety of mechanisms, either directly inducing DNA damage or affecting DNA repair systems and altering cell cycle checkpoints, thereby resulting in acceleration of somatic evolution, promotion of cell proliferation, and invasion and evasion of host defenses. Owing to the limited availability of inflammatory time course microarray data, we cannot further prove the robustness of the distribution patterns of cancer DEGs in different inflammations. Therefore, if future analyses integrated more inflammatory time course data and other high-throughput genomic data, such as exon-seq data, RNA-seq data, copy number variation data, and DNA methylation data, we could illustrate more clearly the molecular mechanism of cancer progression in the context of inflammation.
All data were normalized with the median normalization method in BRB-ArrayTools.28 As GSE22307 was derived from Mus musculus, we used the NCBI HomoloGene database to define orthologs between the human and mouse genomes. Consequently, the intersection of genes across the different platforms of the inflammatory time series data and the cancer data included 4548 unique genes.
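The ortholog mapping and gene intersection step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `map_to_human` and `shared_genes`, the dictionary-based tables, and the toy values are all hypothetical, and the mouse-to-human lookup stands in for a table that would in practice be built from HomoloGene.

```python
def map_to_human(mouse_expr, mouse_to_human):
    """Re-key a mouse expression table by human gene symbol.

    Mouse genes without a known human ortholog are dropped.
    """
    human_expr = {}
    for mouse_gene, values in mouse_expr.items():
        human_gene = mouse_to_human.get(mouse_gene)
        if human_gene is not None:
            human_expr[human_gene] = values
    return human_expr


def shared_genes(*gene_tables):
    """Return the sorted gene symbols present in every table."""
    common = set(gene_tables[0])
    for table in gene_tables[1:]:
        common &= set(table)
    return sorted(common)


# Toy inputs for illustration only; "Gm1234" is a made-up mouse gene
# with no ortholog in the lookup table.
mouse_to_human = {"Trp53": "TP53", "Tyms": "TYMS", "Fas": "FAS"}
ibd_mouse = {"Trp53": [1.0, 1.2], "Tyms": [0.4, 0.9], "Gm1234": [0.1, 0.1]}
hp_human = {"TP53": [0.8, 1.1], "TYMS": [0.5, 0.7], "HCK": [0.2, 0.9]}
cancer_human = {"TP53": [2.0], "TYMS": [1.5], "FAS": [0.3]}

ibd_human = map_to_human(ibd_mouse, mouse_to_human)
common = shared_genes(ibd_human, hp_human, cancer_human)
print(common)
```

Only genes measured on every platform survive the intersection, which is why the final gene set is much smaller than any single array.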
Finally, we obtained two ITs, one for each inflammatory time course data set. To summarize the characteristics of the two ITs and to select important genes for subsequent analysis, we integrated them into one IT as follows: we calculated the mean rank of each gene across the two ITs and ordered the 4548 genes by their mean rank values. The resulting gene axis was the integrated IT.
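The mean-rank integration can be illustrated with a short Python sketch. The function name `integrate_timelines`, the alphabetical tie-breaking, and the toy gene orderings are assumptions made for illustration; the text does not say how ties in mean rank were resolved.

```python
def integrate_timelines(it_a, it_b):
    """Merge two gene orderings into one by mean rank (rank 1 = earliest)."""
    rank_a = {gene: pos for pos, gene in enumerate(it_a, start=1)}
    rank_b = {gene: pos for pos, gene in enumerate(it_b, start=1)}
    if set(rank_a) != set(rank_b):
        raise ValueError("both timelines must contain the same genes")
    mean_rank = {g: (rank_a[g] + rank_b[g]) / 2 for g in rank_a}
    # Assumption: ties in mean rank are broken alphabetically.
    return sorted(mean_rank, key=lambda g: (mean_rank[g], g))


# Toy orderings, not the real ITs.
it_ibd = ["TYMS", "H2AFX", "BCL2", "FAS", "HCK"]
it_hp = ["TYMS", "BCL2", "FAS", "H2AFX", "HCK"]

integrated = integrate_timelines(it_ibd, it_hp)
print(integrated)
```

In this toy case BCL2 (mean rank 2.5) moves ahead of H2AFX (mean rank 3.0), showing how a gene that is early in both orderings ends up early in the integrated IT.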
For each of the cancer data sets, the probability distribution P(IN[1, 2, …, 4548]|cancer) described the cumulative probability of up-regulated genes or down-regulated genes among the first i genes on the IT, which was calculated as follows:
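A minimal Python sketch of one plausible reading of this definition is given here. It assumes, and this normalization is not confirmed by the text, that the cumulative probability at position i is the share of all up-regulated (or down-regulated) genes that fall among the first i genes of the IT. The function name and the toy gene labels are hypothetical.

```python
def cumulative_probability(timeline, deg_set):
    """Cumulative share of DEGs found among the first i timeline genes.

    Returns one value per position i = 1..len(timeline).
    Assumption: normalization by the total number of DEGs on the timeline.
    """
    total = sum(1 for gene in timeline if gene in deg_set)
    if total == 0:
        raise ValueError("no genes from deg_set occur on the timeline")
    probs = []
    seen = 0
    for gene in timeline:
        if gene in deg_set:
            seen += 1
        probs.append(seen / total)
    return probs


# Toy timeline of six genes; G1, G2 and G5 are "up-regulated".
timeline = ["G1", "G2", "G3", "G4", "G5", "G6"]
up_genes = {"G1", "G2", "G5"}

p_up = cumulative_probability(timeline, up_genes)
print(p_up)
```

Under this reading, a curve that rises steeply at the start of the IT indicates that the DEGs are concentrated among early inflammatory genes.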
Finally, for each combination of cancer and inflammation, this method produced four regression lines: two lines describing the early and late probabilities for up-regulated genes (upE and upL, respectively; Fig. 3B) and two lines for the down-regulated genes (downE and downL, respectively; Fig. 3B). For each cancer, we can summarize its relationship to the two ITs using an 8-dimensional vector (4 regression line slopes × 2 ITs). These vectors were used to cluster the cancers.
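The construction of the 8-dimensional vector can be sketched as follows. This is an illustration under explicit assumptions: the early and late regions are taken to be the first and second halves of the IT (the actual boundaries are defined in Fig. 3B and are not stated in the text), ordinary least squares is used for the regression lines, and all function names and toy curves are hypothetical.

```python
def slope(ys):
    """Least-squares slope of ys against positions 1..len(ys)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n + 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(1, n + 1))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys, start=1))
    return sxy / sxx


def early_late_slopes(probs):
    """Slopes over the first and second half of a cumulative curve.

    Assumption: "early" and "late" are the two halves of the IT.
    """
    half = len(probs) // 2
    return slope(probs[:half]), slope(probs[half:])


def feature_vector(curves_per_it):
    """Flatten (upE, upL, downE, downL) for each IT into one vector."""
    vector = []
    for p_up, p_down in curves_per_it:
        up_e, up_l = early_late_slopes(p_up)
        down_e, down_l = early_late_slopes(p_down)
        vector.extend([up_e, up_l, down_e, down_l])
    return vector


# Toy cumulative curves on an 8-gene timeline: up-regulated genes sit at
# the early end, down-regulated genes at the late end.
p_up = [0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0]
p_down = [0.0, 0.0, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0]

# The same toy curves are reused for both ITs to keep the example short.
vec = feature_vector([(p_up, p_down), (p_up, p_down)])
print(vec)
```

One such vector per cancer data set is the input that a clustering method would then group into the three patterns.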
A flow chart of the algorithm is presented in Fig. 4.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c4mb00258j
‡ These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2014