Long non-coding RNA expression profiles predict clinical phenotypes of seminoma and yolk sac tumor

The First Affiliated Hospital of Zhengzhou
E-mail: llzdyfy@163.com; Tel: +86-731-6778
Department of Occupational and Environm Zhengzhou University, Zhengzhou, China
Center for Reproductive Medicine, Shan Shandong University, 157 Jingliu Road, Jina
Key Laboratory of Reproductive Endocrin Education, National Research Center for Reproductive Genetics, 157 Jingliu Road, Jin
† Electronic supplementary informa 10.1039/c7ra12131h
Cite this: RSC Adv., 2017, 7, 56271


Introduction
In recent years, cancer has emerged as a major public health problem worldwide. According to statistics released by the American Cancer Society and the National Cancer Center of China, cancer has become the second leading cause of death in both countries. 1,2 Over the past decades, the number of patients dying from cancer has dropped substantially, whereas deaths from cancers of the brain and central nervous system (CNS) are increasing; these cancers have now surpassed leukemia as the leading cause of cancer death in children and adolescents (aged from birth to 19 years). 1 Moreover, the 5-year survival rate of brain and CNS cancers has risen from 57% to 74% among patients younger than 20 years, whereas the overall survival rate regardless of patient age was still under 35% and showed a decreasing trend. 3 Germ cell tumors (GCTs), one phenotype of brain and CNS cancer, accounted for 11.8% of pediatric tumors in China, 4 and malignant GCTs accounted for 2.9% of all malignant tumors in children younger than 15 years worldwide. 5 In general, GCTs are characterized by highly heterogeneous histological differentiation, but they show a similar histological pattern independent of primary site or sex. 6 As indicated by Teilum in 1965, a neoplastic cell derived from a gonadal or extragonadal germ cell is able to trans-differentiate into embryonal and exo-embryonal malignant carcinoma. 7 The former includes mature/immature teratoma in the embryo and choriocarcinoma (CHC) and yolk sac tumor outside the embryo. Meanwhile, the exo-embryonal carcinomas such as seminoma (testis), dysgerminoma (ovary) and germinoma (brain) are all malignant tumors (Fig. 1).
In recent years, biomarkers including α-fetoprotein (AFP) and human choriogonadotropin (HCG) have been used for the diagnosis of yolk sac tumor and CHC, and a moderate elevation of β-HCG is considered to occur in seminoma. Despite great progress in the early diagnosis and differentiation of the clinical phenotypes of GCTs, a large number of misdiagnoses still occur every year. For example, the reference value of HCG used to diagnose seminoma/germinoma (<50 IU L⁻¹) is similar to that seen with syncytiotrophoblast-like giant cells. 8 Additionally, AFP is physiologically elevated in neonates and young infants, but children older than two years with a high AFP level (≥100 μg L⁻¹) can be considered to have malignant GCTs. 9 Nevertheless, in some liver diseases such as acute liver failure, hepatocellular carcinoma, and hepatoblastoma, AFP secretion is also elevated owing to hepatocellular regeneration. 10 Thus, clinical decisions that depend only on images or molecular biomarkers may be incorrect, and it would be of great significance to find more stable and accurate biomarkers to diagnose and distinguish the clinical phenotypes of GCTs regardless of patient age and interference from other diseases.
The emerging role of long non-coding RNA (lncRNA) as a promising biomarker and critical therapeutic target has drawn considerable attention. However, the role of lncRNA in GCTs has not been investigated. Typically, lncRNAs are non-protein-coding transcripts longer than 200 nucleotides that are involved in numerous critical biological processes such as X chromosome silencing, genomic imprinting, chromosome modification, transcriptional activation, transcriptional interference, and nuclear transport. 11 HOTAIR, for example, a well-studied lncRNA, was found to be aberrantly expressed in different subtypes of breast cancer, which highlighted for the first time the role of lncRNA in distinguishing breast cancer subtypes. 12 In glioma and colorectal cancer, lncRNAs such as HOXA-AS, MALAT1, and NEAT1 were all found to be specifically distributed. 13,14 Hence, lncRNA profiling may be a new way to distinguish embryonal from exo-embryonal malignant GCTs. Favorably, microarray datasets shared by previous studies can be obtained from the Gene Expression Omnibus (GEO) and used to investigate our hypothesis.
Herein, we aimed to profile lncRNA expression signatures in embryonal malignant carcinoma (yolk sac tumor) and exo-embryonal malignant carcinoma (seminoma) by analyzing a cohort of previously published microarray datasets obtained from the GEO. The distinctive lncRNAs were identified through comparisons between age groups and between GCT phenotypes, respectively. Our findings provide novel information on lncRNA expression profiles that may help to distinguish GCT phenotypes regardless of age and interference from other diseases, and the results also provide potential diagnostic biomarkers and therapeutic targets for yolk sac tumor and seminoma.

GEO seminoma and yolk sac tumor expression data
All experiments were performed in compliance with the guidelines approved by the ethics committee at the Memorial Sloan-Kettering Cancer Center (New York, NY) between 1987 and 1999. Informed consent was obtained from the human participants of this study. The microarray datasets of seminoma and yolk sac tumor in children and adults were obtained from the GEO. To compare lncRNA expression signatures across patient age groups and GCT phenotypes, two panels of adult and pediatric GCT gene expression datasets were included in this study: GSE3218 and GSE10615. The raw files of these two datasets, both based on the Affymetrix Human Genome U133A Array platform, were downloaded from the GEO. Data quality control, including quantile normalization, background adjustment, and summarization, was performed using the Robust Multichip Average software (RMA, 1.2.0 In Development), which has been reported to be more efficient in estimating lncRNA expression fold changes than other software. In addition, samples with a median expression value that exceeded the control limit line in plots of normalized unscaled standard error (NUSE) and relative log expression (RLE) were excluded from the downstream analysis. This yielded a set of probe ID-centric gene expression values.
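The RLE-based sample exclusion described above can be illustrated with a minimal sketch. It assumes log2-scale expression values and a hypothetical control limit of ±0.5 on the per-sample median RLE; the paper does not state the limit it used, and the function names and toy data here are illustrative only.

```python
from statistics import median


def rle_sample_medians(log_expr):
    """Median relative log expression (RLE) of each sample.

    log_expr maps a probe ID to a list of log2 expression values,
    one value per sample, in the same sample order for every probe.
    """
    n_samples = len(next(iter(log_expr.values())))
    deviations = [[] for _ in range(n_samples)]
    for values in log_expr.values():
        probe_median = median(values)  # reference level of this probe across arrays
        for i, v in enumerate(values):
            deviations[i].append(v - probe_median)
    return [median(d) for d in deviations]


def flag_outlier_samples(log_expr, limit=0.5):
    """Indices of samples whose median RLE lies outside +/- limit.

    The limit of 0.5 is an assumed cutoff, not a value from the paper.
    """
    medians = rle_sample_medians(log_expr)
    return [i for i, m in enumerate(medians) if abs(m) > limit]


# Toy data: three probes on four arrays; the last array is shifted upward.
toy = {
    "probe_a": [7.0, 7.1, 6.9, 8.5],
    "probe_b": [5.0, 5.2, 5.1, 6.4],
    "probe_c": [9.0, 8.9, 9.1, 10.6],
}
outliers = flag_outlier_samples(toy)
print(outliers)
```

An array whose RLE distribution is centered far from zero differs systematically from the bulk of the arrays, which is the usual reason for excluding it before differential expression analysis.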

lncRNA classication pipeline
To evaluate lncRNA expressions in the microarray datasets that were obtained from the above step, we adopted the lncRNA classication pipeline which had been previously described to identify lncRNAs represented on the Affymetrix Genome array. 15 In brief, we rst mapped the ID-centric gene expression matrix to the NetAffx Annotation File (HG-U133A Annotations, CSV format, Release 35, 7 MB, 10/7/2014), which was available on the Affymetrix official website (http://www.affymetrix.com). Next, we only retained probes that labeled as "NR_" in the column of RefSeq transcripts IDs. While in the Ensembl gene IDs column, we selected probes that labeled as "lincRNA," "processed_transcript," "macro_lncRNA" or "misc_RNA." Lastly, we ltered the extracted annotated lncRNAs to exclude pseudogenes, rRNAs, microRNAs or other short RNAs (tRNAs, snRNAs, and snoRNAs).
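The three filtering steps above can be sketched as a single predicate over annotation rows. This is an illustration only: the dictionary keys (`refseq_ids`, `ensembl_biotype`, `rna_class`), the `///` separator between multiple RefSeq IDs, and the toy rows are assumptions, not the actual column names or contents of the NetAffx file.

```python
# Ensembl labels retained by the pipeline, as listed in the text.
LNCRNA_BIOTYPES = {"lincRNA", "processed_transcript", "macro_lncRNA", "misc_RNA"}
# RNA classes removed in the final filtering step.
EXCLUDED_CLASSES = {"pseudogene", "rRNA", "microRNA", "tRNA", "snRNA", "snoRNA"}


def is_lncrna_probe(row):
    """Decide whether one annotation row represents a candidate lncRNA probe.

    row is a dict with hypothetical keys:
      'refseq_ids'      - RefSeq transcript IDs, '///'-separated (assumed format)
      'ensembl_biotype' - Ensembl label for the gene
      'rna_class'       - coarse RNA class used for the exclusion step
    """
    refseq_hit = any(
        t.strip().startswith("NR_") for t in row["refseq_ids"].split("///")
    )
    ensembl_hit = row["ensembl_biotype"] in LNCRNA_BIOTYPES
    if not (refseq_hit or ensembl_hit):
        return False
    return row["rna_class"] not in EXCLUDED_CLASSES


# Toy annotation rows with made-up accession numbers.
rows = [
    {"probe_id": "p1", "refseq_ids": "NR_000001", "ensembl_biotype": "lincRNA", "rna_class": ""},
    {"probe_id": "p2", "refseq_ids": "NM_000001", "ensembl_biotype": "protein_coding", "rna_class": ""},
    {"probe_id": "p3", "refseq_ids": "NR_000002", "ensembl_biotype": "", "rna_class": "snoRNA"},
    {"probe_id": "p4", "refseq_ids": "NM_000002 /// NR_000003", "ensembl_biotype": "processed_transcript", "rna_class": ""},
]
kept = [r["probe_id"] for r in rows if is_lncrna_probe(r)]
print(kept)
```

The sketch treats a probe as a candidate when either database labels it as non-coding. Whether the original pipeline required all transcripts of a probe to carry the "NR_" prefix is not stated in the text, so the `any` rule here is a simplification.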

Differentially expressed lncRNAs screening
Gene-e soware was used to determine the differentially expressed lncRNAs between seminoma and yolk sac tumor in adult and children. Similarly, the distinctive lncRNAs between adult and children stratied by GCTs phenotypes (seminoma or yolk sac tumor) were also investigated. Conditions used to screen the differentially expressed lncRNAs were set as follows: false discovery rate (FDR) < 20%, fold change $ 2, permutation time 1000 and p-value < 0.01. The co-existed differentially expressed lncRNAs between seminoma and yolk sac tumor or adult and children were intersected using Venn diagram. In order to investigate the effectiveness of the pack mode, the principal component analysis (PCA) was adopted using MeV (available at http://mev.tm4.org).

Validation of differentially expressed lncRNAs
The Oncomine database, hosted by Thermo Fisher Scientific Inc., provided more than 715 datasets and 86 733 samples with expertly curated data. We therefore used this database to validate the expression of the lncRNAs shared by adults and children. After uploading the lncRNAs, the comparison mode was set to cancer vs. normal, and the fold change, t-test, and p-value statistics were recorded and used for further analysis.

Statistical analysis
All statistical analyses were performed using SAS version 9.2 for Windows (SAS Institute Inc., Cary, North Carolina, USA). Differentially expressed lncRNAs were investigated using GENE-E software, and PCA was performed using the online version of MeV. The ages of children and adults, when normally distributed, are shown as mean ± standard deviation (SD). A p-value of less than 0.05 was considered statistically significant unless otherwise specified.

Dataset characteristics
The gene expression data of pediatric and adult seminomas and yolk sac tumors were included in this study: GSE10615 and GSE3218A. The gene chip GSE10615 contained 28 pediatric samples, of which 18 were malignant yolk sac tumors and 10 were malignant seminomas, including 1 seminoma sample that was excluded after quality control (Fig. 2 and 3). In parallel, a total of 21 adult samples, comprising 9 yolk sac tumors and 12 seminomas, were included from GSE3218A. Each sample in this study was purely of the one tumor phenotype specified, not mixed with others. The age of the children represented by GSE10615 was 8.3 ± 5.6 years, while the corresponding information was not available for the adult group.
lncRNA expression proles on Affymetrix Human Genome U133A Array With the lncRNA classication pipeline, 398 probe sets corresponding to 368 lncRNA genes were identied. Of these, 49 probe sets (40 genes) were annotated as lncRNAs by both RefSeq and Ensembl database, 267 probe sets (216 genes) were annotated by RefSeq database, and 180 probe sets (192 genes) were annotated by Ensembl database. In addition, probe sets that were annotated by both databases but had controversial denitions were excluded from this study (Tables S1 and S2 †).

Distinctive lncRNA expressions between seminoma and yolk sac tumor
We compared the lncRNA expression patterns between seminoma and yolk sac tumor stratified by age. A total of 13 probe sets corresponding to 11 lncRNA genes were identified in adults. Meanwhile, 13 probe sets corresponding to 12 genes were found to be aberrantly expressed in children, and 6 probe sets (5 genes), namely XIST, C17orf86, FAM182A, FLJ11235, and C12orf47, were shared by children and adults. Moreover, 19 probe sets corresponding to 16 lncRNA genes were identified between seminoma and yolk sac tumor when the effect of age was excluded (Table 1). Interestingly, the five lncRNAs shared by adults and children were all included in this cluster, and 10 of its differentially expressed lncRNAs overlapped with the adult and pediatric lncRNA clusters. The PCA results for seminoma and yolk sac tumor are shown in Fig. 4: the cumulative component percentage reached 90% when the number of components was set to 3. However, regardless of age, it was higher (more than 90%) with only 2 components, and dots of different colors belonging to the same disease clustered tightly together.

Distinctive lncRNA expressions between children and adult
We also compared the lncRNA expression patterns between children and adults stratified by seminoma and yolk sac tumor. In seminoma, 16 probe sets corresponding to 14 lncRNAs were identified, while in yolk sac tumor, 17 probe sets corresponding to 16 lncRNAs were shown to be aberrantly expressed (Table 2). A total of 9 lncRNAs, namely PART1, NCRNA00230A, POM121L9P, MEG3, TP53TG1, LOC157627, LOC339290, DKFZP434L187, and TTTY15, were shared by seminoma and yolk sac tumor (Table 3). The PCA results for children and adults are shown in Fig. 5, with a validity ≥90% using 2 components, and dots of the same age group clustered together.

Validation of differentially expressed lncRNAs
The expression levels of the lncRNAs shared by yolk sac tumor and seminoma were investigated using the Oncomine database and compared between yolk sac tumor and seminoma, and between each tumor type and normal tissue. Apart from the lncRNAs FAM182A and FLJ11235, which could not be found in the Oncomine database, the expression of XIST and C17orf86 differed significantly between cancer and normal, except for C17orf86 in seminoma (Table 4), which was highly consistent with our assumptions.

Discussion
To date, the different phenotypes of malignant GCTs have been determined mainly on the basis of histological changes, but misdiagnosis may occur in patients of different ages or in those suffering from other diseases. As epidemiological and clinical evidence indicates, the histological changes of malignant GCTs partly overlap among the clinical phenotypes, 16,17 and the biomarkers currently in use can also be misleading under certain conditions. Therefore, the emerging role of lncRNAs as potential diagnostic biomarkers may shed light on the differential diagnosis of GCT phenotypes. In this study, two groups of patients (children and adults), stratified by two phenotypes of malignant GCTs (seminoma and yolk sac tumor), were included. With the lncRNA classification pipeline, we first analyzed the differentially expressed lncRNAs between seminoma and yolk sac tumor. A set of 11 aberrantly expressed lncRNAs was identified in adults and 12 lncRNAs were identified in children. Five distinctive lncRNAs, namely XIST, C17orf86, FAM182A, FLJ11235, and C12orf47, fell in the intersection of children and adults, indicating a potential role for these lncRNAs in distinguishing seminoma from yolk sac tumor regardless of age. As reported previously, XIST expression was widely detected in seminomatous testicular germ cell tumors, and unmethylated XIST was frequently present in testicular germ cell tumors. 18 To the best of our knowledge, the role of the other four lncRNAs in malignant GCTs has not been investigated. However, the lncRNA C17orf86, also known as SNHG20, was associated with the metastasis of hepatocellular carcinoma, and an elevated expression level of SNHG20 could promote carcinoma cell invasion. 19,20 In addition, the function of the remaining three lncRNAs, FAM182A, FLJ11235, and C12orf47, has not been explored even in other diseases.
We also performed comparisons between children and adults, stratified by the two GCT phenotypes (seminoma and yolk sac tumor). In total, 14 differentially expressed lncRNAs were identified in seminoma and 16 lncRNAs in yolk sac tumor. Of these, 9 lncRNAs, namely PART1, NCRNA00230A, POM121L9P, MEG3, TP53TG1, LOC157627, LOC339290, DKFZP434L187, and TTTY15, overlapped between seminoma and yolk sac tumor, which suggests that people carrying one or several of the overlapping lncRNAs may be at high risk of seminoma or yolk sac tumor. Consistent with previous studies, MEG3, which has been widely found in many cancers, also regulated the growth of testicular germ cell tumor through the PTEN/PI3K/AKT pathway. 21 The lncRNA POM121L9P was reported to be associated with male sterility via binding to Piwi proteins in mammals. 22 Another lncRNA, PART1, is a novel human prostate-specific and androgen-regulated gene located on chromosome 5q12; available studies have shown that the expression level of this lncRNA was elevated by approximately 73.1% in specimens of stage I-III non-small cell lung cancer. 23 Moreover, TP53TG1 was an important regulator of cellular homeostasis, which could undergo cancer-specific promoter hypermethylation-associated silencing and inhibit the occurrence and development of cancer. 24 Another lncRNA, TTTY15, has been frequently reported in prostate cancer, and the fusion of this gene with USP9Y was identified in a large cohort study of prostate cancer. 25-27 The functions of the remaining lncRNAs still lack annotation.
To further explore common biomarkers that were unaffected by age and GCT phenotype, we compared the lncRNAs shared by the age groups with those shared by the GCT phenotypes, but no lncRNA was returned when we loaded these two sets into the statistical software to look for overlap. However, in the validation process conducted using the Oncomine database, we found that the lncRNA XIST was differentially expressed between cancer and normal, suggesting that the lncRNA screening criteria used in this study were strict and accurate.
In summary, five differentially expressed lncRNAs shared by adults and children were identified in the comparison between seminoma and yolk sac tumor, while nine lncRNAs shared by seminoma and yolk sac tumor were identified in the comparison between adults and children. The lncRNAs identified in this study may have great potential for distinguishing GCTs of different phenotypes (seminoma and yolk sac tumor), and they may also serve as promising biomarkers of the risk faced by patients with seminoma or yolk sac tumor regardless of age. Although some of these lncRNAs have been validated, the majority have not been investigated, and further studies are still needed.

Conflicts of interest
There are no conicts of interests.