Exploration of metabolite signatures using high-throughput mass spectrometry coupled with multivariate data analysis

Disease affects important metabolic pathways, and altered metabolite levels may serve as potential biomarkers for early-stage diagnosis. High-resolution mass spectrometry-based metabolomics has been used to discover new biomarker metabolites. Rheumatoid arthritis (RA) seriously affects patients' quality of life, but its pathophysiology remains unclear. This study aimed to develop a high-throughput metabolomics approach for screening potential biomarkers to facilitate the diagnosis of RA. Alterations in the metabolic profile of RA were investigated in human urine samples using high-resolution UPLC-QTOF/MS and multivariate statistical analysis. Ingenuity pathway analysis (IPA) was performed for the bioinformatic analysis of the data. Variable importance for projection (VIP) values were determined, and t-tests were conducted to select a biomarker panel for RA. Receiver operating characteristic (ROC) analysis was used to evaluate the diagnostic accuracy of the metabolites. The score plot of orthogonal partial least squares discriminant analysis (OPLS-DA) showed significant discrimination between the RA and healthy groups. Five metabolites were identified as potential biomarkers for RA. AUC values ranging from 0.819 to 0.993 indicated the potential of these metabolites to distinguish RA patients and suggested that the differentially expressed metabolites might be a useful tool for the effective diagnosis of RA. The most significantly altered networks included FXR/RXR activation and bile acid biosynthesis. This study demonstrates that a high-resolution mass spectrometry-based metabolomics approach could provide crucial insight into the pathogenic mechanism of RA.


Introduction
Rheumatoid arthritis (RA) is a chronic autoimmune disease characterized by damage to the affected joints that causes pain, swelling, stiffness, and loss of joint function. 1 Early diagnosis of RA is important for early intervention. Several biochemical biomarkers, including the proteoglycan aggrecan, type I collagen, type II collagen, hyaluronic acid, osteocalcin, and others, have been identified as associated with RA. 2–8 Rheumatoid factor (RF), a well-known biomarker for RA, is not useful for the specific diagnosis of RA because it is also detected in various other rheumatic disorders. 9,10 Therefore, more reliable biomarkers with diagnostic capability are still required for RA. Identifying sensitive biomarkers in RA patients would support both diagnosis and assessment of disease severity and would provide an important tool for maximizing patient care.
Numerous omics studies have provided biomarker candidates and insight into the pathology of RA. 11 Urine is an easily obtained, inexpensive, safe, and noninvasive biofluid that has the potential to mirror health conditions. 12–15 Metabolomics is a non-targeted analysis of global changes in the complete set of metabolites in an organism and may be a powerful tool for discovering new disease biomarkers. 16–22 It makes it possible to describe patterns of metabolite biomarkers that are highly discriminatory for disease-related perturbations. Diagnostic information can be obtained by quantifying biomarker metabolites and comparing them with those in normal samples. 23 A metabolomics approach can therefore reveal the characteristic metabolite profile and metabolic phenotype associated with a disease. Herein, we present the results of a metabolomics study that monitored metabolic changes in the urine composition of RA patients compared with healthy controls. Ingenuity pathway analysis was used to elucidate the differentially altered metabolites and associated pathways. In this study, the metabolic phenotype and regulated metabolite signatures of RA were characterized using integrated high-throughput metabolomics coupled with multivariate data analysis and ingenuity pathway analysis.

Patient enrollment
The study was approved by the Ethics Committee of the Heilongjiang University of Chinese Medicine, and written informed consent was obtained from all participants. A total of 139 RA patients and 124 healthy controls were recruited. RA patients were required to be positive for RF and to have an RA disease duration of 6.5 years. The healthy control participants were recruited from hospital and university staff; they had no disease symptoms and did not use any medication. Detailed baseline and histopathologic characteristics of the patients are listed in ESI Table 1.† Participants had not received any medication, surgery, radiotherapy, or chemotherapy, and those suffering from metabolic diseases, liver diseases, kidney diseases, or any cancer were excluded. Participants did not receive any drugs during the week before enrollment. Urine samples collected from RA subjects in week 1 of enrollment were selected for metabolomic profiling. All subjects fasted from 10 p.m. the previous evening, and samples were obtained the next morning between 8 and 9 a.m. All samples were processed in the same laboratory to avoid bias.

Sample preparation
Urine samples were stored at −80 °C and thawed at room temperature before analysis. Sample preparation for urine metabolomics analysis was conducted as follows: 400 μL of acetonitrile was added to 100 μL of urine, and the mixture was vortexed for 1 min and centrifuged for 10 min at 10 000 rpm and 4 °C. Then, 400 μL of the supernatant was transferred to a freeze-dryer and dried. The dried residue was redissolved in 100 μL of water/acetonitrile (4 : 1, v/v) at 4 °C, and the solution was filtered through a 0.22 μm membrane. A quality control (QC) sample was prepared by pooling equal volumes of all samples.
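As a worked check on the protocol volumes, the urine equivalent in each reconstituted sample can be estimated. This assumes the 400 μL of acetonitrile and 100 μL of urine give an approximately additive 500 μL extract and neglects the volume of the precipitated protein pellet; it is an illustrative calculation, not a value reported by the study.

```latex
\begin{aligned}
V_{\mathrm{extract}} &\approx 400\,\mu\mathrm{L}\ (\mathrm{ACN}) + 100\,\mu\mathrm{L}\ (\mathrm{urine}) = 500\,\mu\mathrm{L} \\
V_{\mathrm{urine\ equiv.}} &= 100\,\mu\mathrm{L} \times \frac{400\,\mu\mathrm{L}}{500\,\mu\mathrm{L}} = 80\,\mu\mathrm{L} \\
\text{concentration relative to raw urine} &= \frac{80\,\mu\mathrm{L}}{100\,\mu\mathrm{L}} = 0.8
\end{aligned}
```

Under these assumptions, the redissolved extract holds metabolites at roughly 0.8 times their concentration in the original urine.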

UPLC analysis
Urine samples were analyzed by UPLC-QTOF/MS (Waters Corp., Milford, USA). 24 Chromatographic separation was performed on a BEH C8 column (2.1 × 100 mm, 1.7 μm; Waters Corp., Milford, USA) with a Waters LC system. The column temperature was maintained at 40 °C and the sample temperature at 4 °C. Gradient elution was performed with 0.1% formic acid in water as mobile phase A and 0.1% formic acid in acetonitrile (ACN) as mobile phase B. The gradient was as follows: 1–10% A at 0–2.0 min, 10–30% A at 2.0–6.0 min, 30–50% A at 6.0–10.0 min, 50–90% A at 10.0–11.0 min, and 90% A at 11.0–13.0 min. The flow rate was 2 mL min−1 and the injection volume was 3 μL. Technical variation of the analysis over time was monitored using a pooled QC sample.
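The gradient above can be read as a piecewise-linear program. The sketch below encodes the %A breakpoints exactly as written in the text and assumes linear ramps between them (the usual behavior of a linear gradient curve); it returns the mobile-phase composition at any time point, with %B as the complement of %A. The function names are illustrative.

```python
# Gradient program as stated in the text: (time in min, % mobile phase A).
# Linear ramps between breakpoints are an assumption of this sketch.
GRADIENT = [
    (0.0, 1.0),
    (2.0, 10.0),
    (6.0, 30.0),
    (10.0, 50.0),
    (11.0, 90.0),
    (13.0, 90.0),
]


def percent_a(t_min: float) -> float:
    """Return % mobile phase A at time t_min (minutes)."""
    start, end = GRADIENT[0][0], GRADIENT[-1][0]
    if not start <= t_min <= end:
        raise ValueError(f"time {t_min} min is outside the {start}-{end} min program")
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            # Interpolate linearly within the segment that contains t_min.
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole program")


def percent_b(t_min: float) -> float:
    """% mobile phase B is the complement of % A in a binary gradient."""
    return 100.0 - percent_a(t_min)


if __name__ == "__main__":
    for t in (0.0, 1.0, 4.0, 10.5, 12.0):
        print(f"{t:5.1f} min: {percent_a(t):5.1f}% A, {percent_b(t):5.1f}% B")
```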

High-resolution mass spectrometry
MS experiments were performed on a time-of-flight mass spectrometer (Waters Corp., Milford, USA) equipped with an electrospray ionization (ESI) source operating in positive ion mode. The mass range was set to m/z 50–1000 in full-scan mode. The following parameters were used: capillary voltage, 3.0 kV; cone voltage, 25 V; collision energy, 5 eV; desolvation gas flow, 500 L h−1; cone gas flow, L h−1; desolvation temperature, 300 °C; and source temperature, 110 °C. Leucine-enkephalin was used as the lock mass.
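Lock-mass correction serves to keep the measured m/z accurate during the run, and mass accuracy is usually expressed as an error in parts per million (ppm) relative to the theoretical m/z. The sketch below computes the theoretical [M+H]+ m/z from standard monoisotopic element masses and the ppm error of an observed value. Hippuric acid (C9H9NO3), one of the reported biomarkers, is used as the example; the "measured" m/z is hypothetical, and the helper names are illustrative.

```python
from typing import Dict

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
PROTON_MASS = 1.007276466812  # u


def neutral_monoisotopic_mass(formula: Dict[str, int]) -> float:
    """Sum of monoisotopic element masses for a formula such as {'C': 9, ...}."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in formula.items())


def mz_m_plus_h(formula: Dict[str, int]) -> float:
    """Theoretical m/z of the singly protonated [M+H]+ ion."""
    return neutral_monoisotopic_mass(formula) + PROTON_MASS


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


if __name__ == "__main__":
    hippuric_acid = {"C": 9, "H": 9, "N": 1, "O": 3}  # C9H9NO3
    theo = mz_m_plus_h(hippuric_acid)
    measured = 180.0662  # hypothetical observed m/z, not a value from the study
    print(f"[M+H]+ theoretical: {theo:.4f}, error: {ppm_error(measured, theo):.2f} ppm")
```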

Chemometrics analysis
After preprocessing with MassLynx 4.1 (Waters Corp., Milford, USA), the urine metabolome measurements yielded a list of features. The urine data were processed with EZinfo software (V2, Waters Corp., Milford, USA) for peak picking and peak alignment, with the m/z width set to 0.01 Da and the retention time (RT) width set to 0.2 min. Principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA) were performed in EZinfo using the autofit function. PCA was first used to reduce the dimensionality of the dataset and to give an overview of clustering trends in the multidimensional data. OPLS-DA was then applied to the converted spectral data for global and targeted profiling. The discriminatory power of each variable was estimated quantitatively using the variable importance for projection (VIP) values provided by OPLS-DA, and these values were used to evaluate each variable's contribution and to identify potential biomarkers. Metabolites were identified by automated comparison of ion features, including retention time, m/z, adducts, and fragments.
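For reference, the usual VIP definition for a PLS-type model with A components is VIP_j = sqrt(p · Σ_a SS_a (w_ja / ‖w_a‖)² / Σ_a SS_a), where p is the number of variables, w_a is the weight vector of component a, and SS_a is the sum of squares of Y explained by that component. The pure-Python sketch below implements this textbook formula; EZinfo's OPLS-DA VIP may treat predictive and orthogonal components differently, so it illustrates the idea rather than reproducing the software's output. Because the squared VIP values sum to p, their mean square is 1, so a cutoff far above 1 retains only a few dominant features in a large feature table.

```python
import math
from typing import List, Sequence


def vip_scores(weights: Sequence[Sequence[float]], ss_y: Sequence[float]) -> List[float]:
    """Variable importance in projection for a fitted PLS-type model.

    weights: p rows (variables) x A columns (components); column a is w_a.
    ss_y:    length A; sum of squares of Y explained by each component.
    """
    p = len(weights)
    n_comp = len(ss_y)
    # Euclidean norm of each weight vector w_a.
    norms = [math.sqrt(sum(weights[j][a] ** 2 for j in range(p))) for a in range(n_comp)]
    total_ss = sum(ss_y)
    vips = []
    for j in range(p):
        # Explained Y variance, apportioned by the squared normalized weight of variable j.
        s = sum(ss_y[a] * (weights[j][a] / norms[a]) ** 2 for a in range(n_comp))
        vips.append(math.sqrt(p * s / total_ss))
    return vips


if __name__ == "__main__":
    # Toy model: 3 variables, 2 components (illustrative numbers only).
    w = [[0.8, 0.0], [0.6, 0.6], [0.0, 0.8]]
    ss = [3.0, 1.0]
    for j, v in enumerate(vip_scores(w, ss)):
        print(f"variable {j}: VIP = {v:.3f}")
```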

Ingenuity pathways analysis
To explore the metabolic perturbations associated with the differential metabolites, pathway analysis was performed using IPA (http://www.ingenuity.com/), a web-based software application that identifies biological pathways and functions relevant to biomolecules of interest. The metabolite lists (with KEGG IDs) and the changes in the related metabolites were uploaded to the IPA server, and canonical pathways and molecular interaction networks were generated based on the IPA knowledge base.
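IPA reports the significance of canonical pathways with a right-tailed Fisher's exact test, that is, the probability of seeing at least the observed overlap between the uploaded list and a pathway by chance. The sketch below is a generic version of that over-representation calculation, not IPA's own code; in practice the reference-set and pathway sizes come from the IPA knowledge base, and the counts in the example are hypothetical.

```python
import math


def right_tailed_fisher(k: int, n_list: int, n_pathway: int, n_universe: int) -> float:
    """One-sided (over-representation) Fisher's exact test p-value, P(X >= k).

    X follows a hypergeometric distribution:
      n_universe: molecules in the reference set
      n_pathway:  reference molecules annotated to the pathway
      n_list:     uploaded molecules that map to the reference set
      k:          uploaded molecules found in the pathway
    """
    total = math.comb(n_universe, n_list)
    upper = min(n_list, n_pathway)
    # Sum the hypergeometric probabilities of k, k+1, ..., upper overlaps.
    tail = sum(
        math.comb(n_pathway, i) * math.comb(n_universe - n_pathway, n_list - i)
        for i in range(k, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 3 of 5 uploaded metabolites fall in a 40-member pathway
    # drawn from a 2,000-molecule reference set.
    p = right_tailed_fisher(k=3, n_list=5, n_pathway=40, n_universe=2000)
    print(f"p = {p:.2e}, -log10(p) = {-math.log10(p):.2f}")
```

IPA typically displays such p-values as −log10(p), which is why the example prints that value as well.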

Statistical analyses
Student's t-tests and analyses of covariance were performed using SPSS software (version 17.0; SPSS, Inc., Chicago, IL), with p < 0.05 considered significant. The diagnostic performance of important metabolites was further assessed by receiver operating characteristic (ROC) curve analysis using MedCalc software (Broekstraat, Mariakerke, Belgium).
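The t-test step can be sketched in pure Python. This is an illustration, not the SPSS procedure: it computes the pooled-variance Student's t statistic exactly but approximates the two-sided p-value with the standard normal distribution. That approximation is close when the degrees of freedom are large (for example 139 + 124 − 2 = 261, assuming every participant enters a comparison) but understates p for small samples; `scipy.stats.ttest_ind` gives the exact test.

```python
import math
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def student_t_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int, float]:
    """Two-sample Student's t-test with pooled variance.

    Returns (t, degrees of freedom, approximate two-sided p-value). The p-value
    uses the standard normal in place of the t distribution, which is only
    reasonable when the degrees of freedom are large.
    """
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


if __name__ == "__main__":
    # Hypothetical intensities for one metabolite in two small groups.
    print(student_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```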

Typical chromatograms
In this study, UPLC-QTOF/MS was used to collect metabolite information for RA. A representative base peak intensity (BPI) chromatogram of a urine sample is shown in Fig. 1A. Running QC samples throughout the analytical sequence to monitor data quality is a pragmatic solution for complex, multi-component metabolomic research. As shown in Fig. 1B, the QC samples clustered tightly together in the PCA score plot, suggesting that the analytical system was stable and reliable.
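Besides PCA clustering, a common complementary QC check in untargeted LC-MS metabolomics is the relative standard deviation (RSD) of each feature across repeated QC injections. The sketch below assumes a 30% RSD cutoff, a widely used convention that the text does not state, and uses hypothetical intensities; the function names are illustrative.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def rsd_percent(values: Sequence[float]) -> float:
    """Relative standard deviation (coefficient of variation) in percent."""
    return 100.0 * stdev(values) / mean(values)


def stable_features(qc: Dict[str, Sequence[float]], max_rsd: float = 30.0) -> List[str]:
    """Return IDs of features whose RSD across QC injections is <= max_rsd percent."""
    return [feature_id for feature_id, values in qc.items() if rsd_percent(values) <= max_rsd]


if __name__ == "__main__":
    # Hypothetical intensities of two features across four QC injections.
    qc_runs = {
        "feature_1": [100.0, 102.0, 98.0, 101.0],
        "feature_2": [50.0, 90.0, 20.0, 70.0],
    }
    for fid, vals in qc_runs.items():
        print(f"{fid}: RSD = {rsd_percent(vals):.1f}%")
    print("retained:", stable_features(qc_runs))
```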

Metabolic proles
Mean-centered MS data, normalized to total area, were subjected to unsupervised PCA modeling, which provided an overview of the data and a means of detecting outliers. Inspection of subsequent PCs revealed grouping according to the presence of RA, and the score plots showed separation between the two groups, highlighting disease-related variation. To specify the metabolic variations related to the disease, an OPLS-DA model, a standard and widely accepted technique in the metabolomics field, was built for the urine data set to extract the systematic information that discriminates patients from controls. As shown in the score plot (Fig. 1C), the RA and healthy groups separated into two distinct clusters, indicating that their metabolic states differed. The separation was achieved with model parameters R²Y = 0.91 and Q² = 0.83 for the LC-MS data, where R²Y and Q² describe the explanatory and predictive ability of the model, respectively.
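The preprocessing steps and model statistics mentioned above can be made concrete with a small sketch. It assumes that total-area normalization means dividing each sample by the sum of its feature intensities, and it computes R²Y and Q² in the usual 1 − SS_residual/SS_total form, with the class coded as 0/1 in Y. EZinfo's exact cross-validation scheme is not described in the text, and all numbers in the example are hypothetical.

```python
from typing import List, Sequence


def normalize_total_area(sample: Sequence[float]) -> List[float]:
    """Scale one sample's feature intensities so they sum to 1."""
    total = sum(sample)
    return [value / total for value in sample]


def mean_center(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract each column (feature) mean; rows are samples."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - col_means[j] for j in range(n_cols)] for row in matrix]


def fraction_explained(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - (sum of squared residuals) / (total sum of squares about the mean).

    With fitted values this is R2Y; with cross-validated predictions the
    residual sum of squares is PRESS and the result is Q2.
    """
    y_bar = sum(y_true) / len(y_true)
    residual_ss = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    total_ss = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - residual_ss / total_ss


if __name__ == "__main__":
    raw = [[10.0, 30.0], [20.0, 20.0], [5.0, 45.0]]
    normalized = [normalize_total_area(s) for s in raw]
    print(mean_center(normalized))
    # Class labels coded 0 = control, 1 = RA; predictions are hypothetical.
    print(fraction_explained([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))
```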

Differential metabolites
Identication of potential biomarker candidates that account for the differentiation of diseases is a necessary step not only for diagnosis, but also for better understanding of the functional metabolism in clinical diseases.To screen putative biomarkers for RA, variable importance for projection (VIP) values from the OPLS-DA model were obtained.VIP values were used to rank the contribution of metabolites to discriminate between the RA and healthy groups, which are based on weighted coefficients of the OPLS-DA model (Fig. 1D).VIP analysis showed the order of metabolites according to their critical inuence on clustering.The key metabolites, which were essential for distinguishing between the RA and healthy groups, were selected from the results of the important metabolites (VIP > 15).Then, the Student's t-test (p-value < 0.05) led to further testing of the selected metabolites with high VIP values as biomarker candidates for RA.VIP and the p-value conrmed the key metabolites for separating the groups from the other groups.Elemental compositions were calculated using Masslynx 4.1 soware.Evidence for the identity of features was accumulated by integrating information on accurate mass of the ions and some fragments, retention time, HMDB database, and in house human urine compound lists.Then, the identication of some compounds was veried by authentic standards.In this study, 5 metabolites (VIP > 15 and P < 0.001) were found to have VIP values higher than 15, of which 4 metabolites were higher in the RA group, whereas 1 metabolite was higher in the healthy group.Note that the fold change of succinate was highest in the RA group, and the fold changes of taurocholic acid, hippurate, and L-asparagine were much higher than those of other metabolites in the RA group.The fold changes of the metabolite abundances increased in the RA group and ranged from 2.76 to 32.15 (Table 1).

Network function analysis
Ingenuity Pathway Analysis (IPA) was applied to the differentially expressed metabolites to explore the pathways and networks involved in RA. The top six altered pathways generated from the discriminating metabolites are listed in Table S2.† In the network function analysis, the related metabolites in RA tended to gather into an integrated network (Fig. 2A). Bioinformatic analysis with IPA found that these metabolites were strongly associated with bile acid biosynthesis, hepatic cholestasis, asparagine degradation I, asparagine biosynthesis I, FXR/RXR activation, and tRNA charging (Fig. 2B).

ROC analysis
The diagnostic potential of these metabolic biomarkers for RA was evaluated in the external validation data, as described in the methods section. The area under the ROC curve (AUC) summarizes the relationship between the sensitivity and specificity of a biomarker in a single number. Sensitivity and specificity indicate the probability that the test correctly identifies individuals with RA and without RA (healthy group), respectively, and an AUC of 1.0 indicates perfect prediction. Table S2† also shows the ROC curve analysis of the predictive power of the 5 biomarkers (taurocholic acid, hippurate, L-asparagine, glycocholic acid, and chenodeoxycholic acid) to discriminate between the RA and healthy groups. The AUC values ranged from 0.819 to 0.993, indicating the potential of these metabolites to distinguish RA patients from normal subjects (Table 1). Taurocholic acid had a sensitivity of 98.7% and a specificity of 95.7% on the ROC curve and an AUC of 0.993, indicating good diagnostic performance. To demonstrate the utility of urine biomarkers for the early diagnosis of RA, taurocholic acid and hippurate were combined into a biomarker group, which showed high predictive ability for RA patients.
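The study used MedCalc for ROC analysis; as a hedged pure-Python illustration, the AUC equals the Mann–Whitney probability that a randomly chosen RA sample scores higher than a randomly chosen control (ties counted as one half). A reported sensitivity/specificity pair corresponds to one cutoff on the curve; Youden's index (sensitivity + specificity − 1) is one common rule for choosing it, although which rule the study used is not stated. The sketch assumes higher values indicate RA, and the intensities are hypothetical.

```python
from typing import Sequence, Tuple


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control (ties count 1/2)."""
    wins = 0.0
    for s in pos:
        for t in neg:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float, float]:
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1.

    A sample is called positive when its score is >= the cutoff. Returns
    (cutoff, sensitivity, specificity); the lowest such cutoff wins on ties.
    """
    best = None
    for c in sorted(set(pos) | set(neg)):
        sens = sum(s >= c for s in pos) / len(pos)
        spec = sum(t < c for t in neg) / len(neg)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Hypothetical normalized intensities of a metabolite that is higher in RA.
    ra = [0.9, 0.8, 0.7, 0.4]
    controls = [0.6, 0.3, 0.2, 0.1]
    print("AUC:", auc_mann_whitney(ra, controls))
    print("cutoff, sensitivity, specificity:", youden_cutoff(ra, controls))
```

For a metabolite that is lower in RA, negating the values before calling these functions keeps the "higher means RA" convention.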

Discussion
Metabolomic biomarkers are potentially related to the clinical manifestation of disease. However, to the best of our knowledge, an in-depth study of the relationship between the urine metabolite profile and RA is still required. 25–28 In the present study, we report a mass spectrometry-based metabolic phenotyping study to identify global metabolic defects and distinct metabolic signatures of RA in comparison with healthy control subjects, and to further characterize metabolic signatures and potential biomarkers of RA. An untargeted metabolomics study based on UPLC-QTOF/MS was performed to investigate dysregulated metabolic signatures in urine samples from RA patients. We first compared the metabolic characteristics of RA patients with those of healthy controls using an LC-MS platform combined with multivariate statistical analysis, and the findings demonstrated good discrimination between the two groups. On the basis of the VIP threshold and Student's t-test, 5 identified potential biomarkers were selected; among these, taurocholic acid and hippurate were combined into a biomarker group with high predictive ability for RA patients. The top canonical pathways were bile acid biosynthesis, hepatic cholestasis, asparagine degradation I, asparagine biosynthesis I, FXR/RXR activation, and tRNA charging. These results suggest that combining UPLC-MS with multivariate data analysis enables comprehensive urine metabolomics analysis and screening of biomarkers for the diagnosis of RA.
Taurocholic acid is a bile acid formed by the conjugation of cholic acid with taurine. Bile acids are physiological detergents that facilitate the excretion, absorption, and transport of fats and sterols in the intestine and liver. 29 They are also steroidal amphipathic molecules derived from the catabolism of cholesterol; they modulate bile flow and lipid secretion, are essential for the absorption of dietary fats and vitamins, and have been implicated in the regulation of all the key enzymes involved in cholesterol homeostasis. 30 Hippurate is an acyl glycine formed by the conjugation of benzoic acid with glycine. Acyl glycines are produced by glycine N-acyltransferase, an enzyme that catalyzes the transfer of an acyl group from acyl-CoA to glycine. 31 L-Asparagine is a nonessential amino acid, meaning that humans can synthesize it from intermediates of central metabolic pathways and do not require it in the diet. Its precursor is oxaloacetate, which is converted into aspartate by a transaminase enzyme. 32 The transaminase transfers the amino group from glutamate to oxaloacetate, producing α-ketoglutarate and aspartate. Asparagine synthetase then produces asparagine, AMP, glutamate, and pyrophosphate from aspartate, glutamine, and ATP; in this reaction, ATP is used to activate aspartate, forming β-aspartyl-AMP. 33 Glycocholic acid is an acyl glycine and a bile acid–glycine conjugate formed from cholic acid. 34 In hepatocytes, both primary and secondary bile acids undergo amino acid conjugation at the C-24 carboxylic acid on the side chain, so almost all bile acids in the bile duct exist in glycine- or taurine-conjugated forms. Chenodeoxycholic acid is a primary bile acid; like other bile acids, it is a steroid acid found predominantly in the bile of mammals that acts as a physiological detergent and has been implicated in the regulation of cholesterol homeostasis. 30,36–39
Our findings define a panel of molecules whose levels are altered in the urine of RA patients, and the urine biomarkers identified in this study exhibited satisfactory diagnostic performance. A strength of this study is that urine analysis by metabolomics is simple and noninvasive. In conclusion, to our knowledge, this is the first report on the identification of potential biomarkers for RA using the urine of human RA patients. We also demonstrated that high-throughput metabolomics coupled with multivariate data analysis and ingenuity pathway analysis may be a useful tool for biomarker discovery.

Conclusion
In this study, we have shown the potential of metabolomics to diagnose RA more effectively and efficiently and to discriminate between RA patients and healthy subjects. Urine metabolite profiling showed RA to be associated with a profound abnormality in metabolic phenotype. The levels of taurocholic acid, hippurate, L-asparagine, glycocholic acid, and chenodeoxycholic acid in RA cases differed significantly from those in control subjects. The six most significantly altered pathways were tRNA charging, bile acid biosynthesis, hepatic cholestasis, asparagine degradation I, asparagine biosynthesis I, and FXR/RXR activation. The biomarkers identified in this study exhibited satisfactory diagnostic performance. The discovery of candidate biomarkers in RA patients is an essential research area for improving disease monitoring and developing personalized medicine. These findings suggest that high-throughput metabolomics coupled with multivariate data analysis and ingenuity pathway analysis provides a feasible way to discover regulated metabolite signatures of RA with great potential for the diagnosis of RA patients.

Fig. 1
Fig. 1 Metabolic profile analysis. (A) Typical base peak chromatograms of control subjects (top) and RA patients (bottom) obtained by UPLC-QTOF/MS. (B) Score plot of the PCA model for control subjects (green), the RA group (red), and QC samples (blue). (C) Score plot of the OPLS-DA model of the UPLC-QTOF/MS data from RA patients (red) and controls (green). (D) VIP plot for the selection of variables of interest for patients with RA and age-matched healthy controls.

Fig. 2
Fig. 2 Ingenuity pathway analysis. (A) The merged network of the identified metabolites. Metabolite symbols in red were up-regulated, whereas those in pink were down-regulated. Solid lines indicate a direct physical relationship between molecules, whereas dotted lines indicate an indirect functional relationship. (B) Network function analysis of the differential metabolites by ingenuity pathway analysis.

Table 1
Information on biomarker metabolites analyzed by UPLC-QTOF/MS