Yanyue Yan^a,
Qihui Wang^a,
Wenwen Li^a,
Zhongjun Zhao^b,
Xin Yuan^b,
Yanping Huang^b and
Yixiang Duan*^c
^a Research Center of Analytical Instrumentation, Analytical & Testing Center, Sichuan University, Chengdu, P. R. China
^b Research Center of Analytical Instrumentation, College of Chemistry, Sichuan University, Chengdu, P. R. China
^c Research Center of Analytical Instrumentation, College of Life Sciences, Sichuan University, Chengdu, P. R. China. E-mail: yduan@scu.edu.cn; Fax: +86 028 85418180; Tel: +86 028 85418180
First published on 29th May 2014
The aim of this study was to apply gas chromatography-mass spectrometry (GC-MS) combined with a metabolomics approach to identify metabolic signatures in exhaled breath that distinguish patients with type 2 diabetes mellitus (T2DM) from healthy controls, based on differentially expressed breath metabolites. Breath samples from T2DM patients (n = 48) and healthy subjects (n = 39) were analyzed by GC-MS. Multivariate data analysis, including principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA), was applied to discriminate T2DM patients from healthy controls. Eight specific metabolites were identified and may serve as potential biomarkers for the diagnosis of T2DM. In combination, isopropanol, 2,3,4-trimethylhexane, 2,6,8-trimethyldecane, tridecane and undecane might be the best biomarkers for the clinical diagnosis of T2DM, with a sensitivity of 97.9% and a specificity of 100%. The study indicates that this breath metabolite profiling approach may be a promising non-invasive diagnostic tool for T2DM.
Type 2 diabetes mellitus (T2DM) is a metabolic disorder characterized by high blood glucose in the context of insulin resistance and relative insulin deficiency.7 Rates of T2DM have increased markedly since 1960 in parallel with obesity: as of 2010 there were approximately 285 million people with the disease, compared with around 30 million in 1985.8,9 Traditional diagnosis and management of T2DM hinge on blood tests (for plasma glucose and glycated hemoglobin), which may be expensive, impractical, and even painful. Frequent blood testing is especially necessary for patients undergoing insulin treatment. Large resources have therefore been invested worldwide in developing non-invasive devices for diabetes diagnosis and management. Breath analysis, one of the most promising approaches for routine clinical use, is increasingly being exploited for clinical diagnosis.10 Breath testing is non-invasive and offers an attractive, inexpensive, and patient-friendly evaluation. Furthermore, sample collection is easy, and samples can even be obtained from unconscious patients. Diabetes and its related dysmetabolic states could benefit greatly from the introduction of such non-invasive tests for diagnostic, preventive and monitoring purposes. This study investigated breath analysis as a diagnostic tool for T2DM and identified biomarkers among the volatile organic compounds (VOCs) in breath.
Owing to its high consistency, robustness, and sensitivity, gas chromatography coupled with mass spectrometry (GC-MS) is one of the most frequently used analytical techniques for profiling primary metabolites.11 Combined with public databases, the use of GC-MS for compound identification makes it of great value for metabolomics.12 It has been widely applied in disease biomarker discovery. As a metabolite profiling technique, GC-MS has been used to detect and discriminate various diseases, such as diabetic kidney disease,13 colon cancer,14 type 2 diabetes mellitus,15 cirrhosis and hepatic encephalopathy.2 For the GC-MS experiments, we used solid-phase microextraction (SPME) coupled with GC-MS for the metabolic profiling of T2DM. SPME is a simple, rapid and solvent-free sample preparation technique that can be directly coupled to GC-MS.16 Its advantages include faster extraction and desorption, direct compatibility with the GC inlet, and a smaller required sample. SPME was used so that as many breath compounds as possible could be identified.
Metabolomics is the scientific study of chemical processes involving metabolites. The metabolome represents the collection of all metabolites in a biological cell, tissue, organ or organism, which are the end products of cellular processes.17 The idea that biological fluids and tissues have important relationships with the health of an individual has existed for a long time.18 Even in ancient times, physicians recognized that certain breath odors were associated with specific pathological states, such as a ‘fishy’ smell with renal failure and a ‘fruity’ smell with diabetes.19 Ancient Chinese doctors used ants to detect whether the urine of patients contained high levels of glucose, and hence to detect diabetes.20 Recently, many scientists have studied metabolic diseases using metabolomics as a main technique. For example, the Wang-Sattler group has used statistical and bioinformatic methods to analyze metabolite concentration profiles and identify candidate biomarkers of T2DM.21,22 Horning et al. demonstrated in 1971 that GC-MS could be used to measure compounds present in human urine and tissue extracts.23 Subsequently, the Horning group, along with those of Linus Pauling and Arthur B. Robinson, led the development of GC-MS methods to monitor the metabolites present in urine through the 1970s.24 Metabolomics and metabolic profiling coupled with GC-MS play a rapidly growing role in disease diagnosis,15,25 therapeutics,26,27 functional genomics28,29 and toxicology studies.30,31 In this study, the GC-MS data were coupled with chemometric methods to develop models for identifying the biomarkers of T2DM.
Data obtained from metabolomics studies are complex and diverse.32 The common statistical approach to biomarker discovery in metabolomics is therefore multivariate statistical analysis (MVA), including principal component analysis (PCA),33,34 partial least squares discriminant analysis (PLS-DA)35,36 and orthogonal partial least squares discriminant analysis (OPLS-DA).37 PCA is the simplest of the true eigenvector-based multivariate analyses and is mostly used as a tool in exploratory data analysis and for making predictive models. PCA gives the user an overview of the clustering trend in the data by compressing the multidimensional data into a few principal components. PLS-DA and OPLS-DA calculate components using the class information as well and are more powerful for dealing with complex multidimensional data.32 Using this protocol, large amounts of information on the breath metabolome can be acquired with high reproducibility.
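As an illustration of how PCA compresses a samples-by-variables matrix into a principal component, the following minimal sketch extracts only the first component by power iteration on the covariance matrix. It is not the SIMCA-P procedure used in this study; the function names and the toy intensity values are hypothetical.

```python
import math

def mean_center(rows):
    """Subtract each column mean so PCA describes variance, not offsets."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def first_principal_component(rows, n_iter=500):
    """Return (loadings, scores, explained variance fraction) for PC1."""
    x = mean_center(rows)
    n, p = len(x), len(x[0])
    # Sample covariance matrix (p x p).
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1)
            for b in range(p)] for a in range(p)]
    # Power iteration converges to the dominant eigenvector of cov,
    # provided the start vector is not orthogonal to it.
    v = [float(j + 1) for j in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    scores = [sum(x[i][a] * v[a] for a in range(p)) for i in range(n)]
    return v, scores, eigenvalue / total_variance

# Toy data (made up): 4 samples x 3 peak intensities; the first two
# variables move together, the third is nearly constant.
toy = [[2.0, 1.9, 0.5], [4.0, 4.1, 0.4], [6.0, 5.8, 0.6], [8.0, 8.2, 0.5]]
loadings, scores, explained = first_principal_component(toy)
```

In this toy case almost all of the variance falls on one component whose loadings are dominated by the two correlated variables, which is the kind of overview PCA provides before a supervised model is fitted.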
This article discusses the breath metabolic changes in T2DM by coupling GC-MS with multivariate data analysis, including PCA and OPLS-DA. By revealing the discriminating variables (identified by retention time), OPLS-DA could be applied to separate the T2DM and healthy control groups. In addition, nonparametric testing and receiver operating characteristic (ROC) curve analysis were performed to validate the robustness of the OPLS-DA model. With these analyses, we identified and selected the potential biomarkers and discussed their biological functions.
In this study, 48 T2DM patients and 39 healthy controls were recruited. All breath samples from patients (22 males and 26 females, aged 14–85 years) were collected at the Second Affiliated Hospital of Jilin University, Jilin, China. The T2DM patients were diagnosed according to the 1999 criteria of the World Health Organization (WHO). Patients aged 14–90 years were included if they had no history of receiving medication and no other known chronic disease; patients receiving long-term medication or suffering from another known chronic disease or lung ventilation dysfunction were excluded. Each participant gave written informed consent prior to the study. All study procedures were approved by the Ethics Committee of Chinese People's Liberation Army 208 Hospital, China. The 39 healthy controls (15 males and 24 females, aged 21–71 years) were from Sichuan University, Chengdu, China; none was receiving long-term medication or suffering from a known chronic disease. All samples were collected while subjects were in a stable state, without recent dietary intake or exercise.
(1) |
A matrix table consisting of the peak number (based on the retention time and m/z), the sample name, and the normalized peak intensity was produced in a batch job by a computer program coded in MATLAB R2013a (Mathworks, Natick, MA, USA). The program ran in the following two steps:
(1) An average retention time list (ARTL), associated with the metabolites in each chromatogram, was established. First, one chromatographic peak list was chosen as the reference list (ARTL1). The average retention time list was then produced by comparing it with the other peak lists (PL), as given in eqn (2).
(2) |
(2) Chromatographic peaks in the raw data files were detected according to the average retention time by tracking the apex of the peaks in the chromatograms. The peak-tracking parameter was a retention time window of 0.05 min.
Thus, a data matrix with one column per sample (including all samples in the metabolomics experiment) and one row per mass signal was generated. This process can also produce missing peaks, which may result from severe peak drift or from a quantity too low to be detected. In such cases, the chromatogram should be checked to ensure a correct result.
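The two steps above can be sketched as follows. This is an illustrative Python outline, not the authors' MATLAB program: eqn (2) is not reproduced in this text, so the update rule (a running mean of the retention times matched within the 0.05 min window) is an assumption, and the function names, sample names, and peak values are hypothetical.

```python
def build_artl(peak_lists, window=0.05):
    """Step 1: average retention time list from per-sample retention times.

    The first list acts as the reference (ARTL1). Each later peak is merged
    into the nearest entry if it lies within `window` minutes; otherwise it
    starts a new entry. The running-mean update is an assumed stand-in for
    eqn (2).
    """
    artl = [[rt, 1] for rt in sorted(peak_lists[0])]  # [mean RT, count]
    for peaks in peak_lists[1:]:
        for rt in sorted(peaks):
            nearest = min(artl, key=lambda e: abs(e[0] - rt))
            if abs(nearest[0] - rt) <= window:
                nearest[0] = (nearest[0] * nearest[1] + rt) / (nearest[1] + 1)
                nearest[1] += 1
            else:
                artl.append([rt, 1])
        artl.sort(key=lambda e: e[0])
    return [e[0] for e in artl]

def build_matrix(samples, artl, window=0.05):
    """Step 2: one row per ARTL entry, one column per sample.

    A peak that cannot be tracked inside the window is stored as None so
    that the chromatogram can be checked by hand.
    """
    names = list(samples)
    matrix = []
    for ref_rt in artl:
        row = []
        for name in names:
            hits = [(abs(rt - ref_rt), intensity)
                    for rt, intensity in samples[name]
                    if abs(rt - ref_rt) <= window]
            row.append(min(hits)[1] if hits else None)
        matrix.append(row)
    return names, matrix

# Made-up (retention time in min, normalized intensity) pairs.
samples = {
    "T2DM_01": [(8.50, 120.0), (8.72, 80.0), (21.54, 40.0)],
    "HC_01": [(8.49, 60.0), (8.73, 20.0)],
}
artl = build_artl([[rt for rt, _ in peaks] for peaks in samples.values()])
names, matrix = build_matrix(samples, artl)
```

Here the third row has no entry for `HC_01`, which corresponds to the missing-peak case described above.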
The resulting multivariate dataset was then imported into SIMCA-P 11.0 software (Umetrics, Umeå, Sweden) as variables for principal component analysis (PCA) and orthogonal projection to latent structures discriminant analysis (OPLS-DA). In addition, the nonparametric Mann–Whitney U test was used to assess whether each potential biomarker differed significantly between the T2DM and healthy control groups. Results were considered significant at p < 0.05. Receiver operating characteristic (ROC) curve analysis was performed to validate the robustness of the OPLS-DA model, and the area under the curve (AUC), specificity, and sensitivity were calculated to evaluate the diagnostic value of the potential biomarkers among the differential metabolites. All additional statistical analyses were conducted using IBM SPSS Statistics 19.0 (SPSS Inc., Chicago, Illinois, USA).
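The Mann–Whitney U statistic and the ROC AUC are closely related: for a single marker, the AUC equals U divided by the number of patient–control pairs. The sketch below illustrates this with made-up intensities. It is not the SPSS procedure used in the study, the function names are hypothetical, and the p-value uses a plain normal approximation without tie or continuity correction, so it will differ from an exact test on small samples.

```python
import math

def mann_whitney_u(cases, controls):
    """Count (case, control) pairs where the case value is higher.

    Ties contribute 0.5, which is the usual convention.
    """
    u = 0.0
    for a in cases:
        for b in controls:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def auc_from_u(u, n_cases, n_controls):
    """AUC is the fraction of pairs ranked correctly by the marker."""
    return u / (n_cases * n_controls)

def two_sided_p_normal(u, n_cases, n_controls):
    """Two-sided p-value from the normal approximation to U (no corrections)."""
    mean_u = n_cases * n_controls / 2.0
    sd_u = math.sqrt(n_cases * n_controls * (n_cases + n_controls + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))

# Made-up normalized peak intensities for one metabolite.
cases = [5.1, 6.3, 7.0, 8.2, 9.4]
controls = [3.0, 4.2, 5.5, 6.0]

u = mann_whitney_u(cases, controls)
auc = auc_from_u(u, len(cases), len(controls))
p_value = two_sided_p_normal(u, len(cases), len(controls))
```

With these toy values, 18 of the 20 pairs are ranked correctly, so the AUC is 0.9.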
Metabolites were identified by matching retention times and mass spectra against the National Institute of Standards and Technology (NIST) mass spectral library (NIST 08, NIST Mass Spectral Search Program, Version 2.0f, USA). Additionally, on-line mass spectra searches were performed in the Human Metabolome Database (http://www.hmdb.ca), KEGG (http://www.genome.jp/kegg/ligand.html) and MassBank (http://www.massbank.jp). Commercial standard reagents were used to support the identification of metabolites.
Fig. 1 Typical GC-MS spectra obtained for breath from T2DM and healthy controls, highlighting the peaks corresponding to the 8-compounds model. The labels in part correspond to the labels in Table 1.
After PCA, the data sets were processed using a more powerful supervised statistical method, OPLS-DA. First, we applied a permutation test with 200 iterations to assess whether the PLS-DA model was valid and well-fit. In our study, the R2 intercept was 0.359 and the Q2 was below 0 using the supervised projection method PLS-DA (Fig. 2b). The OPLS-DA model had one predictive and four orthogonal (1 + 4) components (R2X = 0.788, R2Y = 0.908, Q2 (cum) = 0.823), indicating high discriminative and predictive ability (Fig. 2c).
Fig. 2d shows the S-plot of the OPLS-DA model. Potential biomarkers were selected from the S-plot according to their VIP values, and fifteen metabolites were highlighted. For each selected metabolite, a nonparametric Mann–Whitney U test was performed to assess its univariate importance, and p < 0.05 was considered significant. Of the 15 differentially expressed metabolites, only the eight that most strongly influenced the differentiation (VIP > 1 and p < 0.05) are listed in Table 1. The compounds considered as potential biomarkers in the model were identified using the NIST Mass Spectral Search Program, the HMDB (http://www.hmdb.ca), and MassBank (http://www.massbank.jp). The verification and validation of the potential biomarkers are described in detail below.
No. | RT (min) | Metabolites | VIP | p-Value |
---|---|---|---|---|
1 | 8.497 | Acetone | 9.30 | 0.004 |
2 | 8.720 | Isopropanol | 7.79 | <0.001 |
3 | 15.875 | Toluene | 1.66 | <0.001 |
4 | 18.265 | m-Xylene | 1.89 | 0.005 |
5 | 19.699 | 2,3,4-Trimethylhexane | 2.02 | <0.001 |
6 | 21.009 | 2,6,8-Trimethyldecane | 3.64 | <0.001 |
7 | 21.540 | Tridecane | 6.53 | <0.001 |
8 | 21.995 | Undecane | 3.32 | <0.001 |
Among these metabolites, three potential biomarkers (acetone, isopropanol, and m-xylene) were confirmed using standard samples. Stock solutions of acetone, isopropanol, and m-xylene, each at a concentration of 10 μmol mL−1, were prepared in ethanol. We injected 10 μL of each stock solution into a separate 3 L bag before filling the bag with pure nitrogen. The samples were kept at room temperature for more than two hours to evaporate fully in the bags. The fragmentation patterns of all samples are shown in Fig. 3. The three standards matched the results described above on comparison of retention times and electron ionization mass spectra.
Fig. 3 Ion current chromatograms of acetone (a), isopropanol (b), and m-xylene (c), each compared with ethanol and pure nitrogen, together with the corresponding electron ionization mass spectra.
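For orientation, the gas-phase concentration implied by the standard preparation (10 μL of a 10 μmol mL−1 stock in a 3 L bag) can be estimated with the ideal gas law. The sketch below is a rough estimate under stated assumptions that are not given in the source: complete evaporation, a bag filled to exactly 3 L, 25 °C (298.15 K) as "room temperature", and 1 atm. The function name is hypothetical.

```python
R_L_ATM = 0.082057366  # gas constant in L atm / (mol K)

def standard_ppmv(stock_umol_per_ml, injected_ul, bag_litres,
                  temp_k=298.15, pressure_atm=1.0):
    """Estimate the analyte mixing ratio (ppmv) in the bag.

    Assumes ideal-gas behaviour and complete evaporation; the default
    temperature and pressure are assumptions, not values from the study.
    """
    analyte_mol = stock_umol_per_ml * (injected_ul / 1000.0) * 1e-6
    molar_volume = R_L_ATM * temp_k / pressure_atm  # litres per mole
    return analyte_mol * molar_volume / bag_litres * 1e6

# 10 uL of 10 umol/mL stock in a 3 L bag: 0.1 umol of analyte.
estimate = standard_ppmv(10.0, 10.0, 3.0)
```

Under these assumptions each standard is present at a little under 1 ppmv, i.e. 0.1 μmol of analyte in 3 L of gas.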
We also performed ROC analysis to characterize these potential biomarkers of T2DM. The potential biomarkers could be divided into two groups: seven were up-regulated in T2DM patients and one was down-regulated. Fig. 4 displays the ROC curve analysis of the eight biomarkers: acetone, isopropanol, toluene, m-xylene, 2,3,4-trimethylhexane, 2,6,8-trimethyldecane, tridecane, and undecane. Acetone, isopropanol, tridecane, 2,6,8-trimethyldecane, undecane, 2,3,4-trimethylhexane and toluene showed higher levels in T2DM (Fig. 4a), while m-xylene showed lower levels in T2DM (Fig. 4b). The area under the ROC curve (AUC) and the corresponding sensitivity and specificity for each potential biomarker of T2DM are listed in Table 2. For example, isopropanol had a sensitivity of 79.2% and a specificity of 92.3%, with an area under the ROC curve of 0.876 (95% confidence interval, 0.795–0.956). To demonstrate the utility of breath metabolites for discriminating between T2DM and healthy controls, a logistic regression (LR) model was built from the five validated biomarkers with AUC > 0.8 (isopropanol, 2,3,4-trimethylhexane, 2,6,8-trimethyldecane, tridecane, and undecane), and the ROC curve was computed for this model. The LR model achieved a sensitivity of 97.9% and a specificity of 100%. The calculated area under the ROC curve was 1.00 (95% confidence interval, 1.000–1.000), as shown in Fig. 5a. Box plots of these five potential biomarkers in distinguishing T2DM from healthy controls are shown in Fig. 5b–f.
Fig. 4 Receiver operating characteristic (ROC) curve analysis for the predictive power of up-regulated biomarkers (a) and down-regulated biomarkers (b) for distinguishing T2DM from healthy controls.
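The reported percentages can be translated back into subject counts, which makes them easier to interpret for groups of this size. The short sketch below assumes that sensitivity was computed over all 48 T2DM patients and specificity over all 39 controls; the helper name is hypothetical.

```python
N_T2DM, N_CONTROL = 48, 39  # group sizes stated in the study

def closest_count(percent, group_size):
    """Whole-number count whose share of group_size is nearest to percent."""
    return round(percent / 100.0 * group_size)

# Logistic regression model: sensitivity 97.9%, specificity 100%.
lr_true_positives = closest_count(97.9, N_T2DM)
lr_true_negatives = closest_count(100.0, N_CONTROL)

# Isopropanol alone: sensitivity 79.2%, specificity 92.3%.
iso_true_positives = closest_count(79.2, N_T2DM)
iso_true_negatives = closest_count(92.3, N_CONTROL)

# Recompute the percentages from the counts as a consistency check.
lr_sensitivity = 100.0 * lr_true_positives / N_T2DM
iso_specificity = 100.0 * iso_true_negatives / N_CONTROL
```

On this reading, the combined model corresponds to 47 of 48 patients and all 39 controls being classified correctly, while isopropanol alone corresponds to 38 of 48 patients and 36 of 39 controls.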
Metabolites | AUC (95% CIs) | Sensitivity (%) | Specificity (%) |
---|---|---|---|
Acetone | 0.679 (0.567, 0.791) | 45.8 | 87.2 |
Isopropanol | 0.876 (0.795, 0.956) | 79.2 | 92.3 |
Toluene | 0.737 (0.627, 0.847) | 66.7 | 82.1 |
m-Xylene | 0.677 (0.556, 0.798) | 69.2 | 72.9 |
2,3,4-Trimethylhexane | 0.910 (0.835, 0.985) | 89.6 | 94.9 |
2,6,8-Trimethyldecane | 0.949 (0.903, 0.995) | 89.6 | 94.9 |
Tridecane | 0.870 (0.779, 0.962) | 89.6 | 84.6 |
Undecane | 0.911 (0.847, 0.976) | 89.6 | 82.1 |
PCA did not reveal any particular similarity or large differences between the sample profiles. We therefore used a more powerful supervised statistical method, OPLS-DA. OPLS-DA is an orthogonal-projection extension of PLS-DA, which in turn is a classification method based on a regression extension of PCA. In PLS-DA, the original model is generally considered well-fit when the R2 intercept is <0.4 and the Q2 is <0.05 in a permutation test with 200 iterations.39,40 In our study, the results (R2 = 0.359, Q2 < 0) indicated that the model was statistically valid and well-fit. The OPLS-DA method was used to test for differences in metabolites between T2DM and healthy controls and to identify the potential biomarkers of T2DM. As shown in Fig. 2c, the OPLS-DA model was well-fit, highly discriminative and predictive, which supports the observed separation between the metabolite profiles of the T2DM group and the healthy controls. The healthy controls were well separated from the T2DM patients.
The S-plot visualizes the covariance and correlation between the variables and the model scores, and is therefore used to identify discriminating variables.37 The variable importance in the projection (VIP) value of the OPLS-DA model is a major parameter for the detection of potential biomarkers. As shown in Fig. 2d, the 15 potential biomarkers highlighted in the S-plot were selected with VIP > 1. A nonparametric Mann–Whitney U test was performed to assess the univariate importance of each metabolite, and p < 0.05 was considered significant. Only eight metabolites met both criteria, VIP > 1 and p < 0.05 (Table 1).
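The two S-plot axes can be computed directly: for each variable, the x-coordinate is its covariance with the predictive score vector and the y-coordinate is its correlation with that vector. The sketch below shows only this calculation. It assumes the predictive scores have already been obtained from an OPLS-DA model (here they are made-up numbers, not SIMCA-P output), and the function name is hypothetical.

```python
import math

def s_plot_coordinates(scores, data):
    """Return a (covariance, correlation) pair for each column of `data`.

    scores: predictive-component score per sample (assumed given).
    data:   samples x variables matrix of peak intensities.
    """
    n = len(scores)
    t_mean = sum(scores) / n
    t_centered = [t - t_mean for t in scores]
    t_sd = math.sqrt(sum(t * t for t in t_centered) / (n - 1))
    coords = []
    for j in range(len(data[0])):
        column = [row[j] for row in data]
        col_mean = sum(column) / n
        centered = [v - col_mean for v in column]
        cov = sum(a * b for a, b in zip(t_centered, centered)) / (n - 1)
        sd = math.sqrt(sum(v * v for v in centered) / (n - 1))
        # A constant variable has no defined correlation; report 0.0.
        corr = cov / (t_sd * sd) if sd > 0 and t_sd > 0 else 0.0
        coords.append((cov, corr))
    return coords

# Made-up scores and intensities: variable 0 tracks the scores closely,
# variable 1 varies little and is only weakly related to them.
toy_scores = [-2.0, -1.0, 1.0, 2.0]
toy_data = [[1.0, 5.0], [2.0, 5.5], [4.0, 4.5], [5.0, 5.0]]
coords = s_plot_coordinates(toy_scores, toy_data)
```

Variables far from the origin on both axes combine a large contribution with high reliability, which is why the extremes of the "S" are the candidates examined further.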
All metabolites could be divided into two groups: up-regulated and down-regulated. Acetone, isopropanol, tridecane, 2,6,8-trimethyldecane, undecane, 2,3,4-trimethylhexane and toluene showed higher levels in T2DM, while m-xylene showed lower levels in T2DM. ROC analysis makes it easy to assess the ability of a test to identify disease at any cutoff. The AUC obtained from a ROC curve usually lies between 0.5 and 1.0; the closer it is to 1, the higher the accuracy of the test and the greater its diagnostic value. Furthermore, a logistic regression model was built to demonstrate the utility of breath metabolites for discriminating between T2DM and healthy controls. Five validated biomarkers with AUC > 0.8 were used to build the LR model. With a sensitivity of 97.9% and a specificity of 100%, the LR model demonstrated that isopropanol, 2,3,4-trimethylhexane, 2,6,8-trimethyldecane, tridecane, and undecane in combination provided better prediction of T2DM.
Isopropanol belongs to the family of alcohols and polyols. A previous report indicated that isopropanol is one of the products of propanoate metabolism and the substrate for the synthesis of acetone, catalyzed by the enzyme isopropanol dehydrogenase.46 It can be detected and quantified in blood, urine and cerebrospinal fluid (CSF). In our work, isopropanol showed a higher level in T2DM patients than in healthy controls, and it followed the increased level of acetone. Moreover, the detection of isopropanol in a patient with diabetic ketoacidosis has indicated that isopropanol may be a byproduct of acetone metabolism in certain disease states.47 We therefore suggest that isopropanol is associated with acetone metabolism and is a significant differential metabolite in T2DM.
2,3,4-Trimethylhexane, 2,6,8-trimethyldecane, tridecane, and undecane are acyclic alkanes. The experimental data indicated that T2DM patients had higher levels of these four compounds than healthy controls. Tridecane and undecane belong to the family of fatty acyls and might be products of polyunsaturated fatty acid metabolism; they are considered as a group in terms of their roles in fatty acid metabolism. They are found in allspice. Tridecane is also one of the major chemicals secreted by some insects as a defense against predators, while undecane acts as a mild sex attractant for various types of moths and cockroaches and as an alert signal for a variety of ants. Undecane has been detected in human urine as a metabolic product.48 2,3,4-Trimethylhexane and 2,6,8-trimethyldecane are classified in HMDB as endogenous metabolites, that is, metabolites synthesized by enzymes encoded by the genome or by microfloral genomes. However, their metabolic pathways and the reason for their higher levels in T2DM are not yet known.
m-Xylene belongs to the family of toluenes. According to the KEGG database, m-xylene is involved in a redox reaction with nicotinamide adenine dinucleotide (NADH), which carries electrons from one reaction to another. This reaction forms NADH, which can then be used as a reducing agent to donate electrons. These electron transfer reactions play an important role in beta oxidation, glycolysis, and the citric acid cycle. The first step in glycolysis is the phosphorylation of glucose by a family of enzymes called hexokinases to form glucose 6-phosphate (G6P). In animals, an isozyme of hexokinase called glucokinase is also used in the liver; it has a much lower affinity for glucose and differs in regulatory properties.49 The different substrate affinity and alternate regulation of this enzyme reflect the role of the liver in maintaining blood sugar levels. Glucokinase activity serves as a principal control for the secretion of insulin in response to rising levels of blood glucose.50–53 As G6P is consumed, increasing amounts of ATP initiate a series of processes that result in the release of insulin. One of the immediate consequences of increased cellular respiration is a rise in the NADH concentration. T2DM is a metabolic disorder characterized by high blood glucose without enough insulin, resulting in reduced glucokinase. Less glucose is converted to G6P because of the low concentration of glucokinase. Therefore, the concentration of m-xylene in exhaled breath might be lower in T2DM because of decreased glycolysis, which is consistent with the experimental data in our study.
Toluene is an aromatic hydrocarbon. It has been shown to exhibit beta-oxidant, depressant, hepatoprotective, anesthetic and neurotransmitter functions.54–57 According to the KEGG database, toluene can be synthesized from benzyl alcohol with oxidized ferredoxin. Adrenal ferredoxin (adrenodoxin) is expressed in mammals including humans. The human variant of adrenodoxin is referred to as ferredoxin-1.58 Ferredoxin-1 in humans participates in the synthesis of thyroid hormones. It also transfers electrons from adrenodoxin reductase to the cholesterol side chain cleavage cytochrome P450.59,60 The reason for the increased level of toluene in T2DM remains unknown. Nevertheless, we suggest that adrenodoxin metabolism may be significantly involved. Further work is needed to clarify the relationship between toluene and T2DM. Toluene has also been detected in the urine of breast cancer patients and normal controls.61 It has been reported that toluene may act by influencing glutamate and taurine neurotransmitter levels.62
T2DM is a complex disease caused by a combination of lifestyle and genetic factors, and it results in a number of complications, including ischemic heart disease, stroke, and even non-traumatic blindness and kidney failure. T2DM is therefore associated with numerous metabolites, which may improve the sensitivity and specificity of T2DM detection. In our study, several potential biomarkers showed strong predictive power for distinguishing T2DM from healthy controls. Nevertheless, their actual metabolic pathways are not yet known. Efforts should therefore be made to determine their metabolic pathways and their interactions with proteins, enzymes, or other small molecules, which would be very helpful for research into the pathogenesis of T2DM. The candidate biomarkers discovered here also need to be extensively validated before they can be translated into real-world diagnostic and screening applications.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c4ra01422g
This journal is © The Royal Society of Chemistry 2014 |