Comparison of GC-MS, HPLC-MS and SIFT-MS in conjunction with multivariate classification for the diagnosis of Crohn's disease in urine

The developed world has seen an alarming increase in the incidence of gastrointestinal diseases, one of the most common of which, in young people, is Crohn's disease (CD). The current “gold standard” techniques for diagnosis are often costly, time consuming, inefficient and invasive, and offer poor sensitivities and specificities. This paper compares the performances of three hyphenated instrumental techniques that have been suggested as rapid methods for the non-invasive diagnosis of CD from urine. These techniques are gas chromatography-mass spectrometry (GC-MS), high performance liquid chromatography-mass spectrometry (HPLC-MS) and selected ion flow tube mass spectrometry (SIFT-MS). Each of these techniques is followed by multivariate classification to provide a diagnosis based on the acquired data. The most promising results for diagnosing CD were obtained with HPLC-MS. An overall classification accuracy of 73% (74% specificity; 73% sensitivity) was achieved for differentiating CD from healthy controls, statistically significant at 95% confidence.


Introduction
The number of patients in the developed world diagnosed with gastrointestinal diseases has been increasing in recent years. This may be attributable to a combination of lifestyle traits, particularly an unhealthy diet high in saturated fat, starch and sugar and lacking in fruit and vegetables, which are normally rich in antioxidants. 1 This has led to increased interest in research into the causes, prevention and possible cure of these diseases. 2 Further to this are cases of food intolerance, which could be due to abnormal fermentation processes occurring within the gut 3 or to a drastic change in diet, which may lead to an increased incidence of Crohn's disease. 4 Crohn's disease (CD) is a debilitating inflammatory bowel disease (IBD) causing inflammation of the mucosal lining of the gut. [5][6][7] CD can affect any part of the gastrointestinal tract, whereas ulcerative colitis (UC), another IBD, typically affects only the colon. The presenting symptoms of CD and UC (chiefly abdominal pain and diarrhoea) are similar, making differential diagnosis of these two conditions challenging. The distinction is very important because the two conditions require different treatment. 8 There have been occasions on which a patient thought to be suffering from UC has later been diagnosed with CD. This is in conjunction with the discovery of new therapeutic agents employed to treat IBD. 9,10 Colonoscopy and sigmoidoscopy are the current "gold standard" methods of diagnosing CD (and UC). 11,12 Sigmoidoscopy permits a direct 5-20 minute examination of the lining of the rectum and the lower part of the colon using a fibre-optic scope attached to a camera, enabling the examiner to observe the lining for any irregularities. Colonoscopy uses a longer probe that can extend up to the ileum. 13 The procedure can take 30 minutes, during which the patient must be sedated. It tends to be a more accurate diagnostic technique than sigmoidoscopy. 8,11 The two techniques are, however, highly invasive and expensive to perform.
An alternative approach is the determination of chemical biomarkers such as faecal calprotectin. [14][15][16][17] However, the diagnostic performance of these tests is limited, which has led to the investigation of analytical approaches to the non-invasive diagnosis and monitoring of these conditions. 18 More recently, analytical techniques incorporating mass spectrometric methods have been used to capture metabolic profiles of clinical samples. These can analyse either metabolites in solution 19 or those in the vapour phase, so-called volatile organic compounds (VOCs). Their advantage is that analysis can be non-invasive if urine, faeces, breath and some other fluids are analysed, reducing the need for invasive procedures, which are uncomfortable and costly.
High performance liquid chromatography coupled with mass spectrometry (HPLC-MS) is routinely employed in proteomics, 20 and gas chromatography mass spectrometry (GC-MS) techniques have been employed for many years for the detection of metabolites, 21 including the possible diagnosis of gastrointestinal diseases. 22,23 HPLC-MS is also employed in metabolomics, for example to determine changes in the human urinary metabolome after the consumption of certain nuts such as almonds 24,25 and to study age- and strain-related differences in the Zucker rat. 26,27 It has also been used in conjunction with proton nuclear magnetic resonance (¹H NMR) for the analysis of biofluid samples. [28][29][30] The relatively new technique of selected ion flow tube mass spectrometry (SIFT-MS) is also being employed in metabolomics. [31][32][33][34][35][36] Rapid and quantitative analysis of VOCs can be achieved using SIFT-MS, which employs a fast flow tube to study the reaction of precursor ions with sample molecules in gas or vapour form. The flow tube technology, together with quantitative mass spectrometry, allows selected precursor ions (H₃O⁺, NO⁺ and O₂⁺) to react in turn with the sample molecules to produce product ions through chemical ionisation (CI). These product ions are separated in a downstream quadrupole and are then detected and quantified. A kinetic database is used to quantify the concentrations of the various molecules present in the sample. These particular precursor ions are chosen because they react slowly with the major components of air but rapidly with the trace gases and vapours of research interest. Unlike most CI techniques, this technology can use all three reagent ions rapidly in turn on the same instrument. 32 SIFT-MS is widely employed for the real-time analysis of volatile compounds originating from biological systems in medical applications 33 and clinical diagnosis. 34 A key advantage of SIFT-MS is the ability to distinguish between different isomers by using the three precursor ions. 37 All of these analytical techniques can produce large amounts of data on the metabolites present, so sophisticated data analysis methods are needed. Multivariate classification is a pattern recognition approach that assigns each sample to one of a set of designated classes. 21 One approach is partial least squares discriminant analysis (PLS-DA), 38,39 a supervised method that separates samples into different classes, for example healthy and diseased. Although more advanced techniques exist, such as support vector machines 40 and artificial neural networks, 41 PLS-DA permits the direct identification of statistically significant features that may be related to potential biomarkers by visual inspection of the PLS loadings. 39,42 A recent study reported the use of GC-MS in conjunction with PLS-DA for the diagnosis of gastrointestinal diseases, including Crohn's disease, in a series of matrices (faeces, breath, blood and urine). 43 This study found that only CD could be diagnosed in the presence of the other diseases and healthy controls, and this was achieved using faecal material. Very good accuracy was also attained for CD by analysing urine, but the sensitivity was below 50% and thus of no diagnostic benefit.
For diagnostic or screening tests, the ease of sample acquisition and handling is an important consideration. Faecal samples are generally harder to collect and process, not least because of subjects' reluctance to provide them. Blood is also occasionally problematic because of the invasive nature of the sampling and the discomfort caused to patients by venepuncture. For these reasons, and because of its ease of collection and storage, urine is considered a better matrix, which is why it was investigated in this study.
The present article describes the application of multivariate classification to GC-MS, HPLC-MS and SIFT-MS data acquired from the same urine samples that were employed in our previous paper. 43 The aim was to determine whether greater sensitivity and overall accuracy could be achieved than with the GC-MS data, the overall objective being to distinguish patients suffering from CD from those with IBS and UC and from healthy individuals.

Reagents
Unless otherwise stated, analytical grade reagents and solvents were employed.

Sampling
Selection of candidates. Volunteers were recruited from patients attending the Gastroenterology Department at Addenbrooke's Hospital (Cambridge, UK). There was an initial total of 57 candidates, of whom 18 were diagnosed with Crohn's disease (CD), 8 with ulcerative colitis (UC) and 18 with irritable bowel syndrome (IBS), and 13 were deemed healthy. Each candidate gave oral and written consent and was issued with a urine sample container bearing an individual code, to ensure patient confidentiality in accordance with the Data Protection Act (1998).
The study had been ethically approved by the National Research Ethics Service in Leeds in July 2007 (07/Q1205/39).

Instrumental measurements
GC-MS. The samples were aspirated into thermal desorption tubes, and 50 ng of deuterated (D8)-toluene was subsequently added as an internal standard (IS). These tubes were loaded on to an automated thermal desorption gas chromatography mass spectrometer (ATD-GC-MS) and analysed. A more detailed account of the instrumental parameters and conditions is provided elsewhere. 43
HPLC-MS. Samples were prepared using a simple 1 in 10 dilution in water containing deuterated (D6)-caffeine (100 ng ml⁻¹) as IS. This was done by placing 900 µl of water/IS in a standard LC-MS injection vial using a repeating pipette and adding 100 µl of sample (aspirated from the top of the sample tube without mixing, to avoid picking up solids). The LC vials were then capped and vortex-mixed. All sample preparation and analysis was carried out in a single process. Samples were analysed using a 7 minute LC-MS run in positive ion mode on an Orbitrap Discovery system operating at a resolution of 30 000. The injection volume was 10 µl. Further details of the instrumental parameters and conditions are listed in Table SM1 of the ESI.†
SIFT-MS. Urine (2.5 ml) was placed in a sample bag made from 65 mm diameter Nalophan NA tubing (Kalle UK). The bags were sealed, filled with hydrocarbon-free air and equilibrated in an incubator at 40 °C to generate a vapour headspace. One end of each sample bag was connected via a Swagelok fitting directly to the SIFT-MS capillary inlet, and the VOCs in the urine headspace were analysed after equilibration. The SIFT-MS was a Profile 3 model manufactured by Instrument Science (UK).
The sample VOCs react with one of three precursor ions (H₃O⁺, NO⁺ or O₂⁺) to generate product ions, which are then separated by a quadrupole and counted (in counts per second) at the detector. The data therefore consist of count rates determined over a 30 second period at each mass-to-charge ratio (m/z) from m/z 10 to m/z 140, and represent the amount of product ion formed with each of the three precursor ions. With this instrument, whole volatile profiles of samples can be generated very rapidly, offering real-time results, as opposed to GC-MS and HPLC-MS, which offer only "snapshots" of a sample at a particular point in time.

Data pre-processing
GC-MS. The raw GC-MS files produced by the ATD-GC-MS instrument were imported into MATLAB (R2011a, MathWorks Inc., USA) in NetCDF (.CDF) format. During the import of each file, which corresponded to one sample, the intensity values at each retention time and mass-to-charge ratio (m/z) were normalised against the deuterated (D8)-toluene IS and then summed across the m/z values to produce a data matrix of dimensions number of samples × number of retention time values. This effectively forms a matrix of total ion count (TIC) chromatograms. Exploratory data analysis techniques, namely principal components analysis (PCA) 44 in conjunction with Hotelling's T² statistic 45 and hierarchical cluster analysis (HCA), 46 were employed to identify any specific trends and outlying samples. This is necessary because outlying samples would affect the performance of the chromatographic peak alignment detailed below. No samples were identified as outliers.
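As an illustration, the GC-MS pre-processing and outlier screening can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the MATLAB code used in this work: each sample is assumed to be a retention time × m/z intensity array on a common grid, the hypothetical `is_column` index marks the m/z trace of the deuterated internal standard, and normalisation is assumed to mean division by the summed intensity of that trace. Hotelling's T² is computed from a PCA model of the mean-centred TIC matrix.

```python
import numpy as np


def tic_matrix(samples, is_column):
    """Build a samples x retention-time matrix of IS-normalised TIC chromatograms.

    samples   : list of 2-D arrays (retention time x m/z), one per sample,
                all on the same retention time and m/z grid (assumed).
    is_column : index of the m/z column holding the internal-standard trace
                (hypothetical; depends on the ion chosen for the IS).
    """
    rows = []
    for data in samples:
        data = np.asarray(data, dtype=float)
        is_signal = data[:, is_column].sum()         # summed IS ion intensity
        rows.append((data / is_signal).sum(axis=1))  # normalise, then sum over m/z
    return np.vstack(rows)


def hotelling_t2(X, n_components):
    """Hotelling's T^2 for each row of X from a PCA model of the mean-centred data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]       # PCA scores
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)  # variance of each PC
    return ((scores ** 2) / score_var).sum(axis=1)
```

A sample whose T² exceeds a chosen confidence limit (commonly derived from the F-distribution) would be flagged for inspection; the number of principal components retained is a user choice.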
Correlation optimised warping (COW) 47 was employed to align the chromatograms. This has the advantage of requiring minimal user input, especially as the two main parameters (segment length and slack) are determined automatically. Each segment contains a fixed number of retention time points, within which peaks may be shifted; the slack determines the maximum extent of the shifting. A reference chromatogram must first be chosen, using one of a number of options such as the mean, median or maximum chromatogram, or selection based on correlation coefficients. It is also possible to employ a PCA loading (typically PC1) as the reference chromatogram, 48 which was the approach used here. After the segment and slack parameters had been determined automatically, they were used to align the chromatograms within the data matrix.
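The principle of COW can be illustrated with a simplified dynamic-programming sketch in Python. It is not the COW implementation used here (which determined the segment and slack parameters automatically): the function names are hypothetical, both signals are assumed to have the same length, and the slack constraint is simplified so that each interior segment boundary of the sample may move at most `slack` points from its nominal position. Each sample segment is linearly interpolated onto the length of the corresponding reference segment, and the boundary positions that maximise the summed segment correlations are chosen.

```python
import numpy as np


def _corr(a, b):
    """Pearson correlation, returning 0 for a constant segment."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def _resample(x, start, end, length):
    """Linearly interpolate x[start..end] (inclusive) onto `length` points."""
    seg = x[start:end + 1]
    return np.interp(np.linspace(0.0, 1.0, length),
                     np.linspace(0.0, 1.0, len(seg)), seg)


def cow_align(reference, sample, n_segments, slack):
    """Simplified correlation optimised warping of `sample` onto `reference`."""
    ref = np.asarray(reference, dtype=float)
    smp = np.asarray(sample, dtype=float)
    n = len(ref)
    if len(smp) != n:
        raise ValueError("reference and sample must have equal length")
    bounds = np.round(np.linspace(0, n - 1, n_segments + 1)).astype(int)

    # best[i][end] = (cumulative correlation, start of segment i in the sample)
    best = [{0: (0.0, None)}] + [{} for _ in range(n_segments)]
    for i in range(1, n_segments + 1):
        r0, r1 = bounds[i - 1], bounds[i]
        ref_seg = ref[r0:r1 + 1]
        if i == n_segments:
            ends = [n - 1]                   # last boundary is fixed
        else:
            ends = range(max(1, r1 - slack), min(n - 2, r1 + slack) + 1)
        for end in ends:
            for start, (score, _) in best[i - 1].items():
                if end - start < 2:          # need at least 3 points per segment
                    continue
                c = _corr(ref_seg, _resample(smp, start, end, len(ref_seg)))
                if end not in best[i] or score + c > best[i][end][0]:
                    best[i][end] = (score + c, start)

    # Backtrack the optimal sample boundaries, then rebuild the warped signal.
    ends = [n - 1]
    for i in range(n_segments, 0, -1):
        ends.append(best[i][ends[-1]][1])
    ends.reverse()
    pieces = []
    for i in range(n_segments):
        length = bounds[i + 1] - bounds[i] + 1
        piece = _resample(smp, ends[i], ends[i + 1], length)
        pieces.append(piece if i == 0 else piece[1:])  # drop the shared boundary point
    return np.concatenate(pieces)
```

For example, a Gaussian peak shifted by five points relative to the reference is largely brought back into register when the slack is at least five points. In practice the reference would be, as in this work, a PCA loading or another representative chromatogram.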
HPLC-MS. The raw HPLC-MS data les were converted to ASCII text format (.MLT) by the HPLC-MS soware (MassTransit by Palisade). The contents of the text le were a column containing the total ion counts (TICs), a column containing the retention times, and a data matrix containing the single ion counts (SICs) ranging from m/z 80 to m/z 850. The text le was imported into MATLAB in its entirety. The single ion count (SIC) chromatogram for deuterated caffeine was extracted at the mass-charge (m/z) value of 201 in conjunction with the naturally occurring internal standard of creatinine 49 at the m/z value of 114, and all intensity values in each sample matrix were normalised against them. The normalised SICs in each sample were summed to form a total ion count (TIC) chromatogram for each sample. All samples were combined into a data matrix in which the dimensions were the number of samples by the number of retention time values.
Principal components analysis (PCA) and correlation optimised warping (COW) were employed as for GC-MS.
SIFT-MS. The raw SIFT-MS data les (three for each sample resulting from the three precursor ions employed) were processed by "SIFT-MS Soware" (v4.300.231.1396, © Patrik Spanel, 1996-2006), combined and exported to an Excel le, which was imported into MATLAB. Data pre-processing involved normalisation of the m/z values against the relevant precursor ions (e.g. normalised against m/z 19, the H 3 O + precursor), followed by removal of the precursor and associated ions prior to subsequent analysis; these were: 19 recorded in all samples were removed. PCA was performed as described for GC-MS leading to the identication of one outlying samplea UC sample, which was removed due to having relatively very high abundances compared to all other samples.

Data analysis
Multivariate classication. Multivariate classication was carried out in MATLAB also using functions from the PLS Toolbox (v3.5, Eigenvector Research Inc., USA). Partial least squares discriminant analysis (PLS-DA) 38 was employed to construct models relating the acquired data for each sample to the sample class. It employs the SIMPLS algorithm 50 to reduce the response proles (chromatograms for GC-MS and HPLC-MS, or mass spectra for SIFT-MS) into latent variables which capture the maximum amount of covariance. This is achieved using a two-step nested process which simultaneously optimises and evaluates the respective models via a heuristic bootstrapping approach in which leave-one-out cross-validation (LOO-CV) is employed for model optimisation. Bootstrapping is used to get meaningful performance metrics. This is described below.
Validation and performance metrics. Validation is usually performed to ensure that the classification models generated are robust, i.e. that they generalise to newly acquired data. 51 Our approach has three nested stages: an evaluation stage, an inner optimisation stage and an outer optimisation stage. The evaluation stage tests the performance of the optimal model suggested by the two optimisation stages; in this work, 150 evaluations were performed. The average overall accuracy, specificity and sensitivity over all evaluations were calculated. The area under the receiver operating characteristic curve (AUROC) was also determined from the sensitivity and specificity values via the trapezoid rule. 52 ROC curves are employed in the medical field to determine whether a diagnostic test is adequate for deciding whether an individual is healthy or has a particular disease or condition. 53 The overall success of classification is assessed together with the specificity and sensitivity: specificity measures how well the healthy (control) samples were classified, whilst sensitivity measures how well the target case (diseased) samples were classified.
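The performance metrics can be sketched as below. The code assumes binary labels with 1 marking the target case (e.g. CD) and treats each (sensitivity, specificity) pair as one point on the ROC curve, closed by the corners (0, 0) and (1, 1); with a single operating point the trapezoid-rule area reduces to (sensitivity + specificity)/2. How the individual evaluations were combined into the reported AUROC is not specified here, so this is an illustration only, and the function names are hypothetical.

```python
import numpy as np


def confusion_metrics(y_true, y_pred, positive=1):
    """Overall accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == positive
    sensitivity = float(np.mean(y_pred[pos] == positive))   # diseased classified correctly
    specificity = float(np.mean(y_pred[~pos] != positive))  # controls classified correctly
    accuracy = float(np.mean(y_pred == y_true))
    return accuracy, sensitivity, specificity


def auroc_trapezoid(sensitivities, specificities):
    """Trapezoid-rule area under the ROC curve through (0,0), the given points and (1,1)."""
    fpr = np.concatenate(([0.0], 1.0 - np.asarray(specificities, float), [1.0]))
    tpr = np.concatenate(([0.0], np.asarray(sensitivities, float), [1.0]))
    order = np.argsort(fpr, kind="stable")
    fpr, tpr = fpr[order], tpr[order]
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

For instance, `auroc_trapezoid([0.73], [0.74])` gives approximately 0.735 for a single operating point.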
Permutation testing. A permutation test was performed to determine whether the performance of the classification models was statistically significant or due to chance. 51,54 This was achieved by randomly assigning the class labels (e.g. healthy control and CD) to the samples while maintaining the observed numbers of healthy and diseased samples, followed by classification using the heuristic bootstrapping approach described previously. This was repeated 300 times, which ensured that a smooth distribution was obtained. The distribution of the permutations (n₁ = 300) was compared visually against the distribution of the evaluations (n₂ = 150), and a two-sample, two-tailed z-test (α = 0.05) was applied to determine statistical significance at the 95% confidence level. 55

Results and discussion
Table 1 summarises the results obtained for the classification of Crohn's disease against healthy controls in urine for GC-MS, HPLC-MS and SIFT-MS. Fig. 1 displays the permutation test 51 results, comparing the distributions of the permutations and the evaluations. Table 1 and Fig. 1 show that, with urine as the sample matrix, HPLC-MS in conjunction with multivariate classification is the best of the three methods for differentiating CD patients from healthy controls. This is supported by the comparison of the means of the two distributions: the calculated z-values were statistically significant for HPLC-MS but not for GC-MS or SIFT-MS.
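The label permutation and the two-sample, two-tailed z-test on the means of the evaluation and permutation distributions described under Permutation testing can be sketched as below. The z statistic uses the sample variances of the two distributions (e.g. 150 evaluation accuracies and 300 permutation accuracies), and the two-tailed p-value is obtained from the standard normal distribution via the complementary error function. The function names are hypothetical, and the classification step that would be rerun for each permuted label set is omitted.

```python
import math

import numpy as np


def permuted_labels(y, rng):
    """Shuffle class labels, keeping the observed number of samples in each class."""
    return rng.permutation(np.asarray(y))


def two_sample_z_test(a, b):
    """Two-sample, two-tailed z-test on the means of two large samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se = math.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    z = (a.mean() - b.mean()) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p
```

A p-value below α = 0.05 (equivalently |z| > 1.96) would indicate that the evaluation results differ significantly from those expected by chance.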
In clinical practice, the presenting symptoms are similar for a range of conditions. The data analysis was therefore repeated in order to differentiate CD from the other diseases (IBS and UC) as well as from healthy controls. The results are summarised in Table 2. Fig. 2 shows the permutation test results for CD versus the healthy controls and other disease states, comparing the distributions of the permutations and the evaluations.
These results show that discriminating CD from healthy controls and other target cases is more challenging than separating CD from healthy controls only. This is evident in the sensitivities reported in Table 2 for HPLC-MS and SIFT-MS. By comparison, the GC-MS data gave higher sensitivity, with an overall classification accuracy (% CC) above 70%. The specificities shown in Table 2 exceed 70%. This may be attributed to the imbalance of the respective datasets, resulting from combining the data from the other diseases (IBS and UC) with those of the healthy controls to form one "healthy" dataset. As a consequence, the PLS-DA model is better trained to recognise the "healthy" class. However, the sensitivity for GC-MS reported in Table 1 was very low (~35%), suggesting a failure to distinguish the target case (CD) from the healthy controls. To investigate this, the misclassification of IBS, UC and healthy control samples as CD was examined (Table 3). Table 3 shows that for GC-MS, 30% of control samples were misclassified as CD, whilst 26% and 25% of IBS and UC samples respectively were misclassified as CD. This explains the sensitivity of ~35% obtained for CD versus controls only (Table 1), which reflects the difficulty GC-MS has in distinguishing between controls and CD. This is also apparent for SIFT-MS, where 25% of control samples were misclassified as CD and only 30% of CD samples were correctly classified. In contrast, fewer control samples were misclassified as CD with HPLC-MS. Furthermore, 26% of UC samples were misclassified as CD, confirming the difficulty in distinguishing between CD and UC. 9,10
Further interrogation of the loadings extracted from the PLS-DA model for the HPLC-MS CD versus control dataset led to a number of compounds being identified via the MassBank website (http://www.massbank.jp). These were moracin-C, 3-(3-hydroxyphenyl)propionic acid, chalcomoracin, dimethyl azelate, nonanedioic acid dimethyl ester and 9-hydroxyimino-6-methyl-4-oxo-6,7,8,9-tetrahydro-4H-pyrido(1,2-A)pyrimidine-3-carboxylic acid ethyl ester ("HMOTPPCAEE"). However, moracin-C and chalcomoracin are antibacterial compounds and could result from drugs taken by CD and IBS sufferers. 56 Moracin-C was also found in the IBS versus control comparison. Of particular interest, 3-(3-hydroxyphenyl)propionic acid, dimethyl azelate, nonanedioic acid dimethyl ester and "HMOTPPCAEE" could be potential biomarkers for CD (in urine via HPLC-MS), since none of them was identified in the control, IBS or UC samples.
Propionic acid has also been observed to be statistically significant in the faecal samples of patients presenting with CD, 57 which suggests that it could be a key metabolite, since it is observed in both urine and faecal samples. Increased concentrations of alcohol and ester derivatives of indole and of some short-chain fatty acids, such as 3-methylbutanoic acid, have also been reported in CD compared with UC and IBS. 57 Furthermore, a key problem is differentiating between Crohn's disease and ulcerative colitis: they present similarly but require different treatments. These metabolic profiling techniques should therefore be used in conjunction with clinical symptoms. Other common conditions of the gastrointestinal tract, such as IBS, may also be differentiated 43,57 but generally have less severe symptoms. Moreover, many gastrointestinal diseases share symptoms such as pain, diarrhoea and weight loss but have very different pathologies, which may be reflected in distinct biomarkers of potential diagnostic value. Lastly, a 2013 study showed that 25% of cases of CD do not receive a diagnosis until two years have elapsed, 58 highlighting the need for more rapid diagnosis.

Conclusions
The comparison of the three instrumental techniques for the diagnosis of Crohn's disease (CD) with urine as the sample matrix indicated that HPLC-MS was the best for distinguishing CD sufferers from healthy controls. When IBS and UC patients were included together with the healthy controls, however, GC-MS appeared to be the best method. Nevertheless, when the misclassification of IBS, UC and healthy control samples is taken into consideration (Table 3), HPLC-MS may be superior. SIFT-MS and GC-MS analyses were not sufficiently accurate, with unacceptably low sensitivities. These methods analyse VOCs, whereas HPLC-MS analyses metabolites in solution. The results obtained using HPLC-MS imply that metabolites in solution are better indicators of CD than the volatile compounds present in urine headspace. Previous work has shown that VOCs in the headspace of faecal samples may be used to differentiate CD from UC and other IBDs, 43,57 but urine headspace is less effective for classification.
At the time of writing, the typical accuracy of the "gold standard" method of colonoscopy was 79%. Although the overall classification accuracies reported in this work did not exceed this value (e.g. 73% via HPLC-MS for CD vs. healthy controls), they suggest that urine could become a suitable matrix for the non-invasive diagnosis of CD. The gold standard for all gastrointestinal diseases remains endoscopy with histological examination of tissue biopsies. Detection of specific biomarkers may help to focus the investigations required, saving both time and expense.
This manuscript has covered the potential of combining this analytical instrumentation with multivariate statistics for disease diagnosis. Further work would concentrate on validating this technology; its diagnostic potential would then lie in rolling this approach out in clinics, where it is often difficult to diagnose Crohn's disease except via endoscopy or sigmoidoscopy.