Structural mass spectrometry of tissue extracts to distinguish cancerous and non-cancerous breast diseases

Breast cancer is well known to broadly impact cellular metabolism, and aberrant metabolism in breast cancer tumors has been widely studied by both targeted and untargeted analyses to characterize the affected metabolic pathways. In this work, we utilize ultra-performance liquid chromatography (UPLC) in tandem with ion mobility-mass spectrometry (IM-MS), which provides chromatographic, structural, and mass information, to characterize the aberrant metabolism associated with breast diseases such as cancer. In a double-blind analysis of matched control (n=3) and disease tissues (n=3), tissues were homogenized, polar metabolites were extracted, and the extracts were characterized by UPLC-IM-MS/MS. Principal component analysis revealed a strong separation between disease tissues, with one diseased tissue clustering with the control tissues along PC1 and two others separated along PC2. Using post-ion mobility MS/MS spectra acquired by data-independent acquisition, the features giving rise to the observed grouping were determined to be biomolecules associated with aggressive breast cancer tumors, including glutathione, oxidized glutathione, thymosins β4 and β10, and choline-containing species. Pathology reports revealed the outlier of the disease tissues to be a benign fibroadenoma, whereas the other disease tissues represented highly metabolic benign and aggressive tumors. This IM-MS-based workflow bridges the transition from untargeted metabolomic profiling to tentative identifications of key descriptive molecular features using data acquired in one analysis, with additional experiments performed only for validation. The ability to resolve cancerous and non-cancerous tissues at the biomolecular level demonstrates UPLC-IM-MS/MS as a robust and sensitive platform for metabolomic profiling of tissues. Electronic Supplementary Material (ESI) for Molecular BioSystems. This journal is © The Royal Society of Chemistry 2014


Introduction
Among women in the United States, breast cancer is the most prevalent type of invasive cancer, affecting 118.7 women per 100 000 in 2010, the most recent year for which statistics are available. 1 Although it is currently the second leading cause of cancer deaths, the mortality due to breast cancer has steadily decreased from 26 women per 100 000 in 2001 to 21.9 in 2010. 1 The general decrease in breast cancer-related mortalities is in part due to the discovery of diagnostic and prognostic markers, and the subsequent development of targeted cancer therapies born of extensive genomic and proteomic profiling endeavors. 2,3 Despite these advances, there still remain breast cancer subtypes, such as triple negative breast cancer (named so due to the absence of markers HER2, PR, and ER), which do not respond to the currently available targeted therapies. 4,5 Metabolomics approaches have also been applied in efforts to discover molecular differences between tumor and healthy cells. As the downstream endpoint of changes in the genome or proteome, metabolites best represent the cellular phenotype while also reflecting environmental influences. 6,7 Cellular metabolism is significantly altered in the transformation of healthy cells into malignant cells, likely due to the rapid cellular proliferation in cancer. 6 Affected pathways include glycolysis, oxidative phosphorylation, choline metabolism, and protection against reactive oxygen species, giving rise to a metabolic phenotype common to cancers in general. 6,8,9 Among the most notable hypotheses of aberrant metabolism in cancer is the Warburg effect, which describes cancer cells' preferential use of glycolysis, typically an anaerobic process, to generate energy despite aerobic conditions amenable to oxidative phosphorylation. 6,8-10 Thus, interrogation of the metabolic phenotype of cancer may provide targets for therapeutics designed to decrease tumor viability by disrupting the metabolic drivers specific to tumors. 6
Among the tools typically used to perform cancer metabolomics studies are nuclear magnetic resonance (NMR) and mass spectrometry (MS). 7-9 The former offers a number of tailored approaches, from proton (¹H) and carbon (¹³C) NMR to phosphorus (³¹P) NMR, which may be used to monitor pathway-specific high-energy phosphate metabolites. 11,12 Analysis of solid tissues is feasible with magnetic resonance spectroscopy (MRS) approaches such as high resolution magic angle spinning (HR-MAS). 13,14 These techniques are nondestructive, which is an advantage when working with limited volumes of tissue; however, they are generally less sensitive than MS-based approaches. 14,15 Chromatographic separations, such as gas chromatography (GC) and ultra-performance liquid chromatography (UPLC), are frequently combined with MS to increase peak capacity, reduce ionization suppression effects, and reduce mass spectral complexity. However, the derivatization necessary for GC-MS analyses can be challenging for untargeted analyses of complex biological samples, as there is no single derivatization process which is applicable to all classes of biomolecular species due to their chemical diversity. MS-based cancer metabolomics are amenable to a variety of sample types including serum, plasma, urine, and tissues. 15 These studies may be targeted or untargeted, where targeted approaches aim to measure a predefined subset of molecules based on their class or pathway and untargeted approaches aim to observe as many metabolites as possible without bias. 16 Ion mobility-mass spectrometry (IM-MS) is a hybrid two-dimensional technique which combines the gas-phase structural separation of IM with the mass-to-charge (m/z) separation of mass spectrometry. Briefly, IM separation occurs as ions travel through a drift cell containing a neutral buffer gas, such as helium or nitrogen, under the influence of a static or dynamic electric field. 
As the ions traverse the drift cell, they experience a number of collisions with the neutral buffer gas dependent on the collision cross section (CCS), or the effective ion surface area. This process results in a characteristic ion mobility drift time in the range of micro- to milliseconds, which can be used to calculate CCSs and determine coarse structural information. 17 IM-MS-based analyses have been demonstrated for fields ranging from proteomics 18,19 to systems biology. 20,21 For metabolomics analyses of complex biological samples, 22-25 IM-MS offers unique advantages relative to MS-only approaches. IM-MS platforms are highly flexible, allowing a range of pre-ionization separations, such as UPLC, to be combined in tandem with the IM-MS experiment. Relative to UPLC-MS approaches, UPLC-IM-MS provides even greater peak capacity without increasing the time of analysis due to the complementary time scales of minutes, milliseconds and microseconds for the UPLC, IM, and MS dimensions, respectively. 18,19,26-28 For classes of biomolecules, such as lipids, proteins, and carbohydrates, a correlation is observed between mass and size, or m/z and CCS, based on their gas-phase packing efficiencies. In an IM-MS experiment, this correlation results in the separation of each biomolecular class into unique regions, or trendlines, in IM-MS conformation space. 29-31 The separation of biomolecular classes in IM-MS conformation space enables a more holistic approach with minimal sample preparation for the analysis of complex biological samples, where multiple species of biomolecules may be present. 32 Rather than performing sample purification strategies to isolate a particular class of biomolecules, the molecules of interest (e.g. metabolites) can be separated from other biomolecular classes in the IM dimension, effectively increasing the signal-to-noise (S/N). 
For metabolomics, this is particularly advantageous as isobaric species may be resolved in the IM dimension based on the differences in their gas phase structures. Lastly, data independent acquisition of MS/MS spectra post-ion mobility enables multiplexed fragmentation experiments to be performed nearly simultaneously, and minimizes the need to perform additional experiments to obtain fragmentation data.
Here, we describe a UPLC-IM-MS/MS approach to the characterization of the metabolites differentially expressed in breast diseases. Based on a previously demonstrated workflow for the analysis of diabetic wound fluid, 32 disease (n = 3) and control (n = 3) breast tissues were homogenized, and polar metabolites were extracted and characterized by UPLC-IM-MS/MS. 33 Although it was known which tissues represented healthy and disease, the study was double-blind to the exact pathologies of the tissues. Principal component analysis (PCA) revealed unexpected grouping of the breast tissues. Features giving rise to this separation in the PCA were interrogated to determine tentative molecular identifications based on chromatographic retention time, accurate mass, drift time, and fragmentation analysis. While this study does not present enough statistical power to draw broad conclusions about the identified features, the general workflow highlights the benefits of incorporating IM-MS into untargeted metabolomics pipelines with particular emphasis on its utility in the identification process.

Results and discussion
The workflow demonstrated here for the UPLC-IM-MS/MS analysis of human tissues is based, in part, on a previously developed workflow to transition from wholly untargeted to targeted analyses of complex biological samples in the pursuit of identifying key biomolecular features distinguishing disease and healthy conditions. 32 The general workflow demonstrated here is illustrated in Scheme 1, where this methodology has been adapted to include the additional online UPLC separation, methods for chromatographic peak picking and alignment, post-mobility data-independent acquisition of MS/MS spectra, and inclusion of MS/MS spectra, in addition to accurate mass, as a parameter for generating tentative identifications.
Scheme 1 Illustration of the workflow for the preparation and extraction of breast tissues, including the steps necessary to transition from a multidimensional dataset to the identification of statistically-prioritized molecular features.
Data representative of the UPLC-IM-MS analysis of sample 4D is presented in Fig. 1. The IM-MS plot (Fig. 1(a)) shows m/z on the x-axis and drift time (ms) on the y-axis. Intensity, measured as counts, is depicted as a false color scale, where white indicates high intensity signals, blue represents low-to-medium intensity signals, and black indicates the absence of signals. A portion of the UPLC chromatogram from the analysis of 4D is presented in Fig. 1(b). User-defined regions of the multidimensional dataset may be selectively extracted to isolate species of interest and effectively increase the S/N by separating those species from chemical noise. 32 Fig. 1 demonstrates the extraction of two acetylated polypeptides, thymosins β4 and β10, in both the chromatographic (Fig. 1(b)) and drift time dimensions (Fig. 1(a)). The chromatographic peak containing thymosins β4 and β10, as indicated in Fig. 1(b) by the grey bar, may be extracted to yield the IM-MS plot containing the polypeptides and any co-eluting species. Likewise, a user-defined area (outlined in white in Fig. 1(a)) of m/z-drift time space, or conformation space, containing the polypeptides may be extracted to provide an extracted ion chromatogram for thymosins β4 and β10. Extraction in both the chromatographic and ion mobility dimensions yields a mass spectrum (Fig. 1(c)) of the polypeptides (highlighted in grey), in which the isotopic distributions (inset) of the multiply charged species are sufficiently resolved to determine their charge state as 7+ with high S/N through this multistep filtering of chemical noise. This strategy may be applied in both the analysis of IM-MS and post-mobility MS/MS spectra, where extraction in chromatographic and IM dimensions effectively increases confidence in tentative identifications by reducing isobaric interferences. 31,32,34
Although the presence or absence of molecular features may be visually observed from the IM-MS plots representing different disease statuses (i.e. disease vs. control), multivariate statistical analyses are required to detect more subtle variations in the expression of molecular features as a function of disease status. However, utilization of drift time in the initial peak picking and alignment remains a challenge for biostatistical tools. Therefore, peak picking and alignment of data resulting from the analysis of the breast tissue extracts was performed at the chromatographic level, and the ion mobility dimension was returned to as an aid in generating or filtering putative identifications of statistically significant molecular features on the basis of post-mobility MS/MS.
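The charge-state assignment described for Fig. 1(c) relies on the spacing between adjacent isotopic peaks, which is approximately 1.00335/z on the m/z axis. The following is a minimal sketch of that reasoning, not the software used in the study; the function name and the peak lists are hypothetical and chosen only for illustration.

```python
# Mass difference between 13C and 12C in daltons (standard value).
C13_C12_DELTA = 1.003355


def charge_from_isotope_spacing(mz_peaks):
    """Estimate the charge state from m/z values of adjacent isotopic peaks.

    Adjacent isotopologues differ by about one neutron mass, so on the m/z
    axis they are separated by roughly C13_C12_DELTA / z.
    """
    peaks = sorted(mz_peaks)
    if len(peaks) < 2:
        raise ValueError("at least two isotopic peaks are required")
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(C13_C12_DELTA / mean_spacing)


# Hypothetical peak lists (illustrative values, not measured data).
singly_charged = [308.09, 309.09, 310.09]            # spacing near 1.0
seven_plus = [710.000, 710.143, 710.287, 710.430]    # spacing near 0.143

print(charge_from_isotope_spacing(singly_charged))
print(charge_from_isotope_spacing(seven_plus))
```

A spacing near 1.0 indicates a singly charged ion, whereas a spacing near 0.143 is what would be expected for a 7+ ion such as the polypeptides discussed above.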
Results from the statistical analysis of the breast tissue extract dataset are summarized in Fig. 2. As described above, samples were injected in three batches, where each batch contained one replicate injection per sample in a randomized order, and the set of replicates was bracketed by QC samples (for PCA with QC set, see ESI †). The PCA score plot ( Fig. 2(a)) provides an overview of the samples. Grouping of the technical replicates for each sample, as seen in the score plot, indicates good reproducibility throughout the analysis. Samples forming a matched pair of disease and control tissues are identified with the dashed ellipses on the score plot ( Fig. 2(a)). Generally, 4D and 6D separated from the other samples along principal component (PC) 1, but were separated from each other along PC2. Interestingly, 2D grouped with the control tissues. This suggested that although 2D represented a tissue affected by disease, this tissue was generally more metabolically normal than the pathologically abnormal and metabolically dysregulated tissues 4D and 6D. The initial hypothesis as to the location of 2D in the PCA score plot was that this tissue represented a benign breast disease, whereas 4D and 6D potentially represented more aggressive diseases.
The corresponding loadings plot ( Fig. 2(b)) was investigated to determine the specific molecular features giving rise to the separations observed in the PCA score plot. In general, the loadings plot presents the coefficients or weights assigned to each variable in the process of generating the principal components. The values of the PC1 and PC2 loadings for a particular feature are representative of the correlation between the respective PC and the feature. For example, feature 16 in Fig. 2(b) has a loading of 0.09 for PC1 and a loading near zero (0.01) for PC2. This indicates that PC1 is highly correlated with feature 16 and PC2 is poorly correlated with feature 16. Similar to interpretation of the PCA score plot, features which group together in the loadings plot demonstrate similar behavior.
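To make the role of the loadings concrete, the sketch below projects one mean-centered sample onto a principal component as a weighted sum of its feature values. It is an illustrative assumption about how scores and loadings relate in a standard PCA, not a reproduction of the statistical software used here; the three-feature vectors are hypothetical, with the first feature given the loadings quoted for feature 16.

```python
def pc_score(centered_values, loadings):
    """Project one mean-centered sample onto one principal component.

    The score is the sum of each feature value multiplied by its loading,
    so features with loadings near zero contribute little to that component.
    """
    if len(centered_values) != len(loadings):
        raise ValueError("one loading is needed per feature")
    return sum(x * w for x, w in zip(centered_values, loadings))


# Hypothetical three-feature example; only the first pair of loadings
# (0.09 on PC1, 0.01 on PC2) is taken from the text.
pc1_loadings = [0.09, 0.02, -0.05]
pc2_loadings = [0.01, 0.08, 0.03]

# A sample in which only the first feature is elevated above its mean.
sample = [10.0, 0.0, 0.0]

print(pc_score(sample, pc1_loadings))  # moves the sample mostly along PC1
print(pc_score(sample, pc2_loadings))  # barely moves the sample along PC2
```

Under these assumptions, an elevated abundance of a feature with a large PC1 loading shifts a sample along PC1 roughly nine times further than along PC2.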
The loadings (Fig. 2(b)) and score plots (Fig. 2(a)) may be compared to understand the relationship between the molecular features, shown in the loadings plot, and the tissue samples, shown in the score plot. The histogram shown in Fig. 2(c) provides the average normalized abundances of several of the representative molecular features labeled in Fig. 2(b). The features labeled in Fig. 2(b) were selected to represent the molecules most strongly directing the loadings of samples 4D and 6D, and a few features which best represented the collective similarities of the control tissues and sample 2D. Pairwise comparisons between the matched disease-control samples were also performed via orthogonal partial least squares-discriminant analysis (OPLS-DA; see Fig. S3 to S6 in the ESI †) to maximize the group differences. 35-38 Fold-changes for the data shown in Fig. 2(c) may be found in Table 1, along with the m/z and retention time associated with the feature.
The raw data were then revisited to determine tentative identities of the differentially abundant molecular features. Upon investigating feature 4, m/z 308.09, the extracted ion chromatogram indicated this species was found in two chromatographic peaks (retention times 2.47 and 3.78 min). While the peak at 2.47 min contained m/z 308.09 as the base ion, the chromatographic peak at 3.78 min contained a base ion of m/z 613.16. Closer inspection of the mass spectrum around m/z 308.1 revealed that there were overlapping isotopic envelopes present (Fig. 3(b)). Analysis of the IM-MS spectrum using the same approach described above revealed that there were in fact two species contributing to the spectrum in Fig. 3(b). In contrast to one-dimensional MS analyses, these overlapping isotopic distributions can be easily resolved in the IM dimension as the separation occurs on the basis of size-to-charge, where the general trend is that higher charge state species have shorter drift times owing to greater mobilities at a given electric field strength. 17,31,34 This can be seen in Fig. 3(a), where restricting the window about m/z 308 to the drift times bounded by the horizontal blue dashed lines allows the peak inside the blue box to be extracted away from the isobaric interference. This yields the spectrum presented in Fig. 3(c), from which it is evident that this species, m/z 308.09, is singly charged. Performing the same steps for the region outlined in Fig. 3(a) with green dashed lines provides a mass spectrum (Fig. 3(d)) of a doubly charged species at m/z 307.1.
Fig. 3 Combining the defined window about m/z 308 with a discrete window of drift times (blue dashed lines) allows the peak in the blue box to be extracted (c) away from the isobaric interference (green dashed lines, d). The end result is separate mobility-extracted mass spectra resolving the isotopic distributions to that of a singly charged m/z 308.1 (feature 4) and doubly charged m/z 307.1 (feature 5).
The post-mobility fragmentation spectra for m/z 308.09 and m/z 613.16 were studied to assess the plausibility of the potential identifications of reduced and oxidized glutathione as features 4 and 5, respectively. The chromatographic peak containing m/z 613.16 at 3.78 min was extracted to provide the IM-MS spectrum of the data-independent MS/MS acquired post-mobility separation, shown in Fig. 4(a). Multiple regions of fragmentation were evident in the IM-MS spectrum, as outlined by the green and blue lines. As fragmentation occurs after mobility separation, the fragment ion should retain the drift time of the corresponding precursor ion, thus generating a horizontal line of precursor and fragment ions (regions outlined by green and blue lines). 39 Effectively, this mobility organization of fragments circumvents complications which may arise from the presence of isobaric species and eliminates the requirement of mass selection prior to MS/MS experiments. Similarly, fragmentation which occurred prior to IM can be discerned from the post-mobility fragmentation as each ion, regardless of whether it is a precursor or fragment, will have a unique drift time. This may arise in situations where in-source fragmentation has occurred, and such a phenomenon is indicated by the white lines in Fig. 4(a). In order to obtain a true fragmentation spectrum of the intact [...] Fig. 4(c), and confirms the proposed identification as there is a peak-to-peak match between the experimental and standard spectra. A similar process was performed to obtain the identifications shown in Table 1, and standards were used to validate IDs where possible (noted in Table 1).
Table 1 Tentative identifications, mass accuracy, and fold-changes of the features highlighted in the loadings plot (Fig. 2(b))
Although tissue 4D is the only sample to reside in the lower left quadrant of the PCA (Fig. 2(a)), a number of the more significant molecular features (No. 1-11) are located in the corresponding quadrant of the loadings plot (Fig. 2(b)). 
Interestingly, many of these features were identified as biomolecular species previously demonstrated to be differentially expressed in the tumor environment. For example, thymosins β4 and β10 (features #7-9) are highly conserved, highly abundant polar polypeptides which are overexpressed in a number of tumor types, including breast cancer. 40 A primary function of thymosin peptides is to bind G-actin, the primary component of the cellular cytoskeleton, and inhibit G-actin polymerization. 40 As actin sequestration increases the motility of the cell, thymosin polypeptides have been suspected to play a key role in the processes of cell migration and tumor metastasis. 40-42 Characterization of the effects of thymosin β4 overexpression on lung tumor metastasis revealed increases in tumor sizes, number of metastatic nodules, neoangiogenesis, and cell migration, strongly suggesting that thymosin β4 stimulates tumor metastasis. 41 In the analysis described here, thymosin β4 and β10 were found to be increased greater than 10 and 20-fold (based on a single peak in the isotopic distribution of the multiply charged species), respectively, between tissue 4D and its corresponding control 3C (p-values: β10 (#7), p = 7.3 × 10⁻⁷; β10 (#8), p = 2.9 × 10⁻⁷; β4 (#9), p = 1.5 × 10⁻⁵). A histogram demonstrating the differential expression of thymosins β4 and β10 across the disease and control breast tissues using summed peak areas for the whole isotopic envelope of the 6+ charge states can be found in the ESI. † Using the peak area results, fold-changes of 38 (p = 7.0 × 10⁻⁷) and 75 (p = 1.4 × 10⁻⁶) were observed between samples 3C and 4D for thymosins β4 and β10, respectively.
Additionally, samples 4D and 6D presented higher levels of glutathione (#4; 4D: 4.8-fold, p = 1.3 × 10⁻⁵; 6D: 4.6-fold, p = 9.9 × 10⁻⁶) relative to the controls, while only 4D showed increased levels of oxidized glutathione (#5; 15.6-fold, p = 1.5 × 10⁻⁵; see Table 1 and Fig. 2(c)). Glutathione (GSH) is the primary intracellular thiol responsible for protection against free radicals. 48 This detoxification may occur directly or in conjunction with the enzyme glutathione S-transferase, which conjugates electrophilic compounds to the reducing sulfhydryl (-SH) group of glutathione's cysteine residue. Oxidized glutathione (GSSG), or glutathione disulfide, is composed of two glutathione units linked through a disulfide bond between the cysteine residues. The relative abundances of reduced and oxidized glutathione in tissues have been examined as an indicator of the redox status of the tissue, given the potential of glutathione to detoxify cancer drugs which work via oxidative damage. 48-51 Previous studies of glutathione levels in breast tumors have found significantly elevated levels of reduced and oxidized glutathione in tumor tissues relative to matched peritumoral (i.e. control) tissues. 48,49,51 However, it was observed in the tumor tissues that reduced glutathione levels were significantly greater than oxidized glutathione levels, representing an increase in the detoxification capacity of the tumors. 51 Our results were consistent with these findings, where abundances of GSH were approximately 2-fold greater than GSSG in both 4D and 6D (Fig. 2(c) and ESI †).
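As a worked check on the glutathione assignments, the sketch below computes the expected m/z of protonated reduced and oxidized glutathione from their elemental compositions. The formulas (C10H17N3O6S for GSH and C20H32N6O12S2 for GSSG) and the monoisotopic masses are standard reference values rather than data from this study, and the function names are hypothetical.

```python
# Monoisotopic atomic masses in daltons (standard reference values).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
}
PROTON = 1.00727647  # mass of a proton in daltons


def neutral_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def protonated_mz(formula, charge):
    """m/z of the [M + zH]z+ ion for a positive integer charge z."""
    return (neutral_mass(formula) + charge * PROTON) / charge


# Elemental compositions assumed from standard references.
GSH = {"C": 10, "H": 17, "N": 3, "O": 6, "S": 1}
GSSG = {"C": 20, "H": 32, "N": 6, "O": 12, "S": 2}

for name, formula, z in [("GSH", GSH, 1), ("GSSG", GSSG, 1), ("GSSG", GSSG, 2)]:
    print(f"{name}, charge {z}+: m/z {protonated_mz(formula, z):.4f}")
```

The values come out near m/z 308.09 for singly protonated GSH, 613.16 for singly protonated GSSG, and 307.08 for doubly protonated GSSG. Because the isotopic peaks of a doubly charged ion are spaced by about 0.5 m/z, the envelope of doubly charged GSSG extends into the window around m/z 308, which is consistent with the overlapping distributions that were resolved in the ion mobility dimension.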
For many of the tentatively identified features selectively highlighted in Fig. 2(c) and Table 1, a consistent pattern has been observed in which abundances are increased in 4D and 6D relative to their controls, while 2D demonstrates the opposite correlations despite also representing a tissue affected by disease. Our general hypothesis has thus been that 4D and 6D represented cancerous breast diseases, perhaps differing in malignancies, given the identities of the most significant molecular features and their known involvement in breast cancer. On the other hand, 2D has been suspected to represent a benign cancer or a non-cancerous breast disease. Examination of the pathology reports for 2D, 4D and 6D revealed our initial data-driven hypotheses to be generally accurate. Sample 2D was diagnosed as a fibroadenoma, a benign breast tumor most commonly diagnosed in patients in their early 20s. 52 However, this sample did not represent a typical fibroadenoma diagnosis as the patient from whom this biopsy was taken was 47 years old. Sample 4D, as suspected, was diagnosed as an infiltrating ductal carcinoma of grade 3 and pathological state IIA which was found to be ER, HER2/NEU, and PR negative, often referred to as triple negative cancer. This particular type of breast cancer is challenging to treat as it does not respond to targeted therapies and is often associated with a shorter time between relapse and death. 4 Lastly, the diagnosis of sample 6D was a pseudoangiomatous stromal hyperplasia, a benign breast tumor. 53 Similar to triple negative tumors, pseudoangiomatous stromal hyperplasias are highly metabolic and perhaps this was the primary director for the separation of 4D and 6D away from the other tissues.

Conclusions
Metabolic profiling of breast tissues using the UPLC-IM-MS/MS-based platform described here was demonstrated to be a highly sensitive and selective technique for the differentiation of breast tissues representing a range of benign to cancerous breast diseases. The ion mobility aspect of the analysis provided an additional dimension of separation orthogonal to that provided in the chromatographic dimension. This enabled simultaneous isolation of features of interest in both IM and LC dimensions, improving confidence in locating features of interest while also increasing their signal-to-noise ratios. In the instance of co-eluting isobaric species, it was demonstrated that IM could effectively separate these species and eliminate the overlapping isotopic peaks which may confound accurate mass determination and subsequent identification. Data-independent tandem MS acquired post-mobility separation provided a means to distinguish fragmentation occurring prior to the mobility and collision cells from that of true collision induced dissociation, providing MS/MS spectra from which feature identifications could be made with high confidence.
The molecular features giving rise to the distinction of cancerous and benign tissues from peritumoral control tissues included species previously well known in the literature to be affected by the aberrant metabolism observed in breast cancer. These included choline, phosphocholine, glycerophosphocholine, glutathione, and oxidized glutathione. Similarly, our analysis revealed that the actin-sequestering polypeptides thymosin β4 and β10 were differentially expressed between disease and control tissues. A larger sample set including matched biological replicates for each type of breast cancer will be necessary for any conclusions to be drawn about the particular metabolites identified as statistically significant in this study. Power analysis based on the expression levels of choline indicates that a sample set of 18 age, gender and ethnicity matched tissues comprised of 3 triple negative tissues and 3 matched controls, 3 pseudoangiomatous hyperplasia tissues and 3 matched controls, and 3 fibroadenoma tissues and 3 matched controls would provide a statistical power near one for a two-tailed t-test where alpha is equal to 0.01. For the present work, the UPLC-IM-MS/MS platform provided a truly untargeted approach in which features from multiple classes of biomolecules could be utilized to differentiate tissues representing an array of breast diseases from cancerous to benign.

Tissues
Six surgically resected fresh-frozen human tissues were selected from the Meharry Medical College Translational Pathology Shared Resource Core Facility to control for gender, age and [...] The specific pathology of the disease tissues was withheld from the researchers to create a partially blinded experiment; however, it was known which three of the six tissues represented a form of breast disease (C, control; D, disease; matched pairs indicated by consecutive numbering (i.e. 1C and 2D are a matched pair)). Tissues were stored at −80 °C.

Sample preparation
The procedure for homogenization and extraction of the human breast tissues was adapted from the methods described by Want et al. for the extraction of polar metabolites from tissues for UPLC-MS analysis. 33,54 Intact tissues were initially coarsely homogenized on ice in a Dounce homogenizer, and 47 ± 4 mg (wet) of each tissue was transferred to an Eppendorf tube. To each tissue, 1 mL of cold 1 : 1 methanol/water (v/v) (Chromasolv, Sigma-Aldrich, St. Louis, MO) was added and the samples were further homogenized on ice using a hand-held homogenizer with disposable plastic probes (Omni International, Kennesaw, GA). A fresh disposable probe was used for each sample. An additional 500 µL of cold 1 : 1 methanol/water was added to each sample for a total volume of 1.5 mL, and extraction was allowed to proceed overnight at −20 °C.
Samples were centrifuged (16 500g for 5 min at 2 °C; Heraeus Fresco 21, Thermo Scientific) and the supernatants were transferred to fresh Eppendorf tubes. For the UPLC-IM-MS/MS analysis, 750 µL of each supernatant was dried on a speed-vac and reconstituted with 200 µL of H₂O. Protein precipitation was performed by adding cold (−20 °C) methanol in a 3 : 1 ratio, or 600 µL, to the samples on ice. Samples were then transferred to dry ice for 5 min, after which they were vortexed and centrifuged. Supernatants were transferred to fresh tubes, dried on a speed-vac, and then stored at −80 °C.
Samples were reconstituted with 500 µL of H₂O with 0.1% formic acid (HPLC-grade, Fisher Scientific), vortexed, and centrifuged. From each sample, 250 µL was transferred to autosampler vials. A pooled quality control (QC) sample was prepared with 30 µL of each sample (taken from the 250 µL), for a total volume of 180 µL.

UPLC-IM-MS/MS
A nanoACQUITY (Waters Corporation, Milford, MA) UPLC system was used to perform chromatographic separations on an ACQUITY HSS C18 column (1.8 µm, 1.0 × 100 mm; Waters Corporation, Milford, MA). The autosampler and column temperatures were maintained at 4 and 40 °C, respectively, and an injection volume of 6 µL was used to overfill the 5 µL loop. Chromatographic separation was performed with a binary solvent system, where solvent A was 0.1% formic acid in water (Fisher Scientific) and solvent B was 0.1% formic acid in acetonitrile (Chromasolv, Sigma-Aldrich). Gradient conditions for the 25 min run with 60 µL min⁻¹ flow rate were as follows: initial, 99% A; 1 min, 99% A; 6 min, 40% A; 16 min, 1% A; 18 min, 1% A; 19 min, 99% A; 21 min, 99% A.
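The gradient program can be expressed as a table of time points and interpolated to give the solvent composition at any time in the run. This is a minimal sketch that assumes linear segments between the listed time points and a constant composition after the final listed point; neither assumption is stated in the text, and the function name is hypothetical.

```python
# (time in minutes, percent solvent A) pairs taken from the gradient description.
GRADIENT = [
    (0.0, 99.0),
    (1.0, 99.0),
    (6.0, 40.0),
    (16.0, 1.0),
    (18.0, 1.0),
    (19.0, 99.0),
    (21.0, 99.0),
]


def percent_a(t_min, gradient=GRADIENT):
    """Percent solvent A at time t_min, assuming linear segments.

    Times before the first point or after the last point return the
    composition of the nearest listed point (an assumption for the
    remainder of the 25 min run).
    """
    if t_min <= gradient[0][0]:
        return gradient[0][1]
    for (t0, a0), (t1, a1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)
    return gradient[-1][1]


for t in (0.5, 3.5, 11.0, 17.0, 18.5, 25.0):
    a = percent_a(t)
    print(f"{t:5.1f} min: {a:5.1f}% A, {100.0 - a:5.1f}% B")
```

For example, under the linear assumption the composition midway through the first ramp (3.5 min) is 69.5% A, and the retention times of 2.47 and 3.78 min mentioned in the results both fall within this first ramp.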
The UPLC was fluidly connected to a Synapt G2 ion mobility-mass spectrometer (Waters Corporation, Milford, MA) to perform IM-MS/MS detection. The traveling wave IM cell is pressurized with nitrogen gas and the MS is an orthogonal time-of-flight mass spectrometer (TOFMS) operated in the single stage reflectron configuration. 34,55 The outflow from the chromatographic system was coupled to the instrument through the electrospray ionization (ESI) source. Conditions for positive mode ESI were as follows: capillary, 3 kV; sampling cone, 40 V; extraction cone, 7 V; source temperature, 80 °C; desolvation temperature, 150 °C; cone gas flow, 20 L h⁻¹; desolvation gas flow, 300 L h⁻¹. Mass calibration was performed with sodium formate in the range of m/z 50-1400, and a leucine enkephalin lockmass signal was continuously acquired throughout the MS acquisition for external mass correction of the data. IM separation was achieved with a traveling wave velocity of 650 m s⁻¹ and height of 40 V. Data-independent MS/MS by collision induced dissociation (CID) was performed in the transfer region with collision energies ramping between 10 and 30 V.
The sample queue was prepared to run one set of technical replicates (injections of a sample) in a randomized order, bracketed by QC samples. This was repeated two more times to provide three technical replicates for each sample and four QC injections through the queue to monitor performance.
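The queue design described above can be sketched as follows. This is an illustrative reconstruction rather than the acquisition software's own queue builder: the function name and random seed are hypothetical, and the label 5C is inferred from the consecutive numbering of matched pairs rather than stated explicitly.

```python
import random

# Sample labels; 5C is inferred from the matched-pair numbering convention.
SAMPLES = ["1C", "2D", "3C", "4D", "5C", "6D"]


def build_queue(samples, n_batches=3, seed=0):
    """Build an injection queue of randomized batches bracketed by QC injections.

    Each batch contains one injection of every sample in random order, and a
    pooled QC injection is placed before the first batch and after every batch.
    """
    rng = random.Random(seed)  # fixed seed so the order can be reproduced
    queue = ["QC"]
    for _ in range(n_batches):
        batch = list(samples)
        rng.shuffle(batch)
        queue.extend(batch)
        queue.append("QC")
    return queue


queue = build_queue(SAMPLES)
print(queue)
print(len(queue), "injections,", queue.count("QC"), "of them QC")
```

With six samples and three batches this gives 18 sample injections and four QC injections, matching the counts given in the text.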

Biostatistics
Data were first mass corrected using the continuously acquired lockmass signal and centroided during this process. The corrected data were then processed with ProteoWizard (version 3.0.4243) MSConvert to convert the .raw files to .mzXML files. 56 The .mzXML files were processed with XCMS (Scripps, La Jolla, CA) to perform peak picking and peak alignment in the chromatographic domain. 57,58 Briefly, the "matchedFilter" algorithm was used for peak picking and peak alignment, and retention time correction was performed with the "obiwarp" algorithm. Missing values were filled with the "fillPeaks" algorithm. Details of the XCMS processing are provided in the ESI.† The output from XCMS was normalized such that the sum of all intensities within a sample equaled 10 000. This dataset was then imported into Extended Statistics (Umetrics) for visualization of multivariate statistical analyses. PCA was used to assess the quality of the dataset, in terms of the grouping of triplicate technical replicates and the grouping of QC injections near the origin of the PCA plot. Model parameters (R2Y and Q2) for the PCA are provided in the ESI.† The corresponding loadings plot was analyzed to identify the features contributing to the score of each sample along PC1 and PC2 of the PCA plot. OPLS-DA S-plots were also generated in Extended Statistics to show the pairwise molecular differences between the matched disease and control samples. To determine significance between matched disease and control pairs, p-values were calculated with the Student's t-test for means (two-tailed, equal variance, α = 0.05) using the normalized aligned dataset and corrected by the Bonferroni method to account for multiple testing. ANOVA analyses were performed to determine the significance of the features when compared across all the samples, and the resulting p-values were Bonferroni corrected. For all p-values, p ≤ 0.05 was used as the threshold for significance.
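Two of the steps above, the total-intensity normalization and the Bonferroni correction, reduce to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the p-values are assumed to come from a separate t-test or ANOVA step, and the Bonferroni adjustment is written in its common form of multiplying each p-value by the number of tests and capping at 1.

```python
def normalize_to_total(intensities, total=10_000.0):
    """Scale one sample's feature intensities so that they sum to `total`."""
    s = sum(intensities)
    if s <= 0:
        raise ValueError("sample has no positive signal to normalize")
    return [total * x / s for x in intensities]


def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and significance flags.

    Each raw p-value is multiplied by the number of tests (capped at 1.0);
    a feature is flagged when its adjusted p-value is <= alpha.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p <= alpha for p in adjusted]
    return adjusted, significant


# Toy values for illustration only, not data from the study.
normalized = normalize_to_total([1.0, 3.0, 6.0])
adjusted, flags = bonferroni([0.01, 0.02, 0.5])
```

Normalizing each sample to a fixed total puts all samples on a common intensity scale before the multivariate and univariate comparisons.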

Bioinformatics
For the statistically prioritized molecular features, values of m/z were retrieved from the lockmass-corrected .raw files to ensure the best mass accuracy. In addition, the IM-MS spectra containing the post-mobility data-independent MS/MS acquisitions (saved as function 2 of the data file) were accessed, and mobility-organized fragmentation spectra for the features were extracted. This was performed in DriftScope by first extracting a defined retention time window containing the chromatographic peak in which the feature eluted, and then extracting, across all m/z, a defined drift time window that bracketed the drift time of the feature (for example, as indicated in Fig. 3 and 4). The extracted drift time peak was then exported to MassLynx, where all the scans across the peak were combined to generate the MS/MS spectrum. When possible, both accurate mass and fragmentation information were used to make tentative identifications from database searches. Databases used for generating tentative identifications by accurate mass included the Human Metabolome Database (HMDB, http://www.hmdb.ca), KEGG (http://www.kegg.com), and METLIN 59-61 (http://metlin.scripps.edu). The METLIN MS/MS spectrum match feature or the MetFrag 62 (http://msbi.ipb-halle.de/MetFrag/) in silico fragmentation tool was used to search the experimental fragmentation spectra peak lists, which were filtered by intensity to include only the top 30 peaks.
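The accurate-mass lookup and the top-30 intensity filter can be sketched as below. This is a simplified local illustration and does not call HMDB, KEGG, METLIN, or MetFrag. The 10 ppm tolerance, the function names, and the candidate list are assumptions for the example; the text does not state the mass tolerance used.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error of a measurement in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def match_by_mass(measured_mz, candidates, tol_ppm=10.0):
    """Return (name, ppm error) for candidates within tol_ppm, best match first.

    `candidates` maps a name to a theoretical ion m/z. The default
    tolerance is an assumed value, not one taken from the paper.
    """
    hits = []
    for name, mz in candidates.items():
        err = ppm_error(measured_mz, mz)
        if abs(err) <= tol_ppm:
            hits.append((name, err))
    return sorted(hits, key=lambda h: abs(h[1]))


def top_n_peaks(peaks, n=30):
    """Keep the n most intense (m/z, intensity) peaks, returned in m/z order."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    return sorted(strongest)


# Hypothetical candidates and peak list for illustration only.
CANDIDATES = {"candidate_A": 300.0000, "candidate_B": 300.0060,
              "candidate_C": 301.0000}
hits = match_by_mass(300.0015, CANDIDATES)
filtered = top_n_peaks([(100.0 + i, float(i)) for i in range(40)])
```

Filtering to the most intense peaks before searching keeps low-level noise out of the peak list submitted for fragmentation matching.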

Validation
Tentative identifications were validated with standards when possible. The experimental MS and MS/MS spectra, as well as the drift times, were compared against those of the standard. Standards of glutathione, oxidized glutathione, and adenosine 5′-monophosphate were purchased from Sigma-Aldrich (St. Louis, MO). Standards were prepared at concentrations from 1-3 mg mL⁻¹ in 1 : 1 methanol/water with 0.1% formic acid. Each standard was directly infused into the ESI source of the Synapt G2 with an external syringe pump (10 μL min⁻¹ flow rate) with ionization conditions identical to those described above. Post-mobility fragmentation was performed at collision energies between 10 and 30 V to approximate the conditions of the data-independent MS/MS acquisitions.