Chad P. Satori,a Marzieh Ramezani,a Joseph S. Koopmeiners,b Audrey F. Meyer,a Jose A. Rodriguez-Navarro,c Michelle M. Kuhns,a Thane H. Taylor,a Christy L. Haynes,a Joseph J. Dalluge a and Edgar A. Arriaga*a
a University of Minnesota, Department of Chemistry, 207 Pleasant Street SE, Minneapolis, MN 55455-0431, USA. E-mail: arriaga@umn.edu; Tel: +1-612-624-8024
b University of Minnesota, Division of Biostatistics, 420 Delaware Street SE, Minneapolis, MN 55455, USA
c Albert Einstein College of Medicine, Institute for Aging Studies, Marion Besin Liver Research Center, Department of Developmental and Molecular Biology, 1300 Morris Park Avenue, Bronx, NY 10461, USA
First published on 11th October 2016
We report the use of ultra-high-performance liquid chromatography (UPLC) coupled with acquisition of low- and high-collision energy mass spectra (MSe) to explore small-molecule compositions that are unique to either enriched autophagosomes or secretions of chemically activated murine mast cells. Starting with thousands of features, each defined by a chromatographic retention time, m/z value and ion intensity, manual examination of the extracted ion chromatograms (XICs) of chemometrically selected features was essential to eliminate false positives, which occurred at rates of 33, 14 and 37% in samples of three biological systems. Forty-six percent of the features that passed the XIC-based checkpoint had IDs in the compound databases used here. Of these, 19% of IDs had experimental high-collision energy MSe spectra that agreed with in silico fragmentation. The importance of this second checkpoint was highlighted through validation with selected commercially available standards. This work illustrates that checkpoints in data processing are essential to ascertain the reliability of unbiased metabolomic studies, thereby reducing the risk of generating ‘false identifications’, which is a major concern as ‘omics’ data continue to proliferate and are used as platforms to launch novel biological hypotheses.
Such data sets require chemometric approaches to expedite the determination of system-specific chemical features.7 While chemometric approaches in concert with database searches have become essential to improving efficiency in mass spectrometry-related studies, some selected features are likely to be false positives and/or misidentified due to limitations of the chemometric strategy. Additionally, database IDs based only on mass accuracy measurements of detected precursor ions are often erroneous.8 As such, scrutiny of chemometrics-determined features and subsequent database-identified compounds becomes imperative to assigning reliable preliminary identifications. A suitable chemometric approach for processing UPLC/MSe datasets is orthogonal partial least squares discriminant analysis (OPLS-DA), which classifies data features into one of two comparative groups.9–12 Disadvantages of OPLS-DA include the failure to detect features that are not well grouped into one of the two groups due to group-averaging effects from discriminant analysis.13 Alternatives to OPLS-DA for detecting selected features include the t-test and the linear mixed model (LMM), which compare the average intensity of extracted-ion chromatograms (XICs) between samples.14 However, the LMM is susceptible to model misspecification, which can result in inaccurate choice of selected features and potential omission of some selected features unique to a sample that has increased variation.13 Thus, applying multiple chemometric approaches to a given data set can compensate for the limitations of a single chemometric approach and increase coverage of features unique to a comparative group.
Despite the advantages of LC/MS-based approaches combined with chemometrics for metabolite profiling studies, there are other potential experimental pitfalls.8,15,16 For instance, slight changes in chromatographic retention times can cause misalignment of sample-specific features, resulting in variations in ion suppression of chemical species due to co-eluting compounds, which ultimately leads to inaccurate assessment of the relative abundances of analytes.
The numerous caveats regarding chemometric analysis, database searching, and experimental variations summarized above make additional evaluation of both chemometrically selected features and database IDs imperative. As such, here we demonstrate the importance of (1) validating chemometrically selected features using XICs, and (2) comparing the experimental high-collision energy MSe spectra with in silico fragmentation of tentative IDs from database searches. Preliminary identifications defined by applying these two checkpoints to comparative samples were 100% correct when compared against commercially available standards. Here, we used these checkpoints in the processing of data resulting from the UPLC/MSe analysis of enriched autophagosomes and secretions of activated mast cells, which lack prior reports on unsupervised metabolomic analysis. Autophagosomes are organelles involved in autophagy, a process that fails in multiple diseases including Parkinson's disease,17–19 Huntington's disease,17 and Alzheimer's disease.19,20 Chemically stimulated mast cells play an important role in allergic/inflammatory response.21 Thus, having passed two orthogonal checkpoints, the preliminary identifications described here constitute potentially important molecules in autophagosomes and secretions from activated mast cells. Overall, we recommend use of these checkpoints in the initial testing of (unsupervised) analysis of mass spectrometric datasets, which ultimately results in improved molecular characterization of biological systems.
Four pure chemical standards were purchased from Avanti Polar Lipids. 1-Palmitoyl-2-hydroxy-sn-glycero-3-phosphocholine and 1-stearoyl-2-hydroxy-sn-glycero-3-phosphocholine were prepared in 1:1 HPLC-grade MeOH:H2O to a final concentration of 50 μg mL−1 and vortexed for 1 minute. 1-Hexadecanoyl-sn-glycero-3-phosphoethanolamine was dissolved in a CHCl3/MeOH/H2O (8:4:1) mixture (C = 10 mg mL−1). One μL of the solution was transferred into 200 μL of 1:1 HPLC-grade MeOH:H2O to a final concentration of 50 μg mL−1 and vortexed for 1 minute. D-erythro-sphingosine (C17 base) was transferred into 1:1 HPLC-grade MeOH:H2O so that the final concentration of the sample was 50 μg mL−1 and vortexed for 1 minute. Standard samples were added to mass spectrometry sample tubes (MS conditions: polar positive).
Standards were then spiked from their stocks into the post-nuclear fraction of L6 cells (25000–100000 cells) prepared with nitrogen cavitation. Standards were spiked at a final concentration of 50 μg mL−1. Samples were treated with 1.0 mL MeOH:H2O (1:1 v/v) and the pellet was resuspended and vortexed for 1 min. Samples were centrifuged for 10 min at 16100 ×g. The supernatant was removed and dried with a speed vac. The remaining pellet was treated with 1.0 mL DCM:MeOH (1:3 v/v), and the pellet was resuspended and vortexed for 1 min. Samples were then centrifuged for 10 min at 16100 ×g. The supernatant was removed and dried by vacuum centrifugation at 24 °C for ∼36 h. Samples were resuspended in 225 μL HPLC-grade MeOH:H2O (1:1 v/v), vortexed for 1 min, sonicated for 10 min, and centrifuged for 10 min at 16100 ×g. The supernatant was collected, and the samples were injected for mass spectrometry analysis.
A Waters Acquity UPLC coupled to a Waters Synapt G2 HDMS quadrupole orthogonal acceleration time-of-flight mass spectrometer was used for UPLC/MSe analysis. The reversed-phase column used was a Waters HSS T3 C18 2.1 mm × 100 mm column (1.7 μm diameter particles) operated at 35 °C. The following 28 min linear gradient separations were employed at a flow rate of 0.40 mL min−1 using a mobile phase consisting of A: water containing 0.1% formic acid and B: acetonitrile containing 0.1% formic acid. The gradient profile for samples from polar extractions was: 3% B, 0 min to 5 min; 3% B to 97% B, 5 min to 18 min; 97% B, 18 min to 21 min; 97% B to 3% B, 21 min to 23 min; 3% B, 23 min to 28 min. The gradient profile for nonpolar extractions was: 30% B, 0 min to 5 min; 30% B to 97% B, 5 min to 18 min; 97% B, 18 min to 21 min; 97% B to 3% B, 21 min to 23 min; 3% B, 23 min to 28 min. Dead time was 0.68 min for the polar separation and 0.54 min for the nonpolar separation as determined by injection of acetone. Simultaneous low- and high-collision energy (CE) mass spectra were collected in centroid mode over the range m/z 50–1200 every 0.1 s during the chromatographic separation. MSe parameters in positive electrospray ionization mode were as follows: capillary, 2.0 kV; sampling cone, 35.0 V; extraction cone, 5.0 V; desolvation gas flow, 800 L h−1; source temperature, 100 °C; desolvation temperature, 350 °C; cone gas flow, 20 L h−1; trap CE, off (low CE collection), trap CE ramp 15–65 V (high CE collection); lockspray configuration used the average of three m/z measurements (0.2 s scan, m/z 100–1200, every 10 s) of protonated leucine-enkephalin (m/z 556.2771) formed from infusion of a 5 μg mL−1 solution; this configuration typically yields mass accuracies <2 ppm. All MSe parameters were identical in negative ionization mode except the following: capillary, 2.5 kV; sampling cone, 30.0 V; extraction cone, 4.0 V.
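The polar-extraction gradient described above can be written as a table of breakpoints and interpolated to give the mobile-phase composition at any time. The following is a minimal sketch, assuming strictly linear ramps between the stated breakpoints; the names `POLAR_GRADIENT` and `percent_b` are illustrative and do not come from the instrument software.

```python
# Polar-extraction gradient from the text, as (time in min, % B) breakpoints.
POLAR_GRADIENT = [(0.0, 3.0), (5.0, 3.0), (18.0, 97.0),
                  (21.0, 97.0), (23.0, 3.0), (28.0, 3.0)]


def percent_b(t_min, profile=POLAR_GRADIENT):
    """Return % B at time t_min by linear interpolation between breakpoints.

    Assumption: ramps are linear between consecutive breakpoints.
    """
    if not profile[0][0] <= t_min <= profile[-1][0]:
        raise ValueError("time is outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(profile, profile[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0.0, 11.5, 19.0, 22.0):
        print(t, percent_b(t))
```

Under this assumption, the midpoint of the 5 to 18 min ramp (11.5 min) corresponds to 50% B.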
Fig. 1 Workflow for determination, confirmation, and preliminary identification of features from UPLC/MSe data.
Chemometric analysis was used for determination of candidate, system-specific features from UPLC/MSe data (Fig. 1). The LMM and t-test approaches were implemented in the R programming language using an in-house script, and OPLS-DA was performed using the implementation included in MarkerLynx™ from Waters.26 Features were selected with the LMM or t-test by comparing the relative abundance of ions, as measured by XIC intensity, between the control and enriched fractions for the autophagosome samples, or between the control and stimulated fractions for the mast cell samples (Fig. 1). Experiments with no biological replicates but with three instrumental replicates (autophagosomes from rat liver tissue) were analyzed using the two-sample t-test with unequal variances. Experiments with both three biological replicates and three instrumental replicates each (autophagosomes from rat myoblast skeletal muscle cell culture and activated mast cells) were analyzed using LMM.14 Analysis for the t-test and LMM was performed on the log-transformed scale, and differences between biological samples were described by the ratio of geometric means. Multiple comparisons were accounted for by controlling the false discovery rate, which was calculated using the approach described in Benjamini and Yekutieli.27 For autophagosome-enriched samples, data features with a false discovery rate of 10% or less were deemed selected features. For activated mast cells, which had at least a 10-fold greater number of features, features with a false discovery rate of 1% or less were deemed selected features for Checkpoint 1 (Fig. 1).
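The false discovery rate step and the ratio of geometric means can be illustrated with a short sketch. This is not the authors' R script: it is a plain-Python illustration of the Benjamini and Yekutieli adjustment, assuming per-feature p-values are already available from the t-test or LMM. The function names and the example p-values are hypothetical.

```python
import math


def benjamini_yekutieli(p_values):
    """Return Benjamini-Yekutieli adjusted p-values, in the input order."""
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic-sum correction that distinguishes BY from Benjamini-Hochberg.
    c_m = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_features(p_values, fdr):
    """Indices of features whose adjusted p-value is at or below the FDR cutoff."""
    adjusted = benjamini_yekutieli(p_values)
    return [i for i, q in enumerate(adjusted) if q <= fdr]


def geometric_mean_ratio(group_a, group_b):
    """Ratio of geometric means, computed on the log-transformed scale."""
    mean_log_a = sum(math.log(x) for x in group_a) / len(group_a)
    mean_log_b = sum(math.log(x) for x in group_b) / len(group_b)
    return math.exp(mean_log_a - mean_log_b)


if __name__ == "__main__":
    p = [0.001, 0.5, 0.01, 0.9]          # hypothetical p-values
    print(select_features(p, fdr=0.10))  # 10% cutoff (autophagosome samples)
    print(select_features(p, fdr=0.01))  # 1% cutoff (mast cell samples)
```

The two cutoffs in the usage example mirror the 10% and 1% thresholds stated in the text; everything else is illustrative.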
Features selected with OPLS-DA were those at the edges of the OPLS-DA-generated S plot (Fig. 2). An S plot is one option available for visualization of OPLS-DA data.28 The enrichment of the preliminary feature in a biological system is plotted on the x-axis and the correlation of enrichment is plotted on the y-axis. Features with absolute values >0.001 for coefficient 2 (x-axis) and >0.90 for correlation (y-axis) were deemed selected features (Fig. 1). The number of selected features in each sample is summarized in Table S1 (ESI†).
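The S-plot thresholds amount to a simple two-condition filter on each feature. A minimal sketch follows, assuming the S-plot coordinates have been exported as (name, coefficient, correlation) tuples; that export format and the function name are assumptions for illustration, not part of MarkerLynx.

```python
def select_s_plot_features(features,
                           min_abs_coefficient=0.001,
                           min_abs_correlation=0.90):
    """Keep features lying at the edges of the S plot.

    `features` is an iterable of (name, coefficient, correlation) tuples,
    a hypothetical export of the x- and y-axis values of the S plot.
    """
    selected = []
    for name, coefficient, correlation in features:
        if (abs(coefficient) > min_abs_coefficient
                and abs(correlation) > min_abs_correlation):
            selected.append(name)
    return selected


if __name__ == "__main__":
    example = [
        ("f1", 0.004, 0.95),    # enriched in one group
        ("f2", -0.003, -0.97),  # enriched in the other group
        ("f3", 0.0005, 0.99),   # coefficient too small
        ("f4", 0.010, 0.50),    # correlation too low
    ]
    print(select_s_plot_features(example))
```

Taking absolute values on both axes keeps features from either comparative group, which is why both the upper-right and lower-left edges of the plot are retained.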
All features selected by any of the chemometric approaches described above were examined at Checkpoint 1 (Fig. 1). At this checkpoint, the XICs from the low-collision energy mass spectra for a given m/z value with a 5 ppm mass tolerance were examined. A selected feature was rejected when its XIC did not have a true chromatographic peak profile. Candidate features were those that passed Checkpoint 1 (Fig. 1). The number of candidate features that passed this checkpoint in each sample is summarized in Table S1 (ESI†).
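Building the XIC that is inspected at Checkpoint 1 can be sketched as follows. The sketch assumes centroided scans represented as (retention time, list of (m/z, intensity)) pairs and interprets the 5 ppm tolerance as a symmetric ±5 ppm window; both are assumptions made for illustration. The judgment of whether the resulting trace has a true chromatographic peak profile was made manually in this work and is not automated here.

```python
def ppm_window(target_mz, ppm):
    """Return the (low, high) m/z bounds of a symmetric ppm window."""
    half_width = target_mz * ppm * 1e-6
    return target_mz - half_width, target_mz + half_width


def extract_xic(scans, target_mz, ppm=5.0):
    """Sum centroid intensities inside the m/z window for each scan.

    `scans` is a list of (retention_time_min, [(mz, intensity), ...]);
    this data layout is hypothetical.
    """
    low, high = ppm_window(target_mz, ppm)
    xic = []
    for retention_time, peaks in scans:
        intensity = sum((i for mz, i in peaks if low <= mz <= high), 0.0)
        xic.append((retention_time, intensity))
    return xic


if __name__ == "__main__":
    scans = [
        (15.90, [(500.0010, 10.0), (500.0040, 99.0)]),
        (15.98, [(499.9990, 250.0)]),
        (16.05, [(600.0000, 5.0)]),
    ]
    for point in extract_xic(scans, target_mz=500.0):
        print(point)
```

At m/z 500 the ±5 ppm window is ±0.0025, so the centroid at 500.0040 in the first scan is excluded.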
Online database searches of candidate features made it possible to assign database IDs to such features (Fig. 1). Online databases included the Chemical Entities of Biological Interest, Human Metabolome Database, Lipid MAPS, and ChemSpider. Although other databases are available (e.g. MyCompoundID, MZmine, MassBank, METLIN, mzCloud),29–31 we focused on those that were immediately accessible to us. Searches for the neutral mass values corresponding to [M + H]+ and [M − H]− ions were done with mass error <10 mDa (or <11 ppm). This is an acceptable mass accuracy based on recently published metabolomics-based reports,32–35 and is of minor concern because of the subsequent checkpoint described below. Due to the large number of candidate features for activated mast cells, the top 250 selected features for the LMM and all 65 selected features for the OPLS-DA were chosen for database searches. Results from the database searches are summarized in Table S1 (ESI†).
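The conversion from an observed ion to a neutral search mass, and the relation between mDa and ppm tolerances, can be sketched as below. The proton mass value (1.007276 Da) is a standard constant rather than something stated in the text, and the small in-memory dictionary stands in for an online database query; its entries are hypothetical.

```python
PROTON_MASS = 1.007276  # Da; standard value, not taken from the text


def neutral_mass(mz, adduct):
    """Neutral monoisotopic mass for a singly charged [M+H]+ or [M-H]- ion."""
    if adduct == "[M+H]+":
        return mz - PROTON_MASS
    if adduct == "[M-H]-":
        return mz + PROTON_MASS
    raise ValueError("unsupported adduct: " + adduct)


def mda_to_ppm(error_mda, mass):
    """Express an absolute mass error in mDa as a relative error in ppm."""
    return error_mda * 1e-3 / mass * 1e6


def candidates_within(neutral, database, tol_mda=10.0):
    """Names of database entries within tol_mda of the neutral mass."""
    return [name for name, mass in database.items()
            if abs(mass - neutral) * 1000.0 < tol_mda]


if __name__ == "__main__":
    # Hypothetical entries standing in for an online compound database.
    database = {"entry_A": 523.3700, "entry_B": 523.3800, "entry_C": 522.9000}
    m = neutral_mass(524.371076, "[M+H]+")
    print(round(m, 4), candidates_within(m, database))
    print(round(mda_to_ppm(10.0, 909.09), 2))
```

Because a fixed mDa tolerance is a larger relative error at low mass, 10 mDa equals 11 ppm only near 909 Da (a value derived here from the two quoted tolerances, not stated in the text); at lower masses the same window is wider in ppm terms.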
Features with database IDs were evaluated at Checkpoint 2 (Fig. 1). Checkpoint 2 consists of (1) XIC alignment of co-eluting ions to verify the presence of putative precursor ions corresponding to the database-identified species; (2) corroboration of identity by comparison of fragmentation patterns observed in the high-collision energy mass spectra of each precursor with simulated fragmentation patterns calculated in silico using Waters MassFragment™ software; and (3) manual XIC alignment of precursor and product ions. The MassFragment scoring used to predict fragmentation (i.e. low score) followed the default scoring system: the 20 most intense m/z species from the high-collision energy mass spectra were compared to theoretical fragments with a tolerance of 10 mDa. Double bond equivalence values were between −10 and 50, electron count was set to “both”, maximum H deficit was 6, fragment number of bonds was 4, and scoring parameters were aromatic (6), multiple (4), ring (2), phenyl (8), other (1), H-deficit (0), hetero modifier (0.5), alpha penalty (5), and maximum score (16). Preliminary identifications supported by the evaluation criteria described here are reported using their database ID number36 and are summarized in Tables S1 and S2 (ESI†).
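The comparison of the 20 most intense high-collision energy peaks with theoretical fragments at a 10 mDa tolerance can be illustrated with a short sketch. It covers only the peak-matching step; it does not reproduce the MassFragment bond-scoring scheme, and the function name and spectrum layout are assumptions. The theoretical m/z values in the usage example are the fragment values quoted later in the text for LMGP01050026, while the observed peaks are made up.

```python
def match_fragments(spectrum, theoretical_mz, top_n=20, tol_mda=10.0):
    """Match the top_n most intense peaks against theoretical fragment m/z.

    `spectrum` is a list of (mz, intensity) centroids. Returns a list of
    (observed_mz, theoretical_mz) pairs, at most one per observed peak.
    """
    top_peaks = sorted(spectrum, key=lambda peak: peak[1], reverse=True)[:top_n]
    matches = []
    for observed, _intensity in top_peaks:
        for theoretical in theoretical_mz:
            if abs(observed - theoretical) * 1000.0 <= tol_mda:
                matches.append((observed, theoretical))
                break
    return matches


if __name__ == "__main__":
    # Observed peaks are hypothetical; theoretical values are from the text.
    observed = [(184.073, 1000.0), (104.107, 500.0), (300.000, 50.0)]
    theoretical = [184.07, 104.10, 506.36]
    print(match_fragments(observed, theoretical))
```

Restricting the comparison to the most intense peaks limits chance matches from low-level noise, at the cost of missing weak but genuine fragments.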
Among the 20 preliminary identifications was the compound 1-octadecanoyl-glycero-3-phosphocholine (LMGP01050026). The compound's parent [M + H]+ ion (Fig. 3A) shows aligned peaks in the low- and high-collision energy extracted ion chromatograms (Fig. 3B and C, respectively). A trend plot for the observed precursor ion indicates the relative abundance of this compound in the autophagosome-enriched fraction (Aps, Fig. 3D) versus the non-enriched fraction (control, ctl, Fig. 3D). Trend plot data were used to calculate the fold-enrichment reported in Table S2 (ESI†). Despite the complex nature of the low- and high-collision energy mass spectra (Fig. 3E and F, respectively), caused by other molecules with retention times overlapping that of this compound, a theoretical fragment ion generated in silico using MassFragment™ matched an observed fragment ion (Fig. 3G and F). Other ions present in the spectra that could confound the assignment did not interfere, and the XIC of this fragment matched the XIC of the parent ion, confirming the preliminary identification of the parent ion (Checkpoint 2, Fig. 1).
To validate the identification of LMGP01050026 described in the previous paragraph, we also ran a commercial standard and the liver autophagosome-enriched sample spiked with the standard (Fig. 4). The parent ion mass (m/z 523.3638), the match between the XIC (m/z 523.3638; TR = 15.98 min) of the sample/spiked-in standard and that of the standard alone (Fig. 4B and C, respectively), and the match between the main peaks in the low- and high-collision energy mass spectra of the sample/spiked-in standard and the standard alone (Fig. 4D and E, respectively) are strong evidence for identification of LMGP01050026 in the liver autophagosome-enriched sample. Furthermore, a peak with m/z 506.36, related to a fragment formed by loss of a water molecule and used for the high-collision energy XIC, was also present in the spectra obtained with the standard (Fig. 4D and E). Lastly, other fragments supported the structural identification (m/z 184.07 corresponds to the loss of a C21H40O3 group; m/z 104.10 represents a fragment formed by losing a C21H41O6P group).
Fig. 4 Confirmation of identification of m/z 523.3638 (1-stearoyl-2-hydroxy-sn-glycero-3-phosphocholine) described in Fig. 3. (A) Chemical structure; (B) low- and high-collision energy XIC for m/z = 523.3638; (C) low- and high-collision energy XIC for the standard m/z = 523.3638; (D) low- and high-collision energy mass spectrum at TR = 15.98 min; (E) low- and high-collision energy mass spectrum of the standard at TR = 15.98 min.
Identification of LMGP01050026 as an enriched glycerophospholipid in autophagosomes is also consistent with other preliminary identifications which included lysophospholipids such as LysoPC (22:5(7Z,10Z,13Z,16Z,19Z)) (HMDB10403), and LysoPC (18:2(9Z,12Z)) (HMDB10386) (Table S2, ESI†). These types of lipids are involved in membrane fusion and elongation in macroautophagy37 and alteration of these lysophospholipids causes disruption of autophagy-related organelle membranes by modifying lipid biosynthesis.38
Beyond LMGP01050026, among the other 19 preliminary identifications that passed Checkpoint 2 (Table S2 and Fig. S7a, ESI†), two had commercially available standards: 1-palmitoyl-2-hydroxy-sn-glycero-3-phosphocholine (HMDB10382) and 1-hexadecanoyl-sn-glycero-3-phosphoethanolamine (LMGP02050002). To validate their respective preliminary identifications, we also ran separately the commercial standards and the liver autophagosome-enriched sample spiked with each of the standards (Fig. S8 and S9, ESI†). Their parent ion masses, the match between the XICs of the sample/spiked-in standard and the standard alone, the match between the main peaks in the low- and high-collision energy mass spectra of the sample/spiked-in standard and the standard alone, and the fragmentation patterns (Fig. S8 and S9, ESI†) support the identification of HMDB10382 and LMGP02050002 as enriched compounds in the liver autophagosome-enriched sample.
The degree of confidence in preliminary identifications, sans validation with standards, increases when there are reports on the roles that such compounds play in autophagy. The preliminary identification of PE (16:0/0:0) (LMGP02050002) is of interest because it is a member of the phosphatidylethanolamine (PEA) family. PEA is a critical factor in autophagy due to its conjugation with Atg8 in the protein complex required for autophagosome formation.39,40 Sphinganine (HMDB00269), a sphingolipid base, represents another compelling preliminary identification, as sphingolipids have previously been shown to stimulate macroautophagy41,42 and accumulate in biological systems, such as Niemann-Pick type C disease, that also accumulate autophagosomes.43,44 Autophagy may also play a role in vitamin D regulation, which makes intriguing the preliminary identification of 1α,23-dihydroxy-24,25,26,27-tetranorvitamin D3 (LMST03020020), a vitamin D3 metabolite.45
We used two of three chemometric/statistical approaches (OPLS-DA, t-test, and LMM) to select unique features in each study (Fig. 1). The OPLS-DA was used in the three studies discussed here. The t-test was useful when only one biological replicate was available (e.g. due to rat liver sample limitations). The LMM was useful when three biological replicates were carried out. Because only 17% of the candidate features were present in both of the chemometric/statistical analyses used for each specific study (see % CCF in Table S1, ESI†), the combined outputs of the chemometric/statistical approaches were used for each study. This increased the number of final preliminary IDs (43 in total), of which 23 were initially selected by only the OPLS-DA or only the LMM/t-test approach (Table S2, ESI†).
Checkpoint 1 was essential to eliminate selected features with inadequate XICs (false positives). Those that passed Checkpoint 1 (candidate features) were 67%, 86% and 63% of selected features in the liver, myoblast, and activated mast cell secretion samples, respectively (Table S1, ESI†). Thus, the respective rates of false positives were 33%, 14%, and 37%. We cannot attribute these rates of false positives to S/N issues: with a signal threshold of 100 counts and a peak-to-peak baseline noise of 8.0 counts, values for S/N were greater than or equal to 12. Indeed, once the candidate features are known, one could retroactively optimize MarkerLynx parameters (e.g. mass tolerance and retention time tolerance) to reduce the initial number of data features until the number of false positives is minimized. However, this optimization would be impractical without knowledge of the true positives that passed Checkpoint 1 in a given sample. Future investigation of the optimization of these parameters may reduce the rate of false positives in a sample-specific manner, thereby reducing the manual effort spent on applying Checkpoint 1.
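The arithmetic behind these figures is simple enough to sketch. The example assumes that the false-positive rate is the share of selected features rejected at Checkpoint 1 and that S/N is the ratio of the signal threshold to the peak-to-peak baseline noise; the feature counts in the usage example are hypothetical round numbers chosen to reproduce the reported percentages, not values from Table S1.

```python
def false_positive_rate(n_selected, n_passed):
    """Percentage of selected features rejected at Checkpoint 1."""
    return 100.0 * (n_selected - n_passed) / n_selected


def signal_to_noise(signal_counts, noise_counts):
    """S/N taken as signal threshold over peak-to-peak baseline noise."""
    return signal_counts / noise_counts


if __name__ == "__main__":
    # Hypothetical counts: 100 selected features per system.
    for system, passed in (("liver", 67), ("myoblast", 86), ("mast cell", 63)):
        print(system, false_positive_rate(100, passed))
    print(signal_to_noise(100.0, 8.0))
```

With a 100-count threshold and 8.0 counts of noise, the minimum S/N is 12.5, consistent with the statement that S/N was at least 12.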
A total of 46% of the candidate features searched in chemical databases (Chemical Entities of Biological Interest, Human Metabolome Database, Lipid MAPS, and ChemSpider) had a matching ID. The other 54% are considered false negatives because they are likely true compounds that are not represented in the databases searched here. This is in agreement with previous GC- and LC/MS-based metabolomic analyses reporting ∼50%46,47 as the yield for searching databases. To avoid inadvertent elimination of true positives, we conducted the database searches with an ‘inflated’ mass error (i.e. 10 mDa or 11 ppm), which may indeed result in incorrect ID assignments. This is not a major concern here because these are subsequently eliminated at Checkpoint 2, described below. Searching other databases (e.g. MyCompoundID, MZmine, MassBank, METLIN, mzCloud) may reduce the number of false negatives, but will not eliminate the fact that representation of chemical entities of biological interest is currently a major bottleneck in metabolomics and other unbiased analyses of small molecules in biological systems.29–31
Checkpoint 2 (Fig. 1) was critical to discard Database IDs with theoretical fragmentation patterns that were inconsistent with fragmentation patterns observed in the high-collision energy mass spectra (81% in Table S1, ESI†). Because we conducted searches with an ‘inflated mass tolerance’ (11 ppm), it is not surprising to find incorrect ID assignments in the database searches. Given the final mass error of the preliminary identifications (≤2 ppm, Table S2, ESI†) that passed Checkpoint 2, there is a low probability of false preliminary identifications. One concern though is that of incorrect prediction of fragmentation patterns. Better predictive algorithms for in silico fragmentation would reduce the number of false negatives caused by incorrect fragmentation predictions.48,49 In addition, comparison of predicted and observed isotope patterns could provide complementary scrutiny to in silico fragmentation comparisons currently done at Checkpoint 2.
Validating preliminary identifications by comparison with the UPLC/MSe data of their respective commercially available standards in one of the biological systems is the gold standard in metabolomics studies. One such validation in our study was that of 1-octadecanoyl-glycero-3-phosphocholine (LMGP01050026), found in the comparative analysis of enriched autophagosomes and homogenate from rat liver. UPLC retention times, XICs of the parent and fragment, and fragmentation patterns are remarkably similar between the sample (Fig. 3) and the data obtained with the standard (Fig. 4). Three other commercially available standards were also used to validate their respective preliminary identifications (Table S1 and Fig. S8–S10, ESI†). Although validation could not be extended to other preliminary identifications due to the lack of commercial standards, 100% of the preliminary identifications tested against standards were confirmed, giving credence to the use of Checkpoints 1 and 2 to increase the confidence in the preliminary identifications from comparative analysis of small molecules in biological samples.
The biological context is also critical to support the chemical identity of mass spectral features that were selected as preliminary identifications (Checkpoint 2). For instance, the preliminary identification of 1-octadecanoyl-glycero-3-phosphocholine (LMGP01050026) (Fig. 3) is also biologically validated because of the role that glycerophospholipids play in autophagy.38 Indeed, further biological insight could be gained through parallel studies of diseases associated with autophagy50–54 as well as the composition and origin of autophagosome membranes.55–57 Although not explored here, the biological context gives credence to preliminary identifications in the comparative studies of activated mast cells (see ESI†) highlighting the power of biological context to further support scrutiny through application of Checkpoints 1 and 2 (Fig. 1).
Footnote
† Electronic supplementary information (ESI) available: Experimental details, supplementary table, supplementary data. See DOI: 10.1039/c6ay02500e
This journal is © The Royal Society of Chemistry 2017