Jiangang
Zhang
,
Yu
Zhou
,
Juan
Lei
,
Xudong
Liu
,
Nan
Zhang
,
Lei
Wu
and
Yongsheng
Li
*
Department of Medical Oncology, Chongqing University Cancer Hospital, Chongqing 400030, China. E-mail: lys@cqu.edu.cn
First published on 30th November 2023
Dysfunctional lipid metabolism plays a crucial role in the development and progression of various diseases. Accurate measurement of lipidomes can help uncover the complex interactions between genes, proteins, and lipids in health and diseases. The prediction of retention time (RT) has become increasingly important in both targeted and untargeted metabolomics. However, the potential impact of RT prediction on targeted LC-MS based lipidomics is still not fully understood. Herein, we propose a simplified workflow for predicting RT in phospholipidomics. Our approach involves utilizing the fatty acyl chain length or carbon–carbon double bond (DB) number in combination with multiple reaction monitoring (MRM) validation. We found that our model's predictive capacity for RT was comparable to that of a publicly accessible program (QSRR Automator). Additionally, MRM validation helped in further mitigating the interference in signal recognition. Using this developed workflow, we conducted phospholipidomics of sorafenib resistant hepatocellular carcinoma (HCC) cell lines, namely MHCC97H and Hep3B. Our findings revealed an abundance of monounsaturated fatty acyl (MUFA) or polyunsaturated fatty acyl (PUFA) phospholipids in these cell lines after developing drug resistance. In both cell lines, a total of 29 lipids were found to be co-upregulated and 5 lipids were co-downregulated. Further validation was conducted on seven of the upregulated lipids using an independent dataset, which demonstrates the potential for translation of the established workflow or the lipid biomarkers.
Lipidomics is a powerful tool used for profiling lipidomes in a biological system. It can be divided into two categories: untargeted and targeted lipidomics.11,14 Untargeted lipidomics is utilized for the analysis of the comprehensive lipid profile, benefiting from its high resolution (HR) and wide ion coverage facilitated by mass analyzers such as time of flight (TOF) or Orbitrap. On the other hand, targeted lipidomics is suitable for determining a predefined list of targeted lipids using quadrupole (Q)-TOF and quadrupole ion trap (Q-TRAP) with high sensitivity, as indicated by relatively low limits of quantitation (LOQ) and detection (LOD).15 In both cases, liquid chromatography coupled with mass spectrometry (LC-MS) has become the most commonly used detection platform due to its high separation capacity and versatility in chromatography.16,17 However, interference peaks often occur when signals overlap in untargeted approaches or when one lipid generates a peak in another lipid's multiple reaction monitoring (MRM) transition within a small retention time (RT) range in targeted methods. This can result in inaccurate lipid annotation or quantification.18 While this issue can be partially addressed by using lipid library matching with an exact mass weight or tandem mass spectrometry (MS/MS) obtained mostly from untargeted methods, such as METLIN Gen2,19 MassBank database,20 the LIPID MAPS Web tools,21 LipidBlast,22etc., researchers still need to carefully consider the advantages and disadvantages of these methodologies before proceeding with their acquisition.
In addition to mass information, large-scale lipid identification requires supplementary information. RT from chromatography, included in lipid libraries, is crucial in supporting both untargeted and targeted strategies.23–25 The prediction of RT for lipids has gained popularity in untargeted lipidomics. Xu and Kohlbacher et al.23 introduced machine learning-based approaches to improve the assignment of lipid structures and automate annotation. Snel et al.26 developed an equivalent carbon number (ECN) model for predicting the RT of lipids with the same headgroup in reversed-phase separation. Other software packages such as LiPydomics,27 PredRet,24 quantitative structure-(chromatographic) retention relationship (QSRR) Automator,28 DynaStI,29etc., have also been developed for similar RT prediction purposes. Among these, QSRR models are the most commonly used theory for RT prediction.25,30 Artificial intelligence algorithms are used to enhance data integration and processing in iterative computing and data calibration. However, not all laboratories have a bioinformatics specialist proficient in applying these models and codes, leading to frequent occurrences of bugs and errors when data formats or default settings are inconsistent. Therefore, a modified workflow is often required to address such situations.
The interference of lipid signals may become more prominent and compromise the efficiency of detecting lipids in targeted lipidomics when a large number of MRM transitions are added to the methods, as suggested by previous research on targeted metabolomics.18 The use of the aforementioned model for RT prediction may help reduce false positive signal responses and improve lipid annotation to some extent. In our previous study, we utilized an equivalent carbon number and retention time prediction model to discover unusual odd chain fatty acyl lipids in targeted LC-MS lipidomics.31 However, such practices are still lacking in targeted lipidomics and may enhance the accuracy of lipid quantification, especially when researchers have limited diversity in mass spectrometry equipment. This prompts us to further investigate this field and explore the lipid phenotype in various disease contexts.
To gain further insight into the accurate measurement and analysis of lipids using targeted LC-MS lipidomics, here we developed a predictive model based on RT, validated it using MRM validation patterns, as well as used this model to investigate the changes in phospholipid dynamics. Furthermore, we verified these specific lipids using an independent clinical lipidomics dataset,32 to suggest the potential of these lipids as biomarkers for predicting the sensitivity of HCC treatment to sorafenib.
:
1 dichloromethane
:
methanol solution (1.2 mL per vial). The deuterated standards were provided at defined concentrations and were used directly (phospholipids are listed in Table S1†). The certificate of analysis for UltimateSPLASH™ ONE is available at https://avantilipids.com/.
:
1 DCM
:
MeOH containing 10 mM ammonium acetate).
:
ACN
:
H2O (1
:
1
:
1), while mobile phase B contained IPA, both with 5 mM ammonium acetate. The elution gradient of a total of 17 minutes was applied as follows: 0–1 min: 20% B, 1–2.5 min: 40% B, 2.5–4 min: 60% B, 4–14 min: 90% B, 14–15 min: 90% B, 15–15.1 min: 20% B, and 15.1–17 min: 20% B. The temperature and flow rate of the chromatography column were set at 40 °C and 0.3 mL min−1, respectively. An injection volume of 5 μL was used, and the negative ESI mode was employed throughout the analysis. Lipid quantitation was performed using scheduled MRM, and the parameters for ion spray voltage (ISV, −4500 V), curtain gas (CUR, 30 psi), nebulizer gas (GS1, 55 psi), turbo-gas (GS2, 55 psi), turbo ion spray source temperature (350 °C), and collision gas (CAD, medium) were optimized.
Additional parameters of the mass spectrometer, such as de-clustering potential (DP), entrance potential (EP), collision exit potential (CE), and collision exit potential (CXP), were optimized for each subclass of lipid species. Nitrogen was used as the collision gas with an auto-generator. Peak assignment and integration were performed using SCEX OS. The signals were automatically recognized and integrated based on the RT predictive value. Manual inspection was also conducted to ensure accurate peak integration by the software. All lipids were then normalized to the peak area of the internal standard within the same subclass (Table S4†).
Additional variable influence of projection (VIP) was calculated for each individual OPLS-DA model. Only lipids with P-values <0.05, VIP values >1, and fold changes ≥20% for relative abundance were considered statistically important. Data processing was also performed using MetaboAnalyst for cross-validation and reproducibility verification.
Other statistical analyses were conducted using the two-tailed, unpaired Student's t-test or one-way ANOVA with the Dunnett multiple comparison test, as specified in GraphPad Prism 9.0.0. P values (***P < 0.001, **P < 0.01, and *P < 0.05) were automatically labeled in the figures. The graphical abstract was created using BioRender (https://www.biorender.com/). Graphs were plotted using either OriginPro 2021 (64-bit) 9.8.0.200 or GraphPad Prism 9.0.0, and final figures were prepared using Adobe Illustrator CS4.
Using a limited sum of deuterated standards from commercially available products (Table S1†), referred to as the training lipid set, we initially generated the RT list of 40 phospholipids (Table S1†). The mean RTs of five replicates of LC-MS runs were calculated. The standard error of the RT was within an acceptable range and could be utilized for further computations. For both the internal standards and the phospholipid library (including PC, PE, phosphatidylglycerol (PG), phosphatidylinositol (PI), phosphatidylserine (PS), and phosphatidic acid (PA), Table S1†), we calculated the total number of carbon atoms and double bonds. The highest number of carbon atoms obtained was 43, which was consistent across the six subclasses of lipids. This value was then used to determine the relative carbon atom number using the equation y = C/Cmax, where C represents the number of carbon atoms of a specific lipid and Cmax is kept as 43. By defining Cmax as a constant, we simplified the comparison of interclass RTs of lipids and ensured the rational distribution of RTs among the six subclasses of phospholipids. Consequently, the relative carbon atom numbers of the lipids were naturally deduced. Similarly, the relative RT was calculated using the equation y = T/Tmax, where T represents the retention time of a specified lipid and Tmax was the elution time of the established chromatographic system (17 minutes in this case). This calculation allowed for the easy preparation of RT ratios for deuterated standards.
The second degree polynomial regression (y = B0 + B1 × x + B2 × x2) was then used to fit the curve between fatty acyl chain length or DB number and RT. For example, in the case of PC, the compounds 17:0-14:1 PC-d5 (PC(31:1)-d5), 17:0-16:1 PC-d5 (PC(33:1)-d5), and 17:0-18:1 PC-d5 (PC(35:1)-d5), which had double bonds at carbon atom numbers 0.72, 0.77, and 0.81 respectively, exhibited relative RT values of 0.32, 0.34, and 0.37, respectively. Therefore, the regressive curve of PC(X:1) (X refers to the number of carbon atoms) could be fitted as y = 0.3047 − 0.4733x + 0.6801x2. Due to the limited number of standards available for model training, a simple rule regarding the variation of the RT values could be derived. This rule suggested a nearly constant RT ratio between PC(33:1)-d5/PC(31:1)-d5 (approximately 1.077151335) and PC(35:1)-d5/PC(33:1)-d5 (approximately 1.083677686), which was referred to as the carbon chain length constant (k). Based on this constant, the RT values of PC(37:1) and PC(39:1) were estimated to be 0.40 and 0.43, respectively.
The following experimental RT values confirmed our initial hypothesis. The fluctuation of RT is influenced by both the length of the carbon chain and the degree of unsaturation. In general, when a lipid has more double bonds or fewer carbon atoms, RT tends to decrease, while it tends to increase when there are fewer double bonds or more carbon atoms.34,35 We then investigated the relationship between lipids with the same chain length but varying double bonds, specifically 17:0-20:3 PC-d5 (PC(37:3)-d5), 17:0-22:4 PC-d5 (PC(39:4)-d5), and PC(37:1), PC(39:1). We found that the square root of the RT ratio between PC(37:3)-d5 and PC(37:1) was 0.941656387 (referred to as λ, the double bond constant). Surprisingly, the cube root of RTPC(39:4)-d5/RTPC(39:1) was 0.943296587, matching the previously obtained square root value. Based on the RT of PC(X:1) and these established results, we predicted the RT of PC(37:Y) (where Y refers to the number of double bonds, Y = 0, 1, 2, 3, 4, etc.) and PC(39:Y) in accordance with PC(X:1) (where X = 31, 33, 35, 37, 39, etc.). Subsequently, an array of relative RTs of PC were extrapolated and fitted with regressive curves simultaneously, as shown in Fig. 1A and Table S2.† Similarly, the RTs and fit curves of PE (Fig. 1B), PG (Fig. 1C), PI (Fig. 1D), and PS (Fig. 1E) were processed proficiently (Table S2†). However, due to a lack of deuterated PA standards in the kit, authentic standards of PA(16:0_18:1) and PA(18:0_18:1) were alternatively employed to estimate the k value of PA. Consistently, RT at 5.18 and 5.57 min yielded a k value of 1.075289575, illustrating the comparable value obtained from the other subclasses of phospholipids. Directly, the λ value from PC was used for the RT estimation of PA (Fig. 1F).
In addition, lysophospholipids, derived from the aforementioned lipid classes after hydrolysis to remove an acyl group, exhibited different properties compared to their original phospholipids. Therefore, the RT prediction of LPC (Fig. S1A†), LPE (Fig. S1B†), LPG (Fig. S1C†), LPI (Fig. S1D†), and LPS (Fig. S1E†) was also inferred using their respective deuterated standards containing 15:0, 17:0, and 19:0 fatty acyl moieties (Table S3†). In order to estimate the RT, the λ constant from the precursor lipids was approximately used. As a result, a library of different subclasses of phospholipids with predictive RTs was successfully constructed (Tables S4 and 5†).
Recent studies have utilized machine learning techniques to predict RT. However, applying a published machine learning model to diverse research contexts and lab facilities may present challenges. Here we attempted to predict RT using various codes and packages, including QSRR Automator,28 a software package designed to automate RT prediction models. In this process, lipid IDs were first converted to the simplified molecular input line entry system (SMILE) format using MetaboAnalyst (https://www.metaboanalyst.ca/MetaboAnalyst/upload/ConvertView.xhtml). Subsequently, the RT values were added to the templates and the data were inputted, resulting in the smooth generation of predictive results for PC (Fig. S2A†) and PE (Fig. S2B†). The final models for PC and PE achieved R2 scores of 0.99 and 0.98, respectively. The mean absolute errors for PC and PE were 0.003 and 0.054, respectively. These results demonstrate the consistency of RT prediction between the QSRR Automator and our developed model.
LC-MS analysis was conducted using sample extracts to identify peaks, as shown in Fig. 3. The peaks for PC(18:0_18:3), PE(16:0_20:1), PG(18:0_18:1), PI(18:0_20:5), PS(16:0_18:1), and PA(18:0_18:2) exhibited significant signal interference, complicating the lipid annotation process. However, by comparing peaks and considering predictive RT, it was possible to accurately identify the correct lipids even when one channel displayed a single peak and the other was affected by unknown chemicals.
Moreover, there are two situations that can occur when annotating lipids with equivalent carbon atoms and degree of unsaturation. For instance, PC(38:5) can be split into PC(18:0_20:5) and PC(18:2_20:3) with retention times (RT) of 5.7 min and 5.4 min, respectively (Fig. S4A†). In this scenario, the two lipids can be automatically separated at baseline. However, in another case, lipids with the same molecular formula may not be resolved in chromatography, resulting in different fatty acyl moieties sharing nearly identical RTs. Examples of such lipids were PC(18:1_18:3), PC(16:0_20:4), and PC(18:2_18:2) of PC(36:4) (Fig. S4B†). Unfortunately, this technical limitation cannot be easily overcome during targeted lipidomics.
A recent study utilized the combination of hydrophilic interaction liquid chromatography (HILIC) and trapped ion mobility spectrometry (TIMS) to address the challenge of determining the double bond locations in phospholipids and sn-positions in phosphatidylcholine. This innovative approach offers a solution for differentiating lipid isomers within the phospholipidome.36 Despite the presence of remaining challenges facing the field, the validation of MRMs has proven to be an effective tool for lipid assignment. This was evident in the identification of various lipids, such as PC(16:0_18:1), PE(16:0_18:1), PG(16:0_18:1), PI(18:0_18:1), PS(18:0_18:1), and PA(16:0_18:1) (Fig. S5†), through robust identification in both MRM channels.
The distribution of phospholipids was analyzed in cell extracts, and a total of 307 lipids were detected. Among these, PE was found to be the most abundant, with a total of 110 lipids, nearly twice the amount compared to PC which had a total of 59 lipids. The abundance of PG, PI, PS, and PA varied, with the number of lipids ranging from 23 to 45. When comparing the total phospholipid abundance between the control group (C) and the DR in the MHCC97H cell line, no significant difference was observed (Fig. 5B). However, a significant deficiency in phospholipid abundance was observed in the DR group of the Hep3B cell line. Specifically, when analyzing the individual subtypes of phospholipids, a decrease in PC and PI abundance and an increase in PG, PI, and PA abundance were observed in the DR group of the MHCC97H cell line (Fig. 5C). In contrast, there was no significant divergence in the abundance of these phospholipid subtypes in the Hep3B cell line (Fig. 5D). Additionally, the fatty acyl residues in the phospholipids were divided into saturated (SFA), monounsaturated (MUFA), and polyunsaturated (PUFA) moieties. The degree of unsaturation in the phospholipids of both DR groups of cell lines significantly increased, as exemplified by an increase in MUFA in the MHCC97H cell line (Fig. 5E) and an increase in PUFA in the Hep3B cell line (Fig. 5F).
To further filter the co-regulated lipids or biomarkers, OPLS-DA models were used to reanalyze the data from both cell lines. The VIP values greater than 1 were considered statistically relevant. This analysis identified 183 lipids with VIP > 1 in MHCC97H cells and 181 lipids with VIP > 1 in Hep3B cells. There were 109 lipids that were common to both cell lines (Fig. 6A and Table S7†). Among these 109 lipids, those with a correlation coefficient (p(cor)) greater than 0.6 or less than −0.5 were considered to have a positive or negative association with the model, respectively. As a result, 29 lipids (Fig. 6B and Table S8†) were identified as co-upregulated, and five lipids (Fig. 6C and Table S9†) were identified as co-downregulated in both MHCC97H and Hep3B cells.
![]() | ||
| Fig. 6 Co-regulated lipid screening. Phospholipids with VIP > 1 from OPLS-DA analysis of both cell lines and their overlapped lipids as shown in the Venn diagram (A). Phospholipids with p(corr) > 0.6 or <−0.5 from OPLS-DA analysis of MHCC97H (B) and Hep3B (C) and their overlapped lipids as shown in the Venn diagram. The SUS plot of OPLS-DA analysis from both cell lines (D). The probability curve of lipids linked with their p(corr) coefficient, M1 and M2 refers to OPLS-DA analysis of MHCC97H and Hep3B cells, respectively (E). The shared upregulated (F and Table S8†) and downregulated (G) phospholipids were sorted out from both cell lines after sorafenib resistance. The accumulated abundance of SFA, MUFA, and PUFA containing co-upregulated phospholipids from MHCC97H (H) and Hep3B cells (I). | ||
The shared and unique structures plot (SUS-plot) was used to correlate the two OPLS-DA models in order to identify appropriate lipids for discovering biomarkers from sorafenib resistant cells. The plot confirmed the previous findings, showing that 29 co-upregulated lipids were located in the first quadrant of the coordinate, while five co-downregulated lipids were located in the third quadrant (Fig. 6D). This suggests that these lipids share common characteristics, with either increased or decreased abundance. In contrast, the lipids scattered in the second and fourth quadrants exhibit unique fluctuations (Fig. 6D). The p(cor) plot of OPLS-DA analysis from both cell lines was used to confirm the co-regulated lipids at specific angles of the curve (Fig. 6E). The overlapping lipid set from enriched and decreased lipids in both sorafenib resistant cell lines was then plotted, representing specialized lipid biomarkers regardless of the cell line (Fig. 6F and G). Additionally, the fatty acyl moieties in the 29 lipids were re-analyzed, showing enhanced properties for either SFA, MUFA, or PUFA in the two sorafenib resistant HCC cell lines (Fig. 6H and I).
To investigate the phospholipidomics of sorafenib resistant HCC cell lines, we conducted experiments on two cell lines, MHCC97H and Hep3B. The analysis revealed the accumulation of unsaturated lipids in both cell lines after acquiring resistance to sorafenib. Previous studies have shown that stearoyl-CoA desaturase-1 (SCD1) plays a crucial and rate-limiting role in the synthesis of MUFA. In line with this, we found that the expression of SCD1 was significantly upregulated in the sorafenib resistant PLC/PRF/5 HCC cell line and patient-derived tumor xenograft (PDTX) model.37 This suggests that SCD1 may also have a similar role in our sorafenib resistant cell lines, as we detected an increase in the abundance of MUFA or PUFA. Moreover, the inhibition of SCD1 was found to mitigate the sorafenib sensitivity in HCC,37 indicating a potential combinatorial strategy of targeting SCD1 along with sorafenib to combat HCC. Our in vitro experiments provided further support for these findings.
In our study, we discovered that 29 co-upregulated and 5 co-downregulated lipids were found in both models of the OPLS-DA or SUS plot. These lipids were evaluated from an external lipidomic dataset obtained from the plasma of sorafenib pretreatment HCC patients,32 with the aim of predicting sorafenib sensitivity. Out of these, seven lipids were found to have reduced abundance in drug responders, which aligns with our observation that these lipids were present in higher concentrations in sorafenib resistant cells. This finding supports the potential use of these lipids as biomarkers for predicting sorafenib sensitivity. However, the underlying mechanisms behind the increased abundance of these lipids and how they can be manipulated to enhance the killing effect of sorafenib are still largely unknown. Our results provide valuable insights into the lipid phenotype and may contribute to developing strategies to overcome sorafenib resistance in HCC.
However, our proposed RT prediction modeling may also have some drawbacks, such as overfitting and limited coverage of metabolites, like metabolomics. Regarding overfitting, we quantified the difference in RT between in silico and experimental parameters, and found that the largest gap in RT was less than 0.5 min. This value was considered acceptable, especially when MRM validation was taken into account. To improve the accuracy of RT prediction, we also explored the use of iterative data to train the model, since the availability of deuterated or authentic standards was limited. However, our efforts were hindered by the inconsistent criteria used by different labs. Additionally, the inclusion of a comprehensive library of metabolites and corresponding standards would undoubtedly enhance the training and prediction capabilities of RT. Furthermore, the use of machine learning algorithms, based on theoretical calculations using chemical structure formulas like SMILE, has significantly accelerated the fidelity of RT prediction and the discovery of novel compounds. Our model simplifies the predictive process and demonstrates good performance in RT prediction, making it a viable strategy for customized phospholipidomics.
The understanding of lipid fragmentation rules is still incomplete, and the identification of lipids using MRM is complicated by the interference of multiple signals at different sample contexts in lipidomics. This interference affects lipid assignment in various ways. In the case of phospholipids, there are two fatty acyl residues located either at the sn-1 or sn-2 position of the glycerol backbone, resulting in the need for two MRM transitions for target identification. This leads to the generation of nearly a thousand ion pairs in one setting, which increases the cycle time, decreases data acquisition, and results in poorer peak shapes. To address this issue, scheduled MRMs were employed instead of using a constant dwell time. By incorporating one additional MRM transition in LC-MS analysis, the peak annotation was corrected, and the confidence in lipid identification was improved. It is important to note that the multiplex information of lipids, which depends on their own fragmentation pattern, cannot be easily transferred to other classes of lipids.
Overall, this study demonstrates the application of the fatty acyl chain length or DB number as a model for developing a lipidomics library. Using MRM validation in conjunction with this workflow, it significantly strengthened confidence in targeted lipidomics and provided comprehensive coverage of the phospholipidome. Furthermore, the study generated RTs for 526 lipids by selecting a limited number of deuterated or authentic standards. The determination of RTs greatly accelerated scheduled MRM for high throughput analysis, with a difference of less than 0.5 minutes between measured and predicted RT values. As a result, the utility of this workflow was demonstrated in sorafenib resistant HCC cell lines, we found an increase in MUFA or PUFA in sorafenib-resistant cells, with seven co-upregulated lipids being identified and validated using an independent data set. This workflow will be potentially applied for RT prediction and lipid biomarker identification in diseases.
Footnote |
| † Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3an01735d |
| This journal is © The Royal Society of Chemistry 2024 |