Jan Muntel‡, Tejas Gandhi‡, Lynn Verbeke, Oliver M. Bernhardt, Tobias Treiber, Roland Bruderer and Lukas Reiter*
Biognosys AG, Wagistrasse 21, 8952 Schlieren, Switzerland. E-mail: lukas.reiter@biognosys.com
First published on 29th August 2019
Comprehensive proteome quantification is crucial for a better understanding of the underlying mechanisms of diseases. Liquid chromatography mass spectrometry (LC-MS) has become the method of choice for comprehensive proteome quantification due to its power and versatility. Even though great advances have been made in recent years, full proteome coverage for complex samples remains challenging due to the high dynamic range of protein expression. Additionally, when studying disease, regulatory proteins, biomarkers or potential drug targets, such as kinases and transcription factors, are often of low abundance. Here, we show that with improvements in chromatography and data analysis the single-shot proteome coverage can go beyond 10000 proteins in human tissue. In a testis cancer study, we quantified 11200 proteins using data independent acquisition (DIA). This depth was achieved with a false discovery rate of 1%, which was experimentally validated using a two-species test. We introduce the concept of hybrid libraries, which combines the strengths of direct searching of DIA data and the use of large project-specific or published DDA data sets. Remarkably deep proteome coverage is possible using hybrid libraries without the additional burden of creating a project-specific library. Within the testis cancer set, we found a large proportion of proteins with altered expression (in total: 3351; 1453 increased in cancer). Many of these proteins could be linked to the hallmarks of cancer. For example, the complement system was downregulated, which helps to evade the immune response, and chromosomal replication was upregulated, indicating a dysregulated cell cycle.
Fractionation-based approaches typically require a large amount of sample and MS time, thereby limiting the throughput and complicating quantification. Therefore, methods for single-shot proteome analysis were developed further.7 A deep analysis of the yeast proteome in 2011 took 8 h8 and in 2014 it was possible to achieve an even greater depth in only about 1 h.9 This improvement in depth and analysis time was primarily driven by faster and more sensitive mass spectrometers.7 Nowadays, comprehensive proteome coverage for less complex organisms, like yeast, can be routinely achieved in a single-shot analysis. However, it remains challenging for human tissue samples. Within the tissue, the proteome complexity is higher with an estimated expression of more than 10000 protein-coding genes and a higher dynamic range of protein expression.10,11 A milestone for mammalian tissues was achieved in 2018, when >10000 proteins were detected in a single shot using the BoxCar acquisition method and matching identifications to a previously acquired fractionated dataset.12 In the same year, a comprehensive single-shot proteome analysis was achieved using high-field asymmetric waveform ion mobility spectrometry (FAIMS) resulting in identification of >8000 proteins.13 A similar depth was also achieved using online parallel accumulation-serial fragmentation (PASEF) on the timsTOF Pro mass spectrometer.14
Data-independent acquisition (DIA) methods have shown great potential for comprehensive single-shot proteome analysis. First introduced in the mid-2000s,15,16 these methods have been shown in various studies to enable accurate, precise and comprehensive proteome quantification.17–23 By fragmenting the mass range of interest in sequential, wide isolation windows (SWATH-type DIA approach24,25), it was possible to identify more peptides in short gradients than could be theoretically targeted in a sequential manner using data-dependent acquisition (DDA).26
Commonly, DIA data are analysed using a project-specific library. Such a library contains fragmentation information from previously acquired DDA data.24 For a project-specific library the DDA data are acquired using the same LC-MS setup and the same samples as used for the analysis of the samples by DIA. This adds overhead to the DIA experiment. To avoid the overhead, publicly available resource libraries were utilized in the DIA data analysis.26,27 In an alternative approach, fasta-database search tools for DIA data were developed, e.g. DIA-Umpire,28 Pecan29 or DirectDIA in Spectronaut. Compared to library-based analysis of the DIA data, a database search resulted in fewer identified proteins, but an improved quantitative performance.30
A marriage of both of these approaches, i.e. combining data from a database search of DIA data with results of a search of DDA data, holds the potential to improve DIA data analysis in terms of quantitative precision and number of quantified proteins.31,32 Such a workflow has to address three key challenges: (1) degradation of indexed retention time (iRT) precision33 in the library as a result of potentially heterogeneous chromatography, (2) homogeneous protein inference and false discovery rate (FDR) control when merging data from several sources to build the library, and (3) robust protein FDR control during DIA analysis. A solution to the RT precision problem proposed by the MacCoss group32 utilized data from a narrow window DIA method to calibrate a protein database or library with chromatographic and MS-specific parameters. Whereas this approach greatly improved the number of quantified peptides and proteins in classical as well as resource libraries, the FDR estimate remained challenging and required additional tools. Furthermore, it required a new calibration of the library after each variation of the chromatography, e.g. a column change.
Here, we propose a combined DIA data analysis strategy, named the hybrid library workflow, which addresses the three above-mentioned challenges. The advantage of this approach is that it does not require any special calibration runs and that FDR control is robustly maintained throughout the entire pipeline at the library and DIA analysis level.34,35
The recent improvements in single-shot proteome analysis were mainly driven by development of better liquid chromatography, mass spectrometers, acquisition methods and data analysis strategies. On the liquid chromatography side, the last major progress was the introduction of commercial ultra-high pressure pumps which enabled the routine use of reversed phase sub-2 μm particles and long columns (>20 cm).7,36,37 Only recently, efforts have been made to improve the packing of the column by ultra-high pressure packing of columns38 and novel micro-pillar array columns have been introduced for proteomics applications.39
Today, sub-2 μm C18 solid phases are available with different chemistries. Interestingly, little work has been published to compare these particles in the context of single-shot proteomics. Hence, we decided to compare three different C18 solid phases that are commonly used in proteomics. We optimized the gradient shape and length to maximize the proteome coverage for single-shot experiments. In addition to the optimization of the chromatography, we introduce the afore-mentioned hybrid library approach for the analysis of the DIA data. Finally, we applied this optimized workflow in a small testis cancer study.
Testis tissue is one of the most complex human tissues.40–42 Due to the major role of the testis in the human reproductive system, it has been studied mainly with respect to male fertility,43 and very little effort has been made to study cancer in testis. In contrast to other cancers, the frequency of testis cancer peaks at an age of about 35 years, similar to other germ cell cancers.44
The limited number of proteomic testis cancer studies and the high complexity of the tissue were the reasons why we chose this tissue to demonstrate the potential of our workflow for comprehensive single-shot proteome analysis. Here we show the quantification of >10000 proteins at 1% FDR on precursor and protein level in testis tissue and how these data can contribute to a better understanding of the molecular mechanisms of cancer development.
With all setups, we quantified between 77519 (ReproSil Pur, 300 nl min⁻¹) and 82060 precursors (CSH, 250 nl min⁻¹) (Fig. 1A and Table S2A, ESI†). Across all flow rates, we achieved the highest numbers of quantified precursors, peptides and proteins with the CSH solid phase; on average 5% more precursors were quantified in comparison to the ReproSil solid phase and 3% more peptides in comparison to the BEH solid phase. Similar trends were observed on peptide (+7% compared to ReproSil, +5% compared to BEH) and protein level (+8% compared to ReproSil and +5% compared to BEH). Interestingly, 250 nl min⁻¹ was the optimal flow rate for all tested solid phases. For the CSH solid phase, we quantified 3% more precursors at 250 nl min⁻¹ compared to 200 nl min⁻¹ and 300 nl min⁻¹ and observed similar trends on peptide and protein level. Additionally, these findings from the DirectDIA analysis were confirmed with a project-specific HeLa library (Fig. S1 and Table S2B, ESI†). With the library, it was possible to quantify on average 55% more precursors and peptides as well as 27% more proteins than with DirectDIA. Importantly, the qualitative differences between the solid phases and flow rates were consistent with the DirectDIA analysis. This demonstrated that the DirectDIA analysis is an effective alternative for experiments in which relative differences between workflows are under investigation and a high depth of analysis is not required.
Overall, the differences in quantified precursors, peptides and proteins were rather small between the three tested solid phases and flow rates. Nevertheless, the CSH phase performed consistently better and 250 nl min⁻¹ resulted in the highest number of quantified precursors, peptides and proteins. Therefore, the CSH phase and a flow rate of 250 nl min⁻¹ were used for the remainder of this study.
Next, we incrementally increased the length of the gradient. We started with a 2 h gradient and ramped in 2 h steps up to 8 h. Again, the number of quantified precursors and proteins was compared using a HeLa sample and the DirectDIA analysis strategy. Each extension of the gradient increased the number of quantified precursors, peptides and proteins (Fig. 1B and Table S3A, ESI†). The largest improvement was observed by extending the gradient from 2 h to 4 h. The average number of quantified precursors in a triplicate measurement increased significantly by 38% from 86935 to 119992 (p = 3 × 10⁻⁷, based on two-sample t-test), on peptide level by 30% (p = 4 × 10⁻⁶) and on protein level by 17% to 6364 proteins (p = 1 × 10⁻⁴). Further extension of the gradient to 6 and 8 h resulted in 7% more quantified precursors for each step, 9 to 6% more quantified peptides and 6 to 4% more quantified proteins. The highest number of quantified proteins was achieved by the 8 h gradient. On average 6970 proteins were quantified in a triplicate run. We noticed that for the 6 and 8 h gradients, the identification reproducibility decreased. The CV of the precursor identification was 0.9% for the 2 h gradient and increased to 5.4% for the 8 h gradient (compare error bars in Fig. 1B). This indicated that the shallow gradients with a very slow increase in percentage B became less reproducible. Because of this observation, we decided to analyse the number of precursors, peptides and proteins that were quantified with a CV below 20% based on an injection triplicate. This analysis allowed us to assess the quantitative precision of the measurements as well as the number of quantified precursors, peptides and proteins. As before, we observed the largest improvement by extending the gradient from 2 h to 4 h (+29% precursors, +26% peptides, +16% proteins). By further extending the gradient, we found a maximum of precursors with a quantitative CV below 20% for the 6 h gradient (95229 precursors). The same was true on peptide (61755) and protein level (4787). To exclude that the DirectDIA analysis might have introduced the lower quantitative precision for the 8 h gradient, the sample set was re-analysed with the HeLa project-specific library (Fig. S1B and S3B, ESI†). Again, this analysis revealed a higher sensitivity as shown by 54% more quantified precursors, 52% more peptides and 23% more proteins compared to the DirectDIA analysis. Importantly, the relative differences in quantified precursors, peptides and proteins between the gradient lengths as well as the respective numbers with a quantitative CV below 20% were consistent with the DirectDIA analysis. Because of the higher quantitative precision of the data generated with the 6 h gradient, we decided to use this gradient henceforth.
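The CV-based filtering described above can be illustrated with a minimal sketch. It assumes that the CV is the sample standard deviation divided by the mean across the three injections, and that an analyte must be quantified in every injection to be counted; the function names and intensities are hypothetical and do not reproduce the Spectronaut implementation.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV (sample standard deviation / mean) in percent."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return 100.0 * stdev(values) / m


def count_below_cv(quantities, threshold=20.0):
    """Count analytes quantified in all replicates with a CV below threshold."""
    kept = 0
    for replicates in quantities.values():
        if any(v is None for v in replicates):
            continue  # assumption: missing in one injection means not counted
        if coefficient_of_variation(replicates) < threshold:
            kept += 1
    return kept


# Hypothetical intensities for three injections of the same sample.
example = {
    "PEPTIDEA": [100.0, 105.0, 98.0],  # CV of about 3.6%
    "PEPTIDEB": [50.0, 90.0, 20.0],    # CV far above 20%
    "PEPTIDEC": [10.0, None, 12.0],    # missing value in injection 2
}
n_precise = count_below_cv(example)
```

Applied per precursor, peptide or protein, a count of this kind corresponds to the "quantified with a CV below 20%" numbers reported for each gradient length.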
Overall, with the chromatographic improvements using the CSH solid phase and extension of the gradient to 6 h, we were able to increase the number of quantified precursors from 77184 to 128718 (+67%) and the number of quantified proteins from 4923 to 6712 (+36%).
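As a quick check of the arithmetic, the relative gains can be recomputed from the reported counts. The helper name below is hypothetical.

```python
def percent_gain(before, after):
    """Relative increase from before to after, in percent."""
    return 100.0 * (after - before) / before


# Counts taken from the text above.
precursor_gain = percent_gain(77184, 128718)  # about 66.8%
protein_gain = percent_gain(4923, 6712)       # about 36.3%
```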
There are three main challenges involved with this type of workflow: (1) degradation of retention time precision from combining data with heterogeneous chromatography, (2) loss of FDR control at the library level, and (3) robust protein FDR control during the DIA analysis. For DIA data analysis, we generally convert retention times into indexed retention times (iRTs),33 which are dimensionless and can be utilized for highly accurate retention time calibration.45
To solve the first challenge, degradation of retention time specificity, we allowed libraries to keep different iRT spaces instead of combining them into one. This means that each peptide in a library can have multiple associated empirical iRTs if it was identified from multiple sources. This additional information in the library can be exploited during the DIA analysis in the form of source-specific RT to iRT calibration: if a peptide was identified both from DirectDIA and a DDA dataset, the retention time information from the best RT to iRT calibration is used. The benefit of this approach is that the library moulds itself to best fit the DIA data without needing to be recalibrated with each change in chromatography. The second challenge for this workflow is to maintain FDR control at the library level. We solved this with our database search engine Pulsar and the introduction of search archives. A search archive is a collection of all PSMs identified without any FDR filtering (i.e. it contains the computationally expensive database search results). This made it possible to combine previously searched DIA or DDA data with the target DIA runs in a manner that is both computationally efficient and FDR controlled. The average time to generate a library was reduced by >90% using search archives (Fig. S2, ESI†). Finally, we adapted the FDR calculation to account for the heterogeneity in the iRT space being targeted by building one separate precursor FDR model for each iRT source and normalizing the scores afterwards. We validated this approach by performing an empirical two-species FDR test (Fig. 2A).
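The idea of source-specific calibration can be sketched as follows. This is an illustration only and not the Spectronaut or Pulsar implementation: it assumes a simple linear iRT to RT fit per library source, uses the root-mean-square error of that fit as the measure of the "best" calibration, and all names and values are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def calibrate(source_irts, observed_rts):
    """Fit RT as a function of iRT for one source; return (slope, intercept, rmse)."""
    shared = sorted(set(source_irts) & set(observed_rts))
    xs = [source_irts[p] for p in shared]
    ys = [observed_rts[p] for p in shared]
    slope, intercept = fit_line(xs, ys)
    residuals = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return slope, intercept, sqrt(sum(residuals) / len(xs))


def predict_rt(peptide, library, calibrations):
    """Predict RT using the iRT of the source whose calibration fits best."""
    sources = [s for s in library if peptide in library[s]]
    best = min(sources, key=lambda s: calibrations[s][2])
    slope, intercept, _ = calibrations[best]
    return best, slope * library[best][peptide] + intercept


# Hypothetical library with one iRT space per source.
library = {
    "DirectDIA": {"A": 0.0, "B": 50.0, "C": 100.0, "X": 40.0},
    "DDA": {"A": 5.0, "B": 48.0, "C": 110.0, "X": 55.0},
}
# Hypothetical retention times (min) of anchor peptides in the current DIA run.
observed = {"A": 10.0, "B": 60.0, "C": 110.0}

calibrations = {name: calibrate(irts, observed) for name, irts in library.items()}
best_source, predicted_rt = predict_rt("X", library, calibrations)
```

In this toy case the DirectDIA iRTs fit the run perfectly, so peptide X is targeted at the retention time predicted from its DirectDIA iRT, while its DDA-derived iRT is ignored for this run.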
To test the hybrid library approach, we combined the project-specific library and the DirectDIA data (hybrid library size: 415002 precursors, 231791 peptides and 10624 proteins, Fig. S3, ESI†). This hybrid library was applied to the injection triplicate of the 6 h HeLa runs. The performance was evaluated using the total number of quantified precursors, peptides and proteins as well as the respective quantifications with a CV below 20% (Fig. 2B and Table S4, ESI†).
As already shown in the previous analysis, the library-based analysis of the samples significantly improved the number of quantified precursors (+78%, p = 2 × 10⁻⁵), peptides (+71%, p = 7 × 10⁻⁶) and proteins (+22%, p = 2 × 10⁻⁸) compared to the DirectDIA analysis. Application of the hybrid library led only to a minor increase in quantified precursors (+3%) and proteins (+4%). The number of quantified peptides even slightly decreased in this analysis (−2%). The same was true for the quantifications with a CV below 20%. The overlap of quantified precursors, peptides and proteins was high between the three different analysis strategies (86% of the precursors and 88% of the peptides identified by DirectDIA were also identified by the other strategies, Fig. S3B, ESI†). The overlap on protein level was especially high; only <1% to 3% of the proteins were identified exclusively by one approach.
It is noteworthy that we were able to quantify on average 8465 proteins and 7712 proteins with a quantitative CV below 20% within 6 h using the hybrid approach. This result represents a substantial improvement, especially in number of quantified proteins with a CV below 20%, to a recent comprehensive single-shot proteome study. In this publication more than 8000 proteins were identified in 5 h from which 6444 proteins were quantified with a CV below 20% in a human cell line sample.13
First, we analysed an injection triplicate of one of the testis cancer samples with the above-mentioned libraries. Using the project-specific hybrid library, we were able to quantify on average 10146 proteins per run (263791 precursors, 163825 peptides), which represented an improvement of 6% on protein level, 1% on peptide and 10% on precursor level compared to the project-specific library (Table S5A and B, ESI†). Out of the 10146 proteins, 8783 were quantified with a CV below 20% (Fig. S4B, ESI†). Notably, the largest increase in quantified precursors (+27%), peptides (+12%) and proteins (+10%), including the numbers with a CV below 20% (precursors: +16%, peptides: +14%, proteins: +10%), was found when the hybrid library approach was applied to the resource library (Fig. S4B, Tables S5C and D, ESI†). We noticed the largest overlap in quantified precursors and peptides among all four analysis approaches (Fig. S4C, ESI†). The second largest overlap was found between the project and project hybrid library, followed by the overlap of these two libraries with the resource hybrid library. This result was expected as these three libraries contained most of the sample-specific precursors and peptides. As for the HeLa dataset, the overlap on protein level was much higher compared to precursor and peptide level (76% of all quantified proteins were quantified by all four libraries).
Based on these results and the results from the analysis of the HeLa using the hybrid library approach, we concluded that the benefit of the hybrid library depends on the size and quality of the initial library. For the HeLa dataset, we used a single HeLa digest for library generation and DIA analysis and observed minor differences in quantified precursors (+3%), peptides (−2%) and proteins (+4%) in the DIA experiment. The initial project-specific library provided already a very high coverage of the detectable precursors, peptides and proteins. A larger improvement was noticed for the testis sample (precursors: +10%, peptides: +1%, proteins: +6%). For the project-specific library, we created two condition pools each comprising three biologically different samples. In this commonly used strategy for DIA experiments, sample-specific proteins could be potentially missed, because of dilution of these proteins in the pooled sample. This effect has been described previously for a protein spike-in experiment in complex background30 and will likely get more prominent the more samples are pooled to generate the project-specific library. The largest improvement in quantified precursors (+27%), peptides (+12%) and proteins (+10%) using the hybrid library approach was found for the resource library which was based on DDA data from unrelated samples and on a different LC-MS setup. This finding showed that the hybrid library approach improved the usability of resource libraries by addition of sample-specific proteins, that were not part of the resource library.
During DIA data analysis, the FDR was calculated based on a target-decoy model. To experimentally cross-validate the FDR estimates, we performed a two-species test using Arabidopsis thaliana as negative set and the testis cancer set (human) as positive set (Fig. 2C). For validation of the project hybrid library, we made two different libraries: (1) a DDA library by searching human testis DDA runs together with A. thaliana DDA runs, and (2) a DIA library by searching human testis DIA runs together with A. thaliana DIA runs. All the A. thaliana runs were acquired using the same setup as the testis data. In this manner, we had two different sources as before (DDA and DIA) but with a built-in negative set. Afterwards we used these libraries for the analysis of the DIA data from the testis cancer set. The empirical FDR was then calculated as the ratio of identified A. thaliana to human proteins, which came out to be 0.8% (Table S7, ESI†). Additionally, we used the DDA data-based library only to validate the FDR of the project library, which was 0.9% (Fig. 2C). We concluded that the hybrid library approach did not inflate the protein FDR.
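A minimal sketch of how an empirical FDR could be derived from such a two-species experiment is given below. It assumes that the estimate is the number of negative-set identifications divided by the number of positive-set identifications; other conventions divide by the total number of identifications. The species tags and counts are hypothetical and are not the values behind Table S7.

```python
def empirical_fdr(identified, negative_species="ARATH", positive_species="HUMAN"):
    """Ratio of negative-set to positive-set protein identifications.

    identified: iterable of (protein_id, species_tag) pairs that passed
    the 1% target-decoy FDR filter in the DIA analysis.
    """
    negatives = sum(1 for _, species in identified if species == negative_species)
    positives = sum(1 for _, species in identified if species == positive_species)
    if positives == 0:
        raise ValueError("no positive-set identifications")
    return negatives / positives


# Hypothetical result list: 1000 human and 8 A. thaliana proteins.
hits = [(f"HUMAN_{i}", "HUMAN") for i in range(1000)]
hits += [(f"ARATH_{i}", "ARATH") for i in range(8)]
fdr = empirical_fdr(hits)  # 8 / 1000
```

Because no A. thaliana protein can truly be present in the human samples, every such identification is a known false positive, which makes the estimate independent of the target-decoy model.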
This was only the second time that more than 10000 quantified proteins in a single-shot were reported. In 2018 Meier et al. reported more than 10000 proteins detected in a single run by using the BoxCar acquisition method and an alignment strategy.12 Whereas an FDR control is difficult using an alignment strategy and generally not performed at all, FDR control on precursor and protein level is commonly applied in DIA experiments46 and was empirically validated for this study using an A. thaliana library as negative set.
In total, we quantified 11197 proteins (including 715 one-peptide identifications) and on average 10554 proteins per sample (Fig. 3A and Table S6A, ESI†). The dataset covered 6 orders of magnitude of dynamic range in protein abundance (Fig. 3B). A median biological CV on protein level of 23% for the NAT and of 26% for the cancer cohort indicated a good quantitative precision, and we noticed only a minor dependency of the quantitative precision on the protein abundance. For the 2000 lowest abundant proteins, we determined a CV (including biological and technical variance) of 30% for the NAT cohort and 32% for the cancer cohort, and for the 2000 highest abundant proteins 20% for the NAT and 23% for the cancer samples (Fig. 3B). Differential abundance between the NAT and cancer cohort was determined using a t-test including multiple testing correction with the method described by Storey48 (as implemented in Spectronaut; unfiltered candidate list: Table S8, ESI†). Proteins were considered differentially abundant with an absolute log2 fold-change larger than 1 and a Q value below 0.01. We found in total 3178 proteins in an altered amount between the cancer and NAT samples. Of these proteins, 1453 were found in an increased abundance in the cancer cohort and 1725 proteins in a lower abundance (Fig. 3C). This finding demonstrated a large impact of the cancer on the proteome, which has also been reported for cancer in other tissues.49,50
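The candidate filter described above can be sketched as follows. Note the substitution: the study used Storey's Q value method as implemented in Spectronaut, whereas this sketch uses the Benjamini-Hochberg adjustment as a simpler stand-in for the multiple testing correction. The p-values and fold changes are hypothetical inputs, and the t-test itself is not shown.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the result stays monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(log2_fc, q_value, fc_cutoff=1.0, q_cutoff=0.01):
    """Apply the |log2 fold-change| > 1 and Q value < 0.01 candidate filter."""
    if q_value < q_cutoff and abs(log2_fc) > fc_cutoff:
        return "up" if log2_fc > 0 else "down"
    return "unchanged"


# Hypothetical (log2 fold-change cancer vs. NAT, t-test p-value) per protein.
log2_fcs = [2.1, -1.5, 0.4, 1.8]
p_values = [1e-6, 2e-5, 1e-4, 0.2]

q_values = benjamini_hochberg(p_values)
labels = [classify(fc, q) for fc, q in zip(log2_fcs, q_values)]
```

The third protein is significant but fails the fold-change cutoff, and the fourth passes the fold-change cutoff but is not significant, so only the first two would enter the candidate list.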
As a comparison, we also analysed the testis cancer set with the 2 h and 4 h DIA methods. Whereas with the 2 h gradient the number of identified proteins was still below the 10000 mark (on average 7610 and in total 8488, Table S6B, ESI†), with the 4 h gradient it was possible to identify in total 10404 proteins, but on average just below 10000 (9658, Fig. S5A and Table S6C, ESI†). Interestingly, the number of differentially abundant proteins within the 4 h dataset (3351) was comparable to the 6 h dataset (3178, Fig. S5B, ESI†). Additionally, we also analysed the overlap of the candidate lists derived from the analysis of the data of the three different gradient lengths (Fig. S5C, ESI†). We found the largest overlap in candidates between the 4 h and 6 h datasets (∼80% of the candidates from both gradients). The overall overlap was large (1657 proteins), with only a low percentage of proteins found to be statistically significant in only one dataset (between 12 and 14%).
Unsupervised clustering showed a clear differentiation between the NAT and cancer samples. This finding indicated that the quantitative differences between the samples were driven by the underlying biological changes (Fig. 4A). For biological interpretation, the quantitative data were loaded into Ingenuity Pathway Analysis (IPA, Qiagen). The liver X receptor/retinoid X receptor (LXR/RXR) activation pathway (Fisher's exact test, p = 4.2 × 10⁻¹³), the complement system (p = 5.8 × 10⁻¹²), the acute phase response signalling (p = 2.2 × 10⁻¹¹), the coagulation system (p = 2.4 × 10⁻¹¹) and the farnesoid X receptor/retinoid X receptor (FXR/RXR) activation pathway showed up as the top 5 pathways (top 10 pathways in Fig. 4B). In these pathways, the majority of the proteins were quantified in a lower amount in the cancer cohort. Only in the 6th most enriched pathway, cell cycle control of chromosomal replication (p = 3.4 × 10⁻⁹), were most of the proteins found in a higher abundance in the cancer cohort.
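To make the enrichment statistic concrete, the following sketch computes a right-tailed Fisher's exact test p-value from the hypergeometric distribution. How IPA defines the background set and counts pathway members is not described here, so the function signature and the counts are assumptions for illustration only.

```python
from math import comb


def enrichment_p_value(hits_in_pathway, pathway_size, total_hits, background_size):
    """Right-tailed Fisher's exact test, P(X >= hits_in_pathway).

    X follows a hypergeometric distribution: total_hits proteins are drawn
    from background_size proteins, of which pathway_size are in the pathway.
    """
    upper = min(pathway_size, total_hits)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, total_hits - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / comb(background_size, total_hits)


# Hypothetical toy numbers: 20 background proteins, a pathway with 5 members,
# 6 differentially abundant proteins, 4 of which belong to the pathway.
p_toy = enrichment_p_value(4, 5, 6, 20)  # 540 / 38760, about 0.014
```

A small p-value means that the pathway contains more differentially abundant proteins than expected by chance given the background.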
Fig. 4 Biological Interpretation. (A) Empirical clustering of the data and heatmap visualization (based on intensities). (B) Top 10 pathways of IPA analysis. Downregulated proteins were labelled blue (dark blue: Q value <0.01) and upregulated proteins were labelled orange (dark orange: Q value <0.01). White bars show the percentage of proteins in the pathway that were not identified. (C) Overview of complement system from the interpretation of the quantitative data in IPA (Qiagen). Bar chart depicts the quantification data for the proteins in this pathway. Stars indicate proteins that were quantified with a Q value <0.01. The shapes of the proteins in the pathway indicate different protein classes. (D) Overview of cell cycle control of chromosomal replication exported from IPA, including the quantitative data similar to Fig. 4C.
The combined downregulation of the LXR/RXR pathway and acute phase response signalling pathway has been widely observed in proteomic studies for various cancer types, e.g. for colon adenocarcinomas,51 for triple-negative breast cancer (in combination with a downregulation of the FXR/RXR activation pathway,52 which was also found here) and in the urine of prostate cancer patients.53 LXR acts as a sensor for cholesterol homeostasis, and in normal cells the pathway is activated by high intracellular cholesterol concentrations to reduce synthesis and influx and enhance cholesterol efflux. Therefore, this pathway is typically downregulated in cancer cells to accumulate high intracellular cholesterol concentrations, which are required to sustain a high growth rate.54 Interestingly, the cholesterol biosynthesis proteins were quantified in a lower amount in the cancer (average log2ratio = −2.1, Q value between 0.01 and 9.1 × 10⁻³⁷). Additionally, we found the low-density lipoprotein receptor (LDLR), the major extracellular cholesterol capture protein, in a lower abundance in the cancer cohort (log2ratio = −3, Q value = 9 × 10⁻⁹). Cholesterol also plays an important role in apoptosis. Several death receptors are located in cholesterol-rich lipid rafts, which trigger an apoptotic signal upon activation. Cancer cells can avoid apoptosis by modulating the composition of the lipid rafts, leading to disruption of these receptors.55,56 The lower expression level of two important classes of death receptors, the tumor necrosis factor receptor superfamily member 6 (FAS, log2ratio = −1.2, Q value = 2 × 10⁻⁹) and the tumor necrosis factor receptor superfamily member 10B (TNFRSF10B, log2ratio = −1.8, Q value = 0.03), indicated that the testis cancer cells might use this strategy to avoid apoptosis.
It has also been shown that an activation of LXR leads to a cell-cycle arrest through downregulation of the S phase-associated kinase protein-2 (SKP2).57 In accordance with the downregulation of the LXR/RXR pathway in our dataset, SKP2 was quantified in an increased amount in the cancer samples (log2ratio = +1.4, Q value = 0.02). Because of the involvement of LXR in several cancer-related adaptations, an activation of the pathway is under investigation to facilitate cancer treatment in e.g. colon,58 prostate59 or gastric cancer.60
The second most significantly enriched pathway was the complement system (Fig. 4C). Almost all proteins of the pathway were quantified in decreased levels in the cancer samples (average log2ratio = −1.9, Q values between 0.2 and 4.6 × 10⁻¹⁷³) with two exceptions: Integrin beta-2 (ITGB2, log2ratio = +2.1, Q value = 1 × 10⁻¹⁸) and the complement component 1 Q subcomponent-binding protein (C1QBP, log2ratio = +1, Q value = 2 × 10⁻⁵). The elevated level of C1QBP was especially interesting because it acts as an inhibitor of the complement system and previous studies showed that cancer cells inhibit the complement system to escape the immune response.61,62 Therefore, C1QBP has been discussed as a potential target for therapeutics.63
Cell cycle control of chromosomal replication (p = 3.4 × 10⁻⁹) was the most enriched pathway in which most proteins were quantified at elevated levels in the cancer cohort (Fig. 4D, average log2ratio = +1.4, Q value between 0.03 and 3 × 10⁻⁵²). This finding was to be expected as DNA replication is an important step in cell proliferation and a dysregulated cell cycle was described as one of the hallmarks of cancer.64 Therefore, cell cycle regulators are considered good targets in cancer therapy.65 Of special interest for therapy are the cyclin dependent kinases (CDKs). Promising results were achieved for CDK4 and CDK6 inhibitors,65 but neither kinase was quantified in an altered abundance in our study (CDK4: log2ratio = −0.5, Q value = 0.01; CDK6: log2ratio = 0, Q value = 0.08). Even though our approach did not allow us to measure the activity of these kinases, it indicated that the kinases would not be good targets in testis cancer. In contrast, CDK1 and CDK2 were quantified in significantly higher amounts in the cancer cohort (CDK1: log2ratio = +1.3, Q value = 3.3 × 10⁻¹²; CDK2: log2ratio = +1.1, Q value = 5 × 10⁻⁹). Thus, our data indicated that CDK1 and CDK2 are potentially better targets for testis cancer treatment. Both kinases have already been investigated as potential targets66,67 in cancer therapy. Additionally, several proteins of the cell cycle have been linked to poor cancer prognosis, e.g. an increased expression of minichromosome maintenance proteins (MCMs, average log2ratio in our dataset = +1.8, Q value between 2.3 × 10⁻³⁰ and 3 × 10⁻⁵²) in breast cancer68 or colon cancer.69
Replication stress is regarded as one reason leading to genomic instability in cells and ultimately to the development of cancer. During replication stress the DNA replication fork progression slows down or stalls in S phase due to a high transcriptional activity, leading to DNA double-strand breaks.70 Interestingly, besides an increase of the proteins involved in replication, suggesting an increased replication rate, we found that the mismatch repair pathway was also enriched in the IPA analysis (Fig. S6A, ESI†; p = 6 × 10⁻⁷; average log2ratio = +1.3, Q values between 1.6 × 10⁻¹⁰ and 2.1 × 10⁻⁷¹). To support the hypothesis that this finding might be related to replication stress, we investigated the expression levels of the DNA-directed RNA polymerases, which are often dysregulated in cancer.71 The expression levels of the RNA polymerases were indeed increased by an average log2ratio of +1, especially the proteins of the RNA polymerase I complex (average log2ratio = 1.2; Q values between 7.9 × 10⁻³ and 1.4 × 10⁻³²; Fig. S6B, ESI†). A previous study linked replication stress to an elevated expression of the general transcription factor TATA-box binding protein (TBP).72 The expression level of TBP was also increased (log2ratio = +1, Q value = 1.2 × 10⁻⁴) in the cancer cohort, indicating a potential role of TBP in replication stress in testis cancer.
The potential of the comprehensive single-shot proteome analysis workflow was exemplified on a small testis cancer study with a coverage of 11200 proteins. The high coverage of the testis cancer proteome enabled an in-depth analysis of several hallmarks of cancer like evasion of immune responses by downregulation of the complement system or a dysregulation of the cell cycle. These data showed how these pathways could be investigated down to the level of transcription factors (104 transcription factors were quantified based on protein description). This analysis demonstrates that deep proteome analysis paves the way to a better understanding of underlying molecular mechanisms of disease and helps to identify potential targets for disease treatment (339 of 516 described human kinases, according to https://www.uniprot.org/docs/pkinfam, were quantified as potential drug targets).
In the future, we imagine that additional ion separation devices like FAIMS13 or TIMS14 have the potential to further increase the depth of single-shot proteome analysis and/or to achieve such a high coverage with shorter gradients.
Cells/tissue samples were lysed in lysis buffer (8 M urea, 0.1 M ammonium bicarbonate) using the TissueLyzer II (Qiagen, Heidelberg, Germany) with the following settings: 3 cycles, 30 beats per s, 30 s. DNA was sheared by sonication in the Bioruptor (Diagenode, Seraing, Belgium) using the following settings: 5 cycles, 30 s ON, 30 s OFF, 4 °C, high intensity. After clearing of the lysates by centrifugation (20 min, 16000 × g, room temperature), aliquots were reduced and alkylated in parallel with 10 mM tris(2-carboxyethyl)phosphine (TCEP) and 40 mM 2-chloroacetamide (CAA) for 1 h at 37 °C. Afterwards, the urea concentration was lowered to 1.5 M by addition of 0.1 M ammonium bicarbonate buffer, and the proteins were digested with trypsin (1 to 100 ratio, Promega, Madison, WI) overnight at 37 °C. Peptides were purified using MacroSpin clean-up columns (NEST group, Southborough, MA) following the manufacturer's protocol. Eluates were dried completely in a speed-vac (Savant SPD131DDA, Thermo Fisher Scientific, San Jose, CA). The samples were resuspended in buffer A (1% acetonitrile, 0.1% formic acid in water) containing iRT peptides (Biognosys, Schlieren, Switzerland). Peptide concentrations were determined using a nano-drop (Spectrostar Nano, BMG labtech, Ortenberg, Germany) and adjusted to 1 μg μl^−1.
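The dilution of urea from 8 M to 1.5 M follows the usual C1·V1 = C2·V2 relation. The sketch below shows the arithmetic; it assumes that the diluent contains no urea and that the volumes of the added reagents are negligible, and the 100 µl starting volume is a hypothetical example, not a value from this protocol.

```python
def diluent_volume(initial_conc: float, target_conc: float, initial_volume: float) -> float:
    """Volume of urea-free buffer to add so that the concentration drops
    from initial_conc to target_conc (C1 * V1 = C2 * V2)."""
    if not 0 < target_conc <= initial_conc:
        raise ValueError("target_conc must be positive and not exceed initial_conc")
    final_volume = initial_conc * initial_volume / target_conc
    return final_volume - initial_volume


# Hypothetical example: 100 µl of lysate in 8 M urea, diluted to 1.5 M
added_ul = diluent_volume(initial_conc=8.0, target_conc=1.5, initial_volume=100.0)
print(f"Add {added_ul:.1f} µl of 0.1 M ammonium bicarbonate buffer")
```

This corresponds to a dilution factor of 8/1.5, i.e. roughly 5.3-fold.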
Afterwards, the non-linear gradient was ramped in steps of 2 h up to 8 h at a flow rate of 250 nl min^−1 on the CSH column. For the chromatography optimization, the mass spectrometer was operated in data-independent acquisition (DIA) mode using the following parameters for the MS1 scan: scan range: 350 to 1650 Th; AGC target: 3e6; max injection time: 20 ms; scan resolution: 120000. The MS1 scan was followed by DIA scan events with the following settings: AGC target: 3e6; max injection time: 55 ms; scan resolution: 30000; first fixed mass: 200 Th; stepped normalized collision energy: 25.5, 27, 30. The number of DIA windows and the window widths were adjusted separately for each experiment to match the precursor density and to achieve 4–5 data points per peak. The windows overlapped by 0.5 Th (window design: Table S1, ESI†). For the 6 h and 8 h methods, the resolution of the DIA scan was increased to 60000. For a better calculation of the AGC target by the mass spectrometer, two MS1 scans were acquired for the 6 h method (after half of the DIA scans) and three MS1 scans for the 8 h method (after a third of the DIA scans). The scan ranges of these additional scans were adjusted to the MS1 range covered by the subsequent DIA scans: for the 6 h method: 1st MS1 scan: 350 to 620 Th, 2nd MS1 scan: 600 to 1650 Th; and for the 8 h method: 1st MS1 scan: 350 to 540 Th, 2nd MS1 scan: 530 to 720 Th, 3rd MS1 scan: 710 to 1650 Th.
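The relationship between window layout, cycle time and data points per peak can be illustrated with a short sketch. It is a simplification and not the window design used here: the actual windows had variable widths adapted to precursor density (Table S1, ESI†), whereas the code below uses equal-width windows. The number of windows, the peak width and the cycle time are hypothetical values chosen for illustration only.

```python
def build_dia_windows(start_mz, end_mz, n_windows, overlap=0.5):
    """Split [start_mz, end_mz] into equal-width isolation windows (in Th).

    Neighbouring windows share `overlap` Th; the outer edges are not extended.
    """
    if n_windows < 1 or end_mz <= start_mz:
        raise ValueError("need n_windows >= 1 and end_mz > start_mz")
    core_width = (end_mz - start_mz) / n_windows
    windows = []
    for i in range(n_windows):
        lower = start_mz + i * core_width
        upper = start_mz + (i + 1) * core_width
        if i > 0:
            lower -= overlap / 2  # extend into the previous window
        if i < n_windows - 1:
            upper += overlap / 2  # extend into the next window
        windows.append((lower, upper))
    return windows


def points_per_peak(peak_width_s, cycle_time_s):
    """Average number of DIA cycles sampled across one chromatographic peak."""
    return peak_width_s / cycle_time_s


# Hypothetical values: 40 windows over the MS1 range of 350 to 1650 Th
windows = build_dia_windows(350.0, 1650.0, n_windows=40)
# Hypothetical values: 20 s wide peak, 4.5 s cycle time
ppp = points_per_peak(peak_width_s=20.0, cycle_time_s=4.5)
print(windows[:2], f"{ppp:.1f} points per peak")
```

In practice, more or wider windows lengthen or shorten the cycle time, which is why the window count and widths have to be tuned together with the gradient length to stay in the 4–5 data points per peak range.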
For the testis dataset, 4 μg of the testis digests were injected (2/3 μg h^−1), and the samples were analyzed with the 6 h gradient at a flow rate of 250 nl min^−1 on the 50 cm CSH column using the above-described 6 h DIA method.
To generate the DDA data for the library, the same LC-MS setup as described for the DIA acquisition was operated in data-dependent Top20 mode. Peptides were separated by a non-linear 3 h gradient on the 50 cm CSH column. The following settings for the Q Exactive HF-X mass spectrometer were applied: MS1 scan resolution: 60000; MS1 AGC target: 3e6; MS1 maximum IT: 20 ms; MS1 scan range: 350–1650 Th; MS2 scan resolution: 30000; MS2 AGC target: 1e6; MS2 maximum IT: 55 ms; isolation window: 4 Th; first fixed mass: 200 Th; NCE: 27; minimum AGC target: 1e3; only charge states 2 to 4 considered; peptide match: preferred; exclude isotopes: on; dynamic exclusion: 30 s.
For the project library, the DDA files from the fractionated samples were searched in SpectroMine with the same settings mentioned above. Additionally, the DIA runs from the testis cancer set were searched with SpectroMine in the same way. To generate the hybrid libraries, the “generate library from search archive” option with the default settings was used to create the project hybrid library (combination of project library and DIA search) and the resource hybrid library (combination of resource library and DIA search). The generation of hybrid libraries from search archives in SpectroMine has three advantages: (1) it avoids having to re-search the raw data, (2) it enables homogeneous protein inference, and (3) it guarantees a peptide and protein FDR of 1% on the complete data set.74
The LC-MS data, libraries, results tables and Spectronaut projects of the testis analysis have been deposited to the ProteomeXchange Consortium76 (http://proteomecentral.proteomexchange.org) via the PRIDE77 partner repository with the dataset identifier PXD013658. The Spectronaut projects can be viewed using the free Spectronaut viewer (http://www.biognosys.com/technology/spectronaut-viewer).
The htrms files were analyzed with Spectronaut X17 (version: 12.0.20491.18.30559, Biognosys) using the previously generated libraries and default settings.
For the analysis of the DIA data from the comparison of the different solid phases and from the gradient ramp, the DirectDIA workflow was used (based on raw files). The workflow allowed a search of the DIA data against a FASTA file and quantification of the precursors, peptides and proteins. The principles were described by Tsou et al.28 The same search parameters as for the search of the DDA data in SpectroMine were applied, and the results were filtered by a 1% FDR on precursor and protein level (Q value <0.01). These files were also searched using the generated HeLa library. To minimize effects of iRT precision on the identification result,45 the XIC extraction window was set to full for the comparison of the different solid phases. For all other analyses it was set to dynamic. The results of the DIA analysis were filtered within Spectronaut by 1% FDR on peptide and protein level using a target-decoy approach, which corresponds to a Q value ≤0.01.24,26,34,35 The decoy generation was done using a mutated decoys approach, and the protein FDR was calculated using an adapted version of the approach of Rosenberger and colleagues.35 Both strategies, as implemented in Spectronaut, were further described previously.26 For the testis cancer sample set, the quantification data were filtered with the Q value percentile filter set to 0.5. Outputs from Spectronaut can be found in the ESI† (Table S2: solid phase comparison, Table S3: gradient ramp, Table S4: hybrid library approach for HeLa sample, Table S5: library comparison for testis sample, Table S6: testis cancer set).
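The idea behind filtering by Q value in a target-decoy approach can be sketched in a few lines. The code below is a generic textbook-style illustration, not the Spectronaut implementation: it does not model mutated decoy generation, protein inference or the adapted protein FDR calculation, it ignores score ties, and the scores and decoy labels are invented toy data.

```python
def target_decoy_q_values(scores, is_decoy):
    """Return q-values in input order, assuming a higher score is better.

    The FDR at each score threshold is estimated as (#decoys / #targets)
    among all hits scoring at least as well; the q-value of a hit is the
    lowest FDR at which it would still be accepted.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Enforce monotonicity: walk from the worst score towards the best one
    q_values = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q_values[order[rank]] = running_min
    return q_values


# Invented toy data: six hits, two of them decoys
scores = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0]
is_decoy = [False, False, False, True, False, True]

q = target_decoy_q_values(scores, is_decoy)
accepted = [i for i, (qv, d) in enumerate(zip(q, is_decoy)) if not d and qv <= 0.01]
print(q, accepted)
```

With so few hits, the toy example only illustrates the mechanics; meaningful FDR estimates require thousands of target and decoy hits.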
Footnotes
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c9mo00082h
‡ Contributed equally.
This journal is © The Royal Society of Chemistry 2019