Areti-Maria Vasilogianni ‡ab, Sarah Alrubia ‡ac, Eman El-Khateeb ‡ade, Zubida M. Al-Majdoub a, Narciso Couto a, Brahim Achour af, Amin Rostami-Hodjegan ae and Jill Barber *a
a Centre for Applied Pharmacokinetic Research, School of Health Sciences, University of Manchester, Stopford Building, Oxford Road, Manchester, M13 9PT, UK. E-mail: Jill.Barber@manchester.ac.uk
b DMPK, Oncology R&D, AstraZeneca, Cambridge, UK
c Pharmaceutical Chemistry Department, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
d Clinical Pharmacy Department, Faculty of Pharmacy, Tanta University, Tanta, Egypt
e Certara Inc (Simcyp Division), 1 Concourse Way, Sheffield, UK
f Department of Biomedical and Pharmaceutical Sciences, College of Pharmacy, University of Rhode Island, Kingston, Rhode Island, USA
First published on 13th November 2023
Several software packages are available for the analysis of proteomic LC-MS/MS data, including commercial (e.g. Mascot/Progenesis LC-MS) and open access software (e.g. MaxQuant). In this study, Progenesis and MaxQuant were used to analyse the same data set from human liver microsomes (n = 23). Comparison focussed on the total number of peptides and proteins identified by the two packages. For the peptides exclusively identified by each software package, the distributions of peptide length, hydrophobicity, molecular weight, isoelectric point and score were compared. Using standard cut-off peptide scores, we found an average of only 65% overlap in detected peptides, with surprisingly little consistency in the characteristics of peptides exclusively detected by each package. Generally, MaxQuant detected more peptides than Progenesis, and the additional peptides were longer and had relatively lower scores. Progenesis-specific peptides tended to be more hydrophilic and basic relative to peptides detected only by MaxQuant. At the protein level, we focussed on drug-metabolising enzymes (DMEs) and transporters, comparing the number of unique peptides detected by the two packages for these specific proteins of interest, and their abundance. The abundance of DMEs and SLC transporters showed good correlation between the two software tools, but that of ABC transporters showed less consistency. In conclusion, in order to maximise the use of MS datasets, we recommend processing with more than one software package. Together, Progenesis and MaxQuant provided excellent coverage, with a core of common peptides identified in a very robust way.
Drug-metabolizing enzymes (DMEs), such as cytochrome P450 (CYP450)11–19 and uridine 5′-diphospho-glucuronosyltransferase (UGT) enzymes, have received particular attention owing to their role in determining the kinetics of the majority of drugs on the market.20 However, even with recent advances in technology, measuring UGT enzymes remains challenging because of their membrane topology and high sequence homology.21 Similarly, transporters are difficult to quantify because of low abundance and membrane localization, and therefore their characterization requires enrichment of plasma membrane fractions and the use of highly sensitive instrumentation.22
The increased activity in this area, however, has highlighted inter-laboratory and inter-methodological variation in quantification.6 There is no simple relationship between the size of a mass spectrometry signal and the concentration of analyte. Worse, the LC-MS/MS workflow does not normally sample every available peptide but selects the most intense signals at any time point. Quantification of DMEs and transporters is important – it provides numbers used in silico to represent patients in virtual clinical trials.23–25 The community therefore assembled in September 2018 to address best practice in proteomic analysis and quantification methods, resulting in a white paper.26
Differences in quantification can arise from differences in sample preparation,27,28 quantification methodology,28,29 including whether measurement is targeted or untargeted,6,30,31 LC-MS/MS parameters and instrumentation, even when the sample is the same. In practice, we are not especially interested in measuring the same sample because biological differences between samples are the main subject of our investigations. Multivariate statistical techniques, such as principal components analysis (PCA), have been used to discern biological and technical variation within groups of samples27,32,33 but are of limited utility in assessing cross-laboratory measurements.
At this stage, strategies for overcoming these differences would inevitably involve many replicate analyses, which are at best costly, and often impossible where samples are small and of human origin. There is, however, less excuse for differences in quantification resulting from data analysis. The commonly used data analysis tools, required to convert RAW data files into quantification of proteins, have different algorithms that can generate variable results, and one useful idea is to assess their complementarity. Comparative reports for different data analysis tools have been generated (Table 1) with varied conclusions. A single 2012 study sought to compare data processing using complex samples from animal retinas, concluding that the total number of proteins identified by MaxQuant and Progenesis is highly comparable, with 74% overlap.34 Another study using five different data analysis tools to identify potato and human synthetic peptides concluded that MaxQuant achieved the highest peptide coverage based on charge-state merging, while Progenesis was the best based on the obtained original data, as a result of all alignment features and normalization before LC-MS/MS.35 Comparison of different tools using a plant-derived standard proteins mix demonstrated high variability in protein abundance measured by the different tools, suggesting caution should be applied with discovery proteomics data.34 Finally, a study using Universal Proteomics Standard Set and yeast concluded that Progenesis performed consistently well in differential expression analysis and produced few missing intensity values, whereas data filtering or imputation methods improved the performance of commonly used software for proteomics including MaxQuant, Proteios, PEAKS, and OpenMS.7
Study | Sample | Compared software | Analysis technique (instrument) | Outcomes compared |
---|---|---|---|---|
Merl et al. 201242 | Retinal cells (healthy animals) | Progenesis; MaxQuant | Label free versus SILAC (Orbitrap) | Quantification accuracy; dynamic range; sensitivity |
Chawade et al. 201543 | Synthetic peptides (potato and human) | Progenesis; MaxQuant; Proteios; Skyline; Anubis | DDA and SRM (Orbitrap XL ETD) | Peptide coverage; F1-score (harmonic mean of precision and sensitivity); mean accuracy (proportion of true positive and negative identifications); number of unique peptides |
Välikangas et al. 201744 | Universal proteomics standard set and yeast Saccharomyces cerevisiae | Progenesis; MaxQuant; Proteios; PEAKS; OpenMS | DDA (Orbitrap Velos) | The number of proteins quantified; the extent of missing data |
Al Shweiki et al. 201745 | Standard proteins mix (plant) | Proteome Discoverer; Scaffold; MaxQuant; Progenesis | DDA (Orbitrap Velos) | Biological variability; protein abundance estimates; protein fold change |

DDA: data dependent acquisition; SRM: selected reaction monitoring; SILAC: stable isotope labelling by amino acids in cell culture.
In the present work, we analysed a real, clinically important dataset obtained from 23 human liver membrane samples. We used two software packages, MaxQuant and Progenesis, both commonly used for peptide/protein identification and quantification. MaxQuant36,37 uses its own search engine, Andromeda, for identification, which relies on a probability calculation for scoring a peptide-spectrum match.38 Quantification of proteins is based on maximum peptide ratio information from extracted peptide ion signal intensities. These are normalised to minimise the overall fold change of all peptides across all fractions.34 Progenesis uses Mascot for identification39 and quantifies proteins based on peptide ion peak intensity while allowing full operator control.34
The novelty of this study is that MaxQuant and Progenesis are evaluated for the first time using liver samples from healthy volunteers, with a focus on drug-metabolising enzymes and transporters. Human liver samples from healthy volunteers are very precious and very important as controls. Because of their rarity, several studies use ‘histologically normal’ livers from diseased patients as controls. However, our previous reports showed that livers from diseased subjects differ from healthy controls and are therefore not ideal as controls.40,41 There is therefore a particular ethical imperative to generate as much information as possible from these very precious samples. This dataset was used to evaluate MaxQuant and Progenesis and to determine whether information could be maximised by the use of both software tools with a single dataset. We focused particularly on drug-metabolising enzymes and transporters because the liver is the primary site of drug metabolism in the body. Perturbations in the abundance of these proteins can therefore affect the toxicity and efficacy of drugs.
The parameters applied in MaxQuant were changed from default to match their counterparts in Progenesis and Mascot as presented in Table 2. Full details of all the parameter settings used for MaxQuant are listed in Table S1 (ESI†). No filters were applied for the scores in data processing and cut-off scores were applied manually after exporting the data. The ‘matching between runs’ feature was not enabled in MaxQuant.
Parameter description | Parameter setting |
---|---|
Label free quantification | Yes |
Multiplicity | 1 |
Digestion enzyme | Trypsin/P |
Variable modifications | Oxidation (M) & deamidation (NQ) |
Fixed modifications | Carbamidomethyl (C) |
Max number of modifications per peptide | 11 |
Max charge | 7 |
Main search peptide tolerance | 5 ppm |
Min pep length | 7 |
Min pep length for unspecific | 70 |
Max peptide mass [Da] | 6000 Da |
Peptides for quantification | Unique + razor |
MS/MS match tolerance | 0.5 Da |
False discovery rate (FDR) | 1% |
• All peptides were matched against the UniProt human proteome fasta file (May 2017).49 Proteins were prioritised according to the following criteria: (a) full length proteins were preferred over cDNA; (b) characterised sequences were prioritised over uncharacterised ones; and (c) longer sequences of the same proteins were preferred over shorter ones. The final order was arranged alphabetically.
• The remaining peptides that did not match any protein were deleted. Single peptides that appeared in two or fewer samples and did not appear in the UniProt fasta file were also deleted.
• A best-fit analysis was then run to minimise the number of accession codes that account for all the peptides.
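The best-fit step in the list above can be pictured as a set-cover problem: find a small set of accession codes that together explain every retained peptide. The source does not say which algorithm was used, so the following is only a minimal sketch using a greedy heuristic, which is an assumption. The function name, the input layout and the alphabetical tie-break are hypothetical choices made for illustration.

```python
def minimal_protein_set(peptide_to_proteins):
    """Greedy sketch of a 'best-fit' step (assumed, not the authors' method).

    peptide_to_proteins maps each peptide sequence to the accession codes
    it could belong to. Returns a list of accessions that together account
    for every peptide that has at least one candidate protein.
    """
    # Invert the mapping: accession -> set of peptides it can explain.
    protein_to_peptides = {}
    for peptide, proteins in peptide_to_proteins.items():
        for accession in proteins:
            protein_to_peptides.setdefault(accession, set()).add(peptide)

    uncovered = set(peptide_to_proteins)
    chosen = []
    while uncovered:
        # Pick the accession explaining the most uncovered peptides;
        # ties are broken alphabetically (hypothetical choice).
        best = min(
            protein_to_peptides,
            key=lambda acc: (-len(protein_to_peptides[acc] & uncovered), acc),
            default=None,
        )
        if best is None:
            break
        gained = protein_to_peptides[best] & uncovered
        if not gained:
            break  # remaining peptides match no protein at all
        chosen.append(best)
        uncovered -= gained
    return chosen


example = {
    "PEPTIDEA": ["P1", "P2"],  # shared between two accessions
    "PEPTIDEB": ["P1"],
    "PEPTIDEC": ["P3"],
}
selected = minimal_protein_set(example)
```

A greedy cover is not guaranteed to be the smallest possible set, but it is simple to audit, which matters when the assignments are later used to decide which peptides count as unique.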
For each sample, the number of proteins identified with at least one unique or razor peptide by each software package was determined. The numbers of CYP450s, UGTs, ABC transporters and SLC transporters were calculated separately. The percentage of identical proteins (PIPr) was calculated for all pairs of results, both inter- and intra-sample.
The raw files were processed by MaxQuant on personal computers that have the following specifications: Processor Intel® Core™ i7-6600U CPU@2.6 GHz; RAM 20 GB; 64-bit operating system; Windows 10. The computer used for Progenesis processing has the following specifications: Dell Precision T7600 Tower workstation; Processor 2x Intel Xeon-E5-2643 CPU@3.30 GHz; RAM 128 GB; 64-bit operating system; Windows 7.
Progenesis score = 0.3508 × MaxQuant score    (1)
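As an illustration of how a relation such as eqn (1) can be obtained and used, the sketch below fits a straight line through the origin to paired scores and converts a MaxQuant cut-off into a Progenesis (Mascot) equivalent. The zero-intercept least-squares fit is an assumption (the source reports only the resulting slope), and the function names and the toy score pairs are hypothetical.

```python
def slope_through_origin(maxquant_scores, progenesis_scores):
    """Least-squares slope of y = k * x with no intercept (assumed model)."""
    sxx = sum(x * x for x in maxquant_scores)
    if sxx == 0:
        raise ValueError("need at least one non-zero MaxQuant score")
    sxy = sum(x * y for x, y in zip(maxquant_scores, progenesis_scores))
    return sxy / sxx


def equivalent_cutoff(slope, maxquant_cutoff=40.0):
    """Progenesis score corresponding to a given MaxQuant score cut-off."""
    return slope * maxquant_cutoff


# Toy paired scores lying exactly on the line reported in eqn (1).
mq = [10.0, 20.0, 40.0, 80.0]
pg = [0.3508 * s for s in mq]

k = slope_through_origin(mq, pg)
cutoff = equivalent_cutoff(k)  # about 14.0 for a MaxQuant cut-off of 40
```

With the average slope of 0.3508, a MaxQuant cut-off of 40 maps to a Progenesis score of roughly 14.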
Sample | MaxQuant total peptides | Progenesis total peptides | MaxQuant only peptides (number) | MaxQuant only peptides (%) | Progenesis only peptides (number) | Progenesis only peptides (%) | Overlap (number) | Overlap (%) | MaxQuant modified (number) | MaxQuant modified (%) | Progenesis modified (number) | Progenesis modified (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
HLM01 | 15… | 11… | 3280 | 18% | 2193 | 12% | 12… | 70% | 673 | 4% | 1337 | 9% |
HLM02 | 20… | 14… | 7313 | 33% | 2022 | 9% | 12… | 58% | 1182 | 6% | 1397 | 9% |
HLM06 | 18… | 14… | 6092 | 30% | 2186 | 11% | 12… | 60% | 892 | 5% | 1405 | 10% |
HLM08 | 17… | 15… | 3906 | 20% | 2219 | 11% | 13… | 69% | 789 | 4% | 1267 | 8% |
HLM11 | 17… | 15… | 3931 | 20% | 2369 | 12% | 13… | 68% | 833 | 5% | 1507 | 9% |
HLM25 | 18… | 15… | 5407 | 26% | 2469 | 12% | 13… | 62% | 961 | 5% | 1553 | 10% |
HLM38 | 17… | 16… | 3735 | 19% | 2381 | 12% | 13… | 69% | 861 | 5% | 1461 | 9% |
HLM41 | 17… | 14… | 5076 | 26% | 2330 | 12% | 12… | 62% | 1338 | 8% | 1828 | 13% |
HLM48 | 17… | 15… | 4265 | 22% | 2108 | 11% | 13… | 67% | 910 | 5% | 1343 | 9% |
HLM71 | 16… | 15… | 2746 | 15% | 2378 | 13% | 13… | 72% | 786 | 5% | 1522 | 10% |
HLM72 | 16… | 15… | 3637 | 19% | 2160 | 11% | 13… | 70% | 829 | 5% | 1425 | 9% |
HLM73 | 16… | 15… | 3707 | 20% | 2205 | 12% | 13… | 69% | 2921 | 17% | 3137 | 21% |
HLM74 | 17… | 14… | 5691 | 28% | 2215 | 11% | 12… | 61% | 827 | 5% | 1391 | 10% |
HLM75 | 16… | 15… | 3537 | 19% | 2148 | 12% | 12… | 69% | 752 | 5% | 1388 | 9% |
HLM76 | 18… | 14… | 6447 | 31% | 2157 | 10% | 12… | 59% | 1006 | 5% | 1431 | 10% |
HLM77 | 16… | 14… | 3872 | 21% | 1917 | 10% | 12… | 69% | 907 | 5% | 1285 | 9% |
HLM78 | 18… | 14… | 6294 | 31% | 2189 | 11% | 12… | 59% | 987 | 5% | 1457 | 10% |
HLM80 | 16… | 15… | 4037 | 21% | 2194 | 12% | 12… | 67% | 758 | 4% | 1365 | 9% |
HLM89 | 17… | 15… | 3827 | 19% | 2389 | 12% | 13… | 69% | 1492 | 9% | 2004 | 13% |
HLM90 | 17… | 12… | 4009 | 20% | 2731 | 14% | 13… | 66% | 931 | 5% | 1140 | 9% |
HLM91 | 16… | 15… | 4045 | 21% | 2344 | 12% | 12… | 67% | 790 | 5% | 1399 | 9% |
HLM100 | 17… | 15… | 4334 | 22% | 2278 | 11% | 13… | 67% | 854 | 5% | 1388 | 9% |
HLM117 | 19… | 14… | 6998 | 32% | 2347 | 11% | 12… | 57% | 921 | 5% | 1517 | 10% |
Mean | 17… | 14… | 4617 | 23% | 2258 | 11% | 12… | 65% | 1009 | 6% | 1519 | 10% |
SD | 1027 | 1102 | 1273 | 5% | 165 | 1% | 495 | 5% | 458 | 3% | 394 | 3% |
CV | 6% | 7% | 28% | 23% | 7% | 8% | 4% | 7% | 45% | 47% | 26% | 26% |
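The per-sample counts in the table above can be reproduced from two peptide lists with simple set operations. The sketch below is illustrative only: the function name and the toy sequences are hypothetical, and expressing each percentage relative to the union of both lists is an assumption, although it agrees with the tabulated rows, where the MaxQuant-only, Progenesis-only and overlap percentages sum to about 100%.

```python
def compare_peptide_sets(maxquant_peptides, progenesis_peptides):
    """Count software-specific and shared peptides for one sample.

    Percentages are relative to the union of both lists (assumed convention).
    """
    mq = set(maxquant_peptides)
    pg = set(progenesis_peptides)
    union = mq | pg
    if not union:
        raise ValueError("both peptide lists are empty")

    def pct(count):
        return 100.0 * count / len(union)

    mq_only = len(mq - pg)
    pg_only = len(pg - mq)
    overlap = len(mq & pg)
    return {
        "maxquant_only": mq_only,
        "maxquant_only_pct": pct(mq_only),
        "progenesis_only": pg_only,
        "progenesis_only_pct": pct(pg_only),
        "overlap": overlap,
        "overlap_pct": pct(overlap),
    }


# Toy sequences, not taken from the dataset.
result = compare_peptide_sets(
    ["AAK", "CCR", "DDK", "EEK"],
    ["DDK", "EEK", "FFR"],
)
```

In this toy case the union holds five peptides, so two shared peptides give an overlap of 40%.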
Sample HLM73 (and to a lesser extent HLM41) is an interesting case, with much higher levels of modification than the norm, identified by both software packages. It is not clear whether the high level of modification is the result of technical differences in handling the samples, or biological differences (for example in response to ageing).
Progenesis intensity = 0.0149 × MaxQuant intensity    (2)
Fig. 3(C) shows GRAVY scores for MaxQuant-specific and Progenesis-specific peptides; the more negative the value, the more hydrophilic the peptide. Across all samples, the median and mode GRAVY scores of the MaxQuant-specific peptides ranged from −0.35 to 0.09 and from −0.7 to 0.4, respectively, whereas those of the Progenesis-specific peptides ranged from −0.53 to −0.43 and from −0.9 to 0.1, respectively (Table S8, ESI†). The peptides identified solely by Progenesis therefore had more negative GRAVY scores, indicating higher hydrophilicity, than those identified solely by MaxQuant (Fig. 3(C)).
Table S5 (ESI†) provides an example of statistical analysis in relation to the peptide length, GRAVY score (hydrophobicity), isoelectric point (pI), and molecular weight of peptides from sample HLM76. Comparison of these characteristics showed that Progenesis-specific peptides were generally shorter, more hydrophilic, and more basic, with lower mass.
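For readers unfamiliar with the metric, a GRAVY score is commonly computed as the mean Kyte-Doolittle hydropathy value over all residues of a peptide. The source does not state which hydropathy scale or tool was used, so the scale below is an assumption and the function name is hypothetical. The two sequences are peptides quoted elsewhere in this article as Progenesis-only (from CYP2F1) and MaxQuant-only (from CYP4F8).

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean residue hydropathy of a peptide."""
    if not sequence:
        raise ValueError("empty sequence")
    # A KeyError here signals a non-standard residue letter.
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


progenesis_only = gravy("GNGIAFSSGDRWK")      # peptide from CYP2F1
maxquant_only = gravy("TLDFIDVLLLSEDKNGK")    # peptide from CYP4F8
```

On this scale the CYP2F1 peptide scores more negatively than the CYP4F8 peptide, in line with the trend described for the two packages.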
Fig. 5 The number of samples in which CYPs and UGTs were identified by each software tool. Other CYPs and UGTs that were identified by both software tools (overlap) in all samples are not included.
Protein | Samples with reliable detection by Progenesis | Samples with reliable detection by MaxQuant | Comments |
---|---|---|---|
CYP1A1 | 7 | 2 | Involved in steroid hormone biosynthesis,52 fatty acid,53 and retinol metabolism.54 |
CYP39A1 | 13 | 19 | Involved in cholesterol degradation and bile acid biosynthesis.55 |
CYP2A7 | 18 | 5 | |
CYP2F1 | 22 | 0 | Possibly involved in the metabolism of naphthalene.56 |
CYP4F8 | 11 | 23 | Involved in fatty acid metabolism.57 |
CYP4F22 | 16 | 15 | Autosomal recessive loss of function mutations associated with congenital ichthyosiform erythroderma.58,59 |
CYP2J2 | 17 | 21 | Involved in arachidonate metabolism60 |
CYP2S1 | 10 | 15 | Involved in fatty acid metabolism.61 |
ABCA1 (ABC-1) | 16 | 22 | Involved in the transport of cholesterol and high-density lipoproteins.62 Mutations lead to Tangier disease.63 |
ABCA2 (ABC2) | 4 | 16 | Associated with drug resistance in cancer cells, and one SNP of ABCA2 is linked to early onset of Alzheimer's disease.64 |
ABCB5 (ABCB5 P-gp) | 4 | 2 | Associated with drug resistance in colorectal cancer and melanoma.65,66 |
ABCC2 (MRP2) | 12 | 22 | Mutations are associated with Dubin–Johnson syndrome.67 |
ABCD4 (PMP70) | 16 | 23 | Involved in vitamin B12 transport.68 |
SLC2A1 (GLUT-1) | 10 | 5 | Involved in glucose transport and when mutated, associated with GLUT1 deficiency syndrome.69 |
SLC29A1 (ENT1) | 12 | 17 | Mutations are associated with inherited H syndrome, pigmented hypertrichosis with insulin-dependent diabetes, and Faisalabad histiocytosis.70 |
SLC29A3 (ENT3) | 11 | 5 | Mutations associated with disorders, such as H syndrome, pigmented hypertrichotic dermatosis with insulin-dependent diabetes syndrome, and histiocytosis with massive lymphadenopathy.71 |
SLC22A7 (OAT2) | 14 | 20 | Acts as sodium-independent organic anion/dimethyldicarboxylate exchanger.72 |
Additionally, all the identified CYPs, UGTs, ABC and SLC transporters were quantified using TPA. Fig. 8 illustrates the correlation of the abundance of these proteins between MaxQuant and Progenesis. The more abundant proteins, CYPs, UGTs and SLC transporters, show good correlation, clustering around lines of y = x as expected. The ABC transporters, with the exceptions of ABCD3 and MRP3 (not shown on the graph), are, however, of very low abundance, close to the limit of detection, and are poorly enriched in microsomes compared with endoplasmic reticulum proteins, such as CYPs and UGTs. For these proteins, Fig. 8(C) and Table S9 (ESI†) show much more scatter from y = x. This is not surprising. In general, the biases that lead to Progenesis favouring short, basic, hydrophilic peptides and MaxQuant favouring longer, hydrophobic, more acidic peptides cancel extremely well for abundant proteins with many detectable peptides, leading to consistent quantification, despite the differences in detected peptides. For low abundance proteins, such as ABC transporters, many peptides fall below the scoring threshold for at least one of the packages, leading to bigger discrepancies. The precision of quantification is poor, and it is not possible to judge which package is better for any particular protein. The advantage of analysing data with both packages is that it allows us to confirm the presence of more proteins than we could detect with a single package. However, quantification of low abundance proteins is perilous, and several criteria must be taken into consideration, including the number of peptides corresponding to that protein identified by each software package, the uniqueness of the peptides, their quality (modifications, missed cleavages), the number of samples where the protein was identified, and, where possible, quantification with different methods (TPA, HiN, QconCATs, iBAQ, etc.).
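To make the TPA step concrete, the sketch below uses the commonly cited form of the total protein approach, in which a protein's share of the total MS signal is divided by its molecular weight. The source names the method but does not give the formula or units here, so the formula, the pmol per mg unit conversion, the function name and the input values are all assumptions made for illustration.

```python
def tpa_pmol_per_mg(protein_intensity, total_intensity, molecular_weight_da):
    """Total protein approach (assumed form).

    abundance [mol per g total protein] =
        protein MS signal / (total MS signal * molecular weight [g/mol])
    The result is converted to pmol per mg of total protein.
    """
    if total_intensity <= 0 or molecular_weight_da <= 0:
        raise ValueError("total intensity and molecular weight must be positive")
    mol_per_g = protein_intensity / (total_intensity * molecular_weight_da)
    return mol_per_g * 1e9  # 1 mol/g equals 1e9 pmol/mg


# Hypothetical protein: 0.1% of the total signal, 50 kDa.
abundance = tpa_pmol_per_mg(1.0e6, 1.0e9, 50_000.0)
```

Because the calculation depends only on summed intensities, a protein with few detected peptides contributes a small and noisy numerator, which fits the larger scatter seen for the low abundance ABC transporters.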
There have been a relatively small number of studies devoted to understanding the role of the processing package in interpreting global proteomic data and many of these focus on quite simple model systems, such as yeast and plants.73,74 The real importance of differences in processing will only be apparent when different packages are used to process clinical samples, especially precious human samples where sample availability is limited and where the proteins under study are of low abundance, membrane bound, or show high homology and therefore yield few unique peptides.
Duplicate MS output files, generated from duplicate tryptic digests of 23 human liver samples, were processed by two different software packages, Progenesis and MaxQuant. The peptide scores obtained for each sample by the two software tools were correlated, and an average trend line was used to establish a Progenesis score cut-off equivalent to a MaxQuant score of 40. The sets of peptides passing these cut-offs were then compared. The overlap between the peptides detected by the two packages ranged from 52% to 72% (mean 65%), with the total number of peptides identified by MaxQuant typically 18% higher. Progenesis, on average, detected more modified peptides (10% compared to 6% for MaxQuant). A comparison of the characteristics of the software-specific peptides showed that, in general, Progenesis identified shorter peptides than MaxQuant, and they tended to be more basic and more hydrophilic.
We used consistent parameters for both software tools (mass tolerance, enzyme specificity, missed cleavages and modifications) and both search engines use a peptide score to match the experimental MS/MS data with a theoretical spectrum. The scoring of the peptide-spectrum match (PSM) by both tools is based on a probability calculation. The more recently developed Andromeda (MaxQuant) tool bases the scores on a binomial distribution probability, taking into account peptide fragments, neutral losses (water, ammonia) and diagnostic peaks.38,75 Mascot (Progenesis) scoring uses peptide fragments for spectral correlation with a probabilistic modelling approach and applies an ion score cut-off to filter the PSMs.76 Although the scoring systems seem very similar, the processes necessary for assigning a PSM can yield different outcomes because the algorithms used for peak picking and subsequent peptide sequencing differ between search engines.77 False positive PSMs present a challenge, as the false peptide/protein identification interferes with the interpretation of the data. Therefore, ways to measure and control the number of false identifications are required. These measures discriminate correct PSMs from false identifications and ultimately allow controlling the false discovery rate (FDR).78
The scoring algorithms aim to describe the match quality, for instance, the number of shared fragment ions between a spectrum and a candidate peptide sequence39 or similarity in general. In the case of Mascot/Andromeda, the number of shared fragment ions is converted into a probabilistic match score using the negative logarithm of the determined probability that the computed PSM is an incorrect assignment.38 This generates a measure of match quality with high scores representing more likely hits and a high proportion of matching fragment ions. An expectation value is calculated for all sequence candidates based on the score distribution. Low quality peaks can either be used for scoring or filtered out by the search engine, leading to differences in the quality of the PSMs. Matches of medium to high quality spectra tend to be scored robustly by the two software packages, leading to the observed significant overlap.
For the purpose of comparison in this study, the score cut-off values were normalised based on a predefined cut-off score of 40 for MaxQuant. An equivalent value was determined for Mascot (ranging from 11.9 to 16.5). This finding is in agreement with the literature, which reported that the MaxQuant score is about three times the Mascot score.38 The cut-off values of ≥40 for MaxQuant and ≥20 for Mascot were reported to offer a high identification probability in proteomics.74,79 Higher scores were associated with unmodified peptides, with a clear indication of higher confidence in unmodified peptide identification across the 23 analysed samples; the average proportion of unmodified peptides associated with scores ≥40 for MaxQuant and Mascot was 94% and 90%, respectively. This is in line with a previous assessment reporting 89.1% unmodified peptides (in mouse dendritic cells).38
A detailed examination of the software algorithms, and a comparison of the data on that basis, is beyond the scope of this paper. Our aim is not to find the element of the algorithm that may lead to differences in the identification of peptides and quantification of proteins between the two software tools. Instead, we aim to identify the differences between the performance of the two software tools in terms of the number, nature and identity of identified peptides, and the quantity of clinically important proteins. This has been achieved by keeping the setup parameters consistent between the two software tools.
At the protein level, our comparison focused on hepatic drug-metabolising enzymes and transporters involved in drug metabolism and disposition. There is considerable inter-individual variability in the expression of these proteins, and this results in different efficacy and toxicity of drugs among different patients.80 The distribution and abundances of these proteins can be used for the prediction of the pharmacokinetics of drugs in physiologically based pharmacokinetic models. More specifically, they can be used as scaling factors for the in vitro to in vivo extrapolation of drug clearance.23 Most hepatic drug-metabolising enzymes identified herein are of high abundance. This is because the samples are enriched microsomal fractions, which are the main fractions harbouring these proteins within the hepatocyte. Identification of proteins of interest requires additional rigour to establish confidence in their identification using unique peptides for the specific protein, as explained in the Methods (Section 2.5).
In most of the samples, unique peptides corresponding to CYP and UGT proteins were detected by both software tools; in general, Progenesis and MaxQuant identified similar numbers of CYP and UGT peptides (Chi-squared test, p > 0.05). There were some discrepancies, however, with the most interesting cases being CYP1A1, 2A7, 2F1, 4F8 and UGT1A7 (Table 4 and Fig. 5). These are important for the metabolism of steroids, pneumotoxicants, naphthalene, fatty acids, and many other endogenous and xenobiotic substances (Table 4).81
Transporters are generally expressed at very low levels and in the plasma membrane, rather than endoplasmic reticulum, so they are not well enriched in microsomal preparations. We have previously demonstrated that microsomes are a crude membrane fraction that comprises membranes from various intracellular compartments as well as the plasma membrane.30,82 Endoplasmic reticulum is highly enriched in microsomes while plasma membrane tends to be less enriched; enrichment factors are normally less than 2 fold for plasma membrane, whereas reticular proteins have higher enrichment (>5 fold).83 This is mainly because of different levels of loss of membrane protein; in-house data showed 50–80% recovery of reticular protein compared to 30–60% recovery of cell membrane protein.84 Microsomal crude membrane extracts are not perfect, but they are the best available enriched membrane preparation. Extracting purer fractions such as plasma membrane fractions is fraught with unmitigated levels of protein loss. Like UGTs, transporters are membrane embedded, and, like UGTs, they tended to be more readily detected by MaxQuant. However, the differences in counts were not statistically significant (Chi-squared test). Table 4 and Fig. 5–7 show that in some cases, MaxQuant identifies more unique peptides for CYPs, UGTs and transporters, whereas in other cases the opposite trend is observed. Table 4 also illustrates how the peptides detected only by Progenesis (for example, GNGIAFSSGDRWK and KSPAFMPFSAGR from CYP2F1) tend to be hydrophilic and basic whereas those detected only in MaxQuant (for example, TLDFIDVLLLSEDKNGK and SVINTSDAITDK from CYP4F8) tend to be slightly longer, less hydrophilic and weak acids, in line with the characteristics preferred by MaxQuant compared to Progenesis. The ABC transporters’ dataset illustrates that any search conditions will inevitably lead to some loss of genuine peptides together with the noise.
When this dataset was subjected to MaxQuant processing with deamidation not permitted, most of the peptides detected here only with Progenesis appeared.
The quantification of DMEs and transporters with both software tools indicated that there is a reassuring consistency in the quantification of CYPs, UGTs and SLC transporters between MaxQuant and Progenesis. However, this is not observed in the case of the low abundance ABC transporters. This finding indicates that in the case of low abundance proteins, it may be very useful to use both software tools in a complementary way to increase the information extracted from the data. This will allow for more proteins of low abundance to be quantified, at least approximately.
The PCA of the data shown in Fig. 4 is gratifying. The two software packages are in broad agreement, especially with respect to inter-individual variability. For example, both packages agree that sample 75 is similar to 71, and 77 is similar to 89. Where they disagree, we have developed some understanding of the reasons. It is therefore possible, with care, to augment the data obtained using Progenesis alone30 with the additional data obtained here using MaxQuant.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3mo00144j
‡ These authors contributed equally to the manuscript.
This journal is © The Royal Society of Chemistry 2024