A report on the ESF workshop on quality control in proteomics

Lennart Martens a,b
a Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
b Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium. E-mail: lennart.martens@UGent.be; Fax: +32 9 264 94 84; Tel: +32 9 264 93 58


Abstract

Proteomics has shown great promise as a key tool in the analysis of living systems, providing high-throughput views on the expression and translation of genes into proteins, their abundances, and their modifications. Yet the rapid advances in the field have led to an emphasis on the development of novel methodologies, at the cost of establishing clear quality control metrics and acceptable operational parameters for many of these methods. Since proteomics is ultimately an analytical science, however, it is essential for the future of the field to invest in quality control and quality assurance strategies for existing approaches as well. With the support of the European Science Foundation’s Frontiers in Functional Genomics programme, a workshop focused on Quality Control in Proteomics was therefore held in Cambridge, UK, from 25 to 27 November 2009. A report on this workshop is provided here.



Lennart Martens

Dr Lennart Martens is Professor of Systems Biology in the Department of Medical Protein Research at Ghent University in Ghent, Belgium. He received his Doctorate in Sciences: Biotechnology from Ghent University in Belgium working on proteomics informatics. He then served as PRIDE Group Coordinator at EMBL-EBI in Cambridge, UK for several years before rejoining Ghent University in his current position. His main research interests are the development of enabling software for omics research, data mining of large-scale omics experiments, and developing integrative data analysis approaches for across-omics datasets. Prof. Martens recently organized the European Science Foundation sponsored Frontiers in Functional Genomics workshop on Quality Control in Proteomics in Cambridge, UK.


The field of proteomics has matured considerably over the past few years. Building on the availability of improved instrumentation,1 comprehensive sequence databases,2,3 novel methodologies,4 and automation through specialized algorithms,5 it has developed into a high-throughput discipline, capable of identifying and even quantifying many hundreds to thousands of proteins.6 This rapid expansion has led to a strong focus on new development, at the cost of establishing thorough quality control approaches and metrics for existing methods.7 Yet since proteomics is at its core an analytical discipline, full maturity of the field will eventually require robust quality control as well as quality assurance approaches. With this in mind, a workshop on quality control in proteomics was organized in Cambridge, UK, from 25 to 27 November 2009, with the support of the European Science Foundation programme for Frontiers in Functional Genomics. The three-day workshop was attended by a diverse group of 37 people (Fig. 1), representing academic labs, core facilities, instrument and consumable vendors, biotechnology start-up companies, software development companies, and journal editorial boards. Importantly, the participants also represented several existing groups that are working on this broader topic in proteomics, such as the Fixing Proteomics campaign (http://www.fixingproteomics.org), the Association of Biomolecular Research Facilities (ABRF, http://www.abrf.org), the Human Proteome Organisation (HUPO, http://www.hupo.org) Test Samples Initiative, and the HUPO Proteomics Standards Initiative (HUPO PSI, http://www.psidev.info).
Fig. 1 Group photo of the participants in the ESF Quality Control in Proteomics workshop.

An overview is provided here of the various talks and discussions held at the workshop, and some overall conclusions are made as to the future prospects of quality control in proteomics.

The meeting opened with an introductory talk by Prof. Lennart Martens from Ghent University, Belgium, who provided a high-level overview of several existing approaches to quality control (QC) in proteomics. Selected examples include the online analysis of the performance of mass spectrometers,8 a data analysis algorithm to assess the overall quality of large-scale protein identifications,9 the HUPO Test Samples Initiative’s equimolar twenty-protein sample,10 and the National Cancer Institute (NCI) CPTAC QC metrics.11 The next talk, by Bruno Domon, ETH Zurich, Switzerland, focused on the quality control aspects of the fast-emerging application of targeted proteomics methods, and their relevance for clinical studies. The third speaker, Karl Mechtler from the IMP in Vienna, Austria, presented the simple yet powerful CARLA test to assess dead volumes in an LC system,12 and introduced the work of the ABRF Proteomics Standards Research Group (sPRG), which has created several standardized protein mixtures for assessing the performance of proteomics analysis pipelines. The next speaker, Will Dracup from Nonlinear Dynamics Group, Newcastle-upon-Tyne, UK, then presented the first HUPO reproducibility study for the analysis of 2D-gel images. This study showed that good reproducibility between the various groups could be achieved, and Dracup therefore proposed that the most fundamental type of quality control is the de facto reproducibility of findings. In his talk, Matthieu Visser from Philips Research, Eindhoven, the Netherlands, then pointed out three levels of quality in a proteomics study: (i) the quality of the question, i.e. can the hypothesis be answered by the technology used; (ii) the quality of the experiment, i.e. have technical and biological replicates been included, and bias excluded, e.g. through sample order randomization; and (iii) the quality of the actual data.
Regarding this last aspect, he presented a substantial array of simple metrics that were computed from a relatively small (5000 spectra) pilot experiment with experimental mass spectrometry data from an external supplier. The following speaker, Katleen Verleysen from Pronota, Gent, Belgium, gave an overview of the QC requirements in a biotech start-up company focused on biomarker discovery. She also pointed out that high sample quality is paramount for any downstream workflow, emphasising that a priori methods to assess sample fitness would be invaluable tools for QC across the life sciences. Next to present was Hans Vissers from Waters Corporation, Manchester, UK, who gave a broad overview of QC as it applies to the manufacturing chain of an instrument, all the way to the acceptance criteria that need to be fulfilled at the customer site, where Waters is currently pioneering a total system test that simultaneously verifies HPLC and MS performance on a complex sample. In the final talk of the first day, Sarah Robinson of Thermo Fisher Scientific, Hemel Hempstead, UK, presented results from an inter-lab study showing that reproducibility can be achieved in SRM assays on clinical samples. This concurs with the conclusion from the HUPO reproducibility study for 2D gel image analysis presented by Will Dracup: stringent adherence to a well-developed and carefully controlled protocol allows inter-lab reproducibility.

The second day of the meeting started off with a talk from Kathryn Lilley from Cambridge University, Cambridge, UK, who highlighted the importance of economical, small networking and method-sharing meetings to boost the communication of technical and methodological advances in the field, especially since these innovations are often not reported separately in journals. She also presented results from the ABRF Proteomics Research Group (PRG), which initiated whole-workflow studies of mass spectrometry based proteomics analyses.
The next speaker, René Zahedi from the Leibniz Institute for Analytical Sciences - ISAS, Dortmund, Germany, described QC approaches to monitor systems (or workflows) for day-to-day, between-instrument, and between-user reproducibility, thus linking back to the concept of reproducibility as the ultimate QC test as proposed by Will Dracup earlier. Ola Forsstrom-Olsson, from Ludesi, Malmö, Sweden, next presented a freely available application called Gel IQ (http://www.ludesi.com/free-tools/geliq), developed by his company to allow any user to perform semi-automated quality control on their 2D gel image analysis. The following speaker, Mats Borén of Denator, Gothenburg, Sweden, focused on preserving sample quality in proteomics workflows, as the degradation of samples through remaining enzyme activity after lysis is a well-known phenomenon (particularly in plasma). The stabilization of samples of course ties in directly with the previously highlighted need for high quality samples, and touches upon the related goal of coming up with a priori sample quality metrics. In the next talk, Lukas Käll from Stockholm University, Sweden, addressed quality control applied to automated peptide identifications. Rather than the currently popular practice of calculating false discovery rates (FDRs), q values were proposed as a stable alternative when assessing sets of identifications, and posterior error probabilities (PEPs, also called local FDRs) were discussed as useful metrics when assessing the confidence of a single selected peptide identification. Following this, Sara ten Have from Dundee University, UK, presented the advantages of a comprehensive data management system (such as the group's PepTracker system) for both research and the establishment of acceptable operational parameters.
The final speaker of the second day, Marc Vaudel from the Leibniz Institute for Analytical Sciences - ISAS, Dortmund, Germany, focused on a very technical, yet equally important topic: the effects of errors during signal processing on the quantification of proteins using mass spectrometry. Since the errors can amount to 10 or 20%, QC of these steps, typically regarded as a black box by most users, is clearly essential as well.

The third day started off with a talk by Michael Smith from the Royal Society of Chemistry (RSC), Cambridge, UK, who presented his views on QC from a journal editor's perspective. The RSC journals have a long history with this topic, as the field of chemistry benefits from long-standing and stringent QC metrics throughout many of its sub-disciplines. This was illustrated for the field of small molecule analysis, showing that mandatory deposition of source data in a centralized public repository, along with (semi-)automated QC steps on the data, can be extremely effective in bolstering the overall quality of the work and subsequent data reuse. Dr Smith also pointed out that one of the requirements for such an approach to work is the support from publishers, a process that is now ongoing in proteomics as well. This was evident in the talk of the next speaker, Roz Banks from Leeds University, UK, who is also an Editor of the Wiley journal Proteomics, and Associate Editor for its sister journal Proteomics Clinical Applications (PCA). The newly published guidelines for PCA were discussed, highlighting the stringent criteria that need to be met before manuscripts can be considered for review. The next speaker, Juan Pablo Albar of CNB-CSIC and the ProteoRed Consortium in Madrid, Spain, described the comparative inter-laboratory studies performed by the ProteoRed Consortium. These results show that SOPs and QC are crucial for reproducibility, and that inter-lab reproducibility can be achieved if these criteria are met.
The final speaker, Chris Taylor from EMBL-EBI in Cambridge, UK, discussed the relationship between quality control and minimal reporting requirements formulated in the field of proteomics by the HUPO PSI.
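
The distinction between FDRs and q values raised in Lukas Käll's talk can be illustrated with a small sketch. It assumes a target-decoy search, which the talk summary does not specify, and estimates the FDR at each score threshold as the ratio of decoy to target hits; the q value of an identification is then the lowest FDR at which it would still be accepted. The function name and the example scores are hypothetical.

```python
from typing import List, Tuple


def q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) tuples, sorted by descending score.

    Assumption: higher scores are better, and decoy hits estimate the
    number of false target hits (simple target-decoy FDR estimate).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR estimate at each rank: decoys so far / targets so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q value: minimum FDR over this threshold and all more permissive ones.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True),
               (7.0, False), (6.0, False), (5.0, True)]
    for score, is_decoy, q in q_values(example):
        print(f"score={score:4.1f} decoy={is_decoy!s:5} q={q:.2f}")
```

In this toy example the estimated FDR rises to 0.5 at the third-ranked hit and then drops again, whereas the q values (0, 0, 0.25, 0.25, 0.25, 0.5) never decrease with rank, which is one way to read the stability argument. Posterior error probabilities require a model of the score distributions and are not covered by this sketch.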

Throughout the workshop, it became clear that the participants broadly agreed that quality control approaches need to gain more prominence in the field of proteomics. While some approaches are already published and tested, they are rarely applied outside of the laboratories or groups that originally developed them. Furthermore, although efforts have been undertaken to establish end-to-end quality control for an analysis pipeline through standardized samples (e.g., by ABRF and HUPO), there is no consensus in the field to routinely use such standards to verify the performance of a workflow. As a result, much work remains to be done in the field before a consistently high level of confidence in the accuracy and reproducibility of the results can be achieved.

Another clear conclusion from the discussions was that QC approaches and metrics are needed for both in vitro and in silico work, as substantial errors can creep in at either stage. The participants furthermore agreed that the widespread adoption of QC measures can only be achieved if three conditions are met: (i) metrics and approaches must be focused in scope and standardized across the community; (ii) journals and funders need to require that essential QC steps are carried out for work that is to be published or funded; and (iii) adoption should be eased through well-organized educational sessions and resources, and the automation of QC steps in dedicated, free software applications where possible.

Finally, another interesting point of discussion revolved around the reproducibility of experiments. While the ability to reproduce results was considered a fundamental QC metric by all, many questioned the feasibility of mandating that all experiments be reproduced prior to publication. Indeed, samples may be rare and precious, and shipping samples may introduce all sorts of artefactual changes. Furthermore, there is currently no clear incentive for labs to actually take on the work involved in reproducing findings, especially since the amount of work involved or the cost of running the experiments can easily be prohibitive. As a possible alternative, a virtual reproducibility analysis was proposed, which would compare data and results from an experiment to the corresponding information in public repositories such as the Proteomics Identifications Database (PRIDE) at EMBL-EBI.13 While such an analysis might lack the sensitivity of a direct reproduction of an experiment, it will allow the detection of sample or workflow specific problems that may be overlooked with metrics derived from the dataset itself.
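
One very simple form such a virtual reproducibility check could take is sketched below. It is only an illustration under stated assumptions: the workshop did not define a specific procedure, the function name is hypothetical, and the reference set stands in for peptide identifications that would first have to be retrieved from a repository for a comparable sample (no repository interface is used or implied here).

```python
from typing import Iterable, List, Tuple


def support_in_reference(own: Iterable[str],
                         reference: Iterable[str]) -> Tuple[float, List[str]]:
    """Compare own peptide identifications to a reference set.

    Returns the fraction of own peptides also present in the reference,
    plus a sorted list of the peptides that lack such support.
    """
    own_set = set(own)
    reference_set = set(reference)
    if not own_set:
        # Nothing identified, so there is nothing to support.
        return 0.0, []
    unsupported = sorted(own_set - reference_set)
    fraction = (len(own_set) - len(unsupported)) / len(own_set)
    return fraction, unsupported


if __name__ == "__main__":
    # Hypothetical peptide sequences, for illustration only.
    mine = {"PEPTIDEK", "SAMPLER", "ELVISK"}
    public = {"PEPTIDEK", "SAMPLER", "LIVESK"}
    fraction, unsupported = support_in_reference(mine, public)
    print(f"{fraction:.0%} supported; unsupported: {unsupported}")
```

An unusually low supported fraction relative to comparable datasets would not prove an error, but it could flag a sample or workflow for closer inspection.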

With proteomics poised to take its rightful place as one of the primary analytical techniques in the life sciences, it is clear that the time has now come for the field to shore up its existing methods and approaches with robust and universally adopted quality control metrics. Importantly, the ESF workshop on Quality Control in Proteomics has proven that sufficient momentum for this maturation to take place already exists in the field, and that the first few steps have indeed been taken towards this highly desirable goal.

Acknowledgements

The Quality Control in Proteomics Workshop was funded by the European Science Foundation (ESF) Frontiers in Functional Genomics (FFG) Programme.

The QC workshop participants were: Juan Pablo Albar (Centro Nacional de Biotecnologia), Roz Banks (University of Leeds), Pierre-Alain Binz (SIB/GeneBio Geneva), Miriam Boeckmann (Medizinisches Proteom-Center Bochum), Mats Borén (Denator AB Gothenburg), Christian Bunse (Medizinisches Proteom-Center Bochum), Janusz Debski (Institute of Biochemistry and Biophysics PAS Warszawa), Bruno Domon (ETH Zurich), Will Dracup (Nonlinear Dynamics Limited Newcastle upon Tyne), Charlotte Emlind Vahul (Denator AB Gothenburg), Ola Forsstrom-Olsson (Ludesi Malmö), Joseph Foster (EBI Cambridge), Bart Ghesquiere (Ghent University), Lukas Käll (Stockholm University), Patrick Lavery (Nonlinear Dynamics Limited Newcastle upon Tyne), Fredrik Levander (Lund University), Kathryn Lilley (University of Cambridge), Lennart Martens (Ghent University), Salvador Martínez-Bartolomé (ProteoRed - National Center for Biotechnology - CSIC Cantoblanco Madrid), Karl Mechtler (IMP Vienna), Shabaz Mohammed (Utrecht University), Martin O'Gorman (Nonlinear Dynamics Limited Newcastle-upon-Tyne), Alberto Paradela (Centro Nacional de Biotecnologia - CSIC Madrid), Reinout Raijmakers (Utrecht University), Sarah Robinson (Thermo Fisher Scientific Hemel Hempstead), Michael Smith (Royal Society of Chemistry Cambridge), Christian Stephan (Medizinisches Proteom-Center Bochum), Chris Taylor (EMBL-EBI Hinxton), Sara ten Have (Wellcome Trust Centre for Gene Regulation and Expression Dundee), Lies Vanneste (Pronota NV Zwijnaarde), Marc Vaudel (Leibniz Institute for Analytical Sciences-ISAS Dortmund), Ilina Vavrek (IEMAM-BAS Sofia), Katleen Verleysen (Pronota NV Zwijnaarde), Matthieu Visser (Philips Research Eindhoven), Hans Vissers (Waters Corporation Manchester), Juan Antonio Vizcaino (EMBL-EBI Cambridge), René Zahedi (Leibniz Institute for Analytical Sciences-ISAS Dortmund).

References

  1. B. Domon and R. Aebersold, Mass spectrometry and protein analysis, Science, 2006, 312, 212–217.
  2. T. Hubbard, B. Aken, S. Ayling, B. Ballester, K. Beal, E. Bragin, S. Brent, Y. Chen, P. Clapham, L. Clarke, G. Coates, S. Fairley, S. Fitzgerald, J. Fernandez-Banet, L. Gordon, S. Graf, S. Haider, M. Hammond, R. Holland, K. Howe, A. Jenkinson, N. Johnson, A. Kahari, D. Keefe, S. Keenan, R. Kinsella, F. Kokocinski, E. Kulesha, D. Lawson, I. Longden, K. Megy, P. Meidl, B. Overduin, A. Parker, B. Pritchard, D. Rios, M. Schuster, G. Slater, D. Smedley, W. Spooner, G. Spudich, S. Trevanion, A. Vilella, J. Vogel, S. White, S. Wilder, A. Zadissa, E. Birney, F. Cunningham, V. Curwen, R. Durbin, X. Fernandez-Suarez, J. Herrero, A. Kasprzyk, G. Proctor, J. Smith, S. Searle and P. Flicek, Ensembl 2009, Nucleic Acids Res., 2009, 37, D690–D697.
  3. The UniProt Consortium, The Universal Protein Resource (UniProt) 2009, Nucleic Acids Res., 2009, 37, D169–D174.
  4. K. Gevaert, P. Van Damme, B. Ghesquière, F. Impens, L. Martens, K. Helsens and J. Vandekerckhove, A la carte proteomics with an emphasis on gel-free techniques, Proteomics, 2007, 7, 2698–2718.
  5. R. Aebersold and M. Mann, Mass spectrometry-based proteomics, Nature, 2003, 422, 198–207.
  6. L. M. F. de Godoy, J. V. Olsen, J. Cox, M. L. Nielsen, N. C. Hubner, F. Fröhlich, T. C. Walther and M. Mann, Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast, Nature, 2008, 455, 1251–1254.
  7. L. Martens and H. Hermjakob, Proteomics data validation: why all must provide data, Mol. BioSyst., 2007, 3, 518–522.
  8. H. Xu and M. A. Freitas, Automated diagnosis of LC-MS/MS performance, Bioinformatics, 2009, 25, 1341–1343.
  9. D. Fenyo, B. S. Phinney and R. C. Beavis, Determining the overall merit of protein identification data sets: rho-diagrams and rho-scores, J. Proteome Res., 2007, 6, 1997–2004.
  10. A. W. Bell, E. W. Deutsch, C. E. Au, R. E. Kearney, R. Beavis, S. Sechi, T. Nilsson and J. J. M. Bergeron, A HUPO test sample study reveals common problems in mass spectrometry-based proteomics, Nat. Methods, 2009, 6, 423–430.
  11. P. A. Rudnick, K. R. Clauser, L. E. Kilpatrick, D. V. Tchekhovskoi, P. Neta, N. Blonder, D. D. Billheimer, R. K. Blackman, D. M. Bunk, H. L. Cardasis, A. L. Ham, J. D. Jaffe, C. R. Kinsinger, M. Mesri, T. A. Neubert, B. Schilling, D. L. Tabb, T. J. Tegeler, L. Vega-Montoto, A. M. Variyath, M. Wang, P. Wang, J. R. Whiteaker, L. J. Zimmerman, S. A. Carr, S. J. Fisher, B. W. Gibson, A. G. Paulovich, F. E. Regnier, H. Rodriguez, C. Spiegelman, P. Tempst, D. C. Liebler and S. E. Stein, Performance metrics for liquid chromatography-tandem mass spectrometry systems in proteomics analyses, Mol. Cell. Proteomics, 2010, 9, 225–241.
  12. G. Mitulović, C. Stingl, M. Smoluch, R. Swart, J. Chervet, I. Steinmacher, C. Gerner and K. Mechtler, Automated, on-line two-dimensional nano liquid chromatography tandem mass spectrometry for rapid analysis of complex protein digests, Proteomics, 2004, 4, 2545–2557.
  13. L. Martens, H. Hermjakob, P. Jones, M. Adamski, C. Taylor, D. States, K. Gevaert, J. Vandekerckhove and R. Apweiler, PRIDE: the proteomics identifications database, Proteomics, 2005, 5, 3537–3545.

This journal is © The Royal Society of Chemistry 2010