Extraction, interpretation and validation of information for comparing samples in metabolic LC/MS data sets

Pär Jonsson,^a Stephen J. Bruce,^b Thomas Moritz,^c Johan Trygg,^a Michael Sjöström,^a Robert Plumb,^d Jennifer Granger,^d Elaine Maibaum,^b Jeremy K. Nicholson,^b Elaine Holmes^b and Henrik Antti*^a

doi:10.1039/B501890K

Author affiliations

* Corresponding author

^a Research Group for Chemometrics, Department of Chemistry, Umeå University, SE-90187 Umeå, Sweden
E-mail: henrik.antti@chem.umu.se
Tel: +46 90 7865358

^b Biological Chemistry Section, Imperial College London, Faculty of Medicine, Biomedical Sciences Division, South Kensington, London, UK

^c Umeå Plant Science Center, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-901 87 Umeå, Sweden

^d Life Sciences Research and Development, Waters Corporation, 34 Maple Street, Milford, USA

Abstract

LC/MS is an analytical technique that, owing to its high sensitivity, has become increasingly popular for generating metabolic signatures of biological samples and for building metabolic databases. However, to create robust and interpretable (transparent) multivariate models for comparing many samples, the data must fulfil three specific criteria: (i) each sample is characterized by the same number of variables, (ii) each of these variables is represented across all observations, and (iii) a variable in one sample has the same biological meaning, or represents the same metabolite, in all other samples. In addition, the resulting models must be able to predict, e.g., related and independent samples characterized in the same way as the samples used to build the model. The method presented here constructs such a representative data set through automatic peak detection, alignment, setting of retention time windows, summing in the chromatographic dimension and data compression by means of alternating regression, so that the relevant metabolic variation is retained for further multivariate modelling. This approach not only allows large numbers of samples to be compared on the basis of their LC/MS metabolic profiles but also provides a means of interpreting the investigated biological system: finding relevant systematic patterns among samples, identifying influential variables, verifying the findings in the raw data, and finally using the models for prediction. The strategy was applied here to a population study using urine samples from two cohorts, Shanxi (People’s Republic of China) and Honolulu (USA). Evaluation of the extracted data using partial least squares discriminant analysis (PLS-DA) gave a robust, predictive and transparent model of the metabolic differences between the two populations. These findings suggest that this is a general approach for the handling, analysis and evaluation of large metabolic LC/MS data sets.
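The windowing, summing and alternating-regression steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it makes several assumptions. Peak detection and alignment are taken as already done, so each sample is an array of intensities on a common grid of scans (retention time) × m/z channels. The retention-time windows are given as fixed scan indices. Non-negativity is imposed by simple clipping. The function names (`window_sums`, `build_matrix`, `alternating_regression`, `compress_window`) are hypothetical.

```python
import numpy as np


def window_sums(chrom, window_edges):
    """Sum one aligned sample (n_scans x n_mz) within retention-time windows.

    window_edges are increasing scan indices (an assumed, hand-chosen layout);
    window j covers scans window_edges[j] <= i < window_edges[j + 1].
    Returns a flat vector of length n_windows * n_mz
    (all m/z channels of window 0, then of window 1, ...).
    """
    parts = [chrom[a:b].sum(axis=0)
             for a, b in zip(window_edges[:-1], window_edges[1:])]
    return np.concatenate(parts)


def build_matrix(chroms, window_edges):
    """Stack window sums so that every sample (row) has the same variables."""
    return np.vstack([window_sums(c, window_edges) for c in chroms])


def alternating_regression(D, n_comp, n_iter=200, seed=0):
    """Minimal alternating least squares for D ~= C @ S.T.

    D: (n_rows, n_mz) data; C: (n_rows, n_comp) elution profiles;
    S: (n_mz, n_comp) spectra. Negative values are clipped to zero after each
    least-squares step, a crude way of imposing non-negativity. A fixed number
    of iterations is run; there is no convergence test.
    """
    rng = np.random.default_rng(seed)
    S = rng.random((D.shape[1], n_comp))  # random non-negative start
    for _ in range(n_iter):
        C = np.clip(D @ np.linalg.pinv(S.T), 0, None)    # solve for profiles
        S = np.clip((np.linalg.pinv(C) @ D).T, 0, None)  # solve for spectra
    return C, S


def compress_window(chroms, start, stop, n_comp):
    """Resolve scans start..stop-1 jointly over all samples, then integrate.

    The samples' window blocks are stacked vertically so that one set of
    spectra S is shared by all samples; summing each sample's rows of C gives
    n_comp variables per sample on a common scale.
    """
    n = stop - start
    D = np.vstack([c[start:stop] for c in chroms])
    C, S = alternating_regression(D, n_comp)
    Z = np.array([C[i * n:(i + 1) * n].sum(axis=0) for i in range(len(chroms))])
    return Z, S


# Toy usage: two samples, 6 scans x 3 m/z channels; windows = scans 0-1 and 2-5.
chroms = [np.arange(18, dtype=float).reshape(6, 3), np.ones((6, 3))]
X = build_matrix(chroms, [0, 2, 6])
# X == [[3, 5, 7, 42, 46, 50], [2, 2, 2, 4, 4, 4]]
```

Summing within fixed windows gives every sample the same set of variables, which corresponds to criteria (i) and (ii). In `compress_window` the spectra `S` are estimated jointly from all samples. The integrated profiles therefore share one scale, and column k refers to the same resolved component in every sample, which is one way of approaching criterion (iii). The rows of either matrix could then be passed to a PLS-DA model.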

Analyst
