Introduction to the Applications of Chemometric Techniques in ‘Omics’ Research: Common Pitfalls, Misconceptions and ‘Rights and Wrongs’

This introductory chapter reviews in detail the two techniques most commonly employed for the multivariate (MV) analysis of multianalyte, high-dimensional metabolomics or genomic datasets: principal component analysis (PCA), an unsupervised exploratory data analysis technique, and partial least squares-discriminant analysis (PLS-DA), a supervised pattern recognition technique. In particular, it examines the critical assumptions that each method requires (normality and homoscedasticity of the potential MV predictor X variables, linearity of the relationships between them, etc.), the interpretability of the results obtained, and the provision of adequate sample sizes. For PLS-DA, the critical requirement for a minimum sample size is discussed at length, together with the danger of ‘overfitting’, which is illustrated with a typical example. The importance of validation, cross-validation and random permutation testing, and the range of methods available for performing them satisfactorily, is also critically examined, as are the means available for evaluating model quality.
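The role of cross-validation and random permutation testing in guarding against overfitting can be illustrated with a minimal sketch. It is not the chapter's own procedure: a simple nearest-centroid classifier stands in for PLS-DA so that the example needs only the Python standard library, and every function name, the simulated data and the parameter values are assumptions made for illustration. The idea is to compare the leave-one-out accuracy obtained with the true class labels against the accuracies obtained after the labels are randomly shuffled.

```python
import random


def make_data(n_per_class, n_vars, shift, seed):
    """Simulate two classes of Gaussian data (hypothetical example data).

    Class 1 variables are offset from class 0 by `shift`; shift=0 gives pure noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(n_vars)])
            y.append(label)
    return X, y


def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose training centroid is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y):
    """Leave-one-out cross-validated classification accuracy."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            hits += 1
    return hits / len(X)


def permutation_p_value(X, y, n_permutations=200, seed=0):
    """Return (observed accuracy, permutation p-value).

    The p-value is the proportion of label shufflings whose cross-validated
    accuracy is at least as high as that of the true labels (with the usual
    +1 correction so that it is never exactly zero).
    """
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = y[:]
        rng.shuffle(shuffled)
        if loo_accuracy(X, shuffled) >= observed:
            at_least_as_good += 1
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Few samples and many variables: the setting in which overfitting is most likely.
    X_noise, y_noise = make_data(n_per_class=6, n_vars=50, shift=0.0, seed=1)
    print("noise only:", permutation_p_value(X_noise, y_noise))

    # Genuinely separated classes for comparison.
    X_real, y_real = make_data(n_per_class=6, n_vars=5, shift=10.0, seed=1)
    print("real effect:", permutation_p_value(X_real, y_real))
```

A cross-validated accuracy that is no better than that of most permuted-label models suggests that any apparent class separation may simply reflect overfitting. The same comparison applies, in principle, when a PLS-DA model takes the place of the stand-in classifier.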

Print publication date: 18 Nov 2014
Copyright year: 2015
Print ISBN: 978-1-84973-163-8
PDF eISBN: 978-1-84973-516-2
ePub eISBN: 978-1-78262-360-1
From the book series:
Issues in Toxicology