Multivariate Statistics in Lipidomics
In lipidomics the aim is to measure the concentration of every lipid above the limit of detection in a biofluid or tissue extract. By its very nature this produces large multivariate datasets for which standard univariate statistical tools are often inappropriate because of the problems of multiple testing and multiple covariates. To address this, there is increasing interest in using multivariate statistics and machine learning to process the resulting datasets. In this chapter we examine why multivariate statistical tools are often more appropriate than their univariate counterparts, and introduce some common unsupervised and supervised approaches used in lipidomics, including principal component analysis, hierarchical cluster analysis, partial least squares discriminant analysis and machine learning approaches such as random forests. The use of multivariate statistics is then demonstrated for techniques that produce one-dimensional (direct infusion mass spectrometry), two-dimensional (liquid chromatography mass spectrometry) and three-dimensional (mass spectrometry imaging, liquid chromatography ion mobility mass spectrometry) datasets as part of their lipidomic workflows.