Approach for feature selection with data modelling in LC-MS metabolomics
We propose a data processing workflow for LC-MS-based metabolomics studies that combines signal drift correction, univariate analysis, supervised learning, feature selection and unsupervised modelling. The approach requires only an annotation-free peak table and produces a highly reduced set of the most relevant features, validated by Receiver Operating Characteristic (ROC) analysis of the selected predictors, cross-validation and unsupervised projection. The workflow was first optimised on our own experimental dataset and then successfully tested on 36 datasets from 21 publicly available metabolomics projects. It can be used for classification in high-dimensional metabolomics studies and as a first step in exploratory analysis, data projection, biomarker selection, and data integration and fusion.
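
As an illustration of the ROC-based check on individual predictors, the sketch below computes the area under the ROC curve (AUC) for each column of a small peak table and ranks the features by how well they separate two classes. It is a minimal sketch under stated assumptions, not the implementation used in this study: the function names `roc_auc` and `rank_features`, the toy peak table, and the choice to score a feature by max(AUC, 1 - AUC) are all hypothetical, and the AUC is obtained from the Mann-Whitney pairwise comparison rather than from any particular library.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC of one feature for binary labels (0/1).

    Uses the Mann-Whitney formulation: the fraction of (positive, negative)
    sample pairs in which the positive sample has the higher intensity,
    with ties counted as half a win.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features(
    peak_table: Dict[str, List[float]], labels: Sequence[int]
) -> List[Tuple[str, float]]:
    """Rank features by discriminative power, best first.

    Assumption: a feature that is lower in the positive class is as useful
    as one that is higher, so each feature is scored by max(AUC, 1 - AUC).
    """
    scored = []
    for name, values in peak_table.items():
        auc = roc_auc(values, labels)
        scored.append((name, max(auc, 1.0 - auc)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical annotation-free peak table: feature name -> intensities,
# one value per sample, in the same order as toy_labels.
toy_peaks = {
    "feature_A": [1.0, 1.2, 0.9, 3.1, 2.8, 3.5],
    "feature_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "feature_C": [5.0, 4.8, 5.2, 1.1, 0.9, 1.3],
}
toy_labels = [0, 0, 0, 1, 1, 1]

ranking = rank_features(toy_peaks, toy_labels)
```

In this toy example `feature_A` and `feature_C` separate the two groups perfectly (in opposite directions), whereas `feature_B` is uninformative and ranks last. In a real study such a ranking would be applied only to the features retained after the selection step and would be read alongside cross-validation results.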