Issue 86, 2016, Issue in Progress

On feature selection for supervised learning problems involving high-dimensional analytical information

Abstract

Several computational methods were applied to feature selection for supervised learning problems encountered in analytical chemistry: Genetic Algorithm (GA), Firefly Algorithm (FA), Particle Swarm Optimization (PSO), Least Absolute Shrinkage and Selection Operator (LASSO), Least Angle Regression (LARS), interval Partial Least Squares (iPLS), sparse PLS (sPLS), and Uninformative Variable Elimination-PLS (UVE-PLS). The methods were compared in two case studies covering both supervised learning cases: (i) regression, the multivariate calibration of soil carbonate content using Fourier transform mid-infrared (FT-MIR) spectral information, and (ii) classification, the diagnosis of prostate cancer patients using gene expression information. In addition to the quantitative performance measures often used in feature selection studies (error and accuracy), a qualitative measure, the selection index (SI), was introduced to evaluate the methods in terms of the quality of the selected features. Robustness was evaluated by introducing artificially generated noise variables into both datasets. Results of the first case study showed that, in order of decreasing predictive ability and robustness, GA > FA ≈ PSO > LASSO > LARS (errors of 1.775, 4.504, 4.055, 10.085, and 10.510 mg g−1) are recommended for application in regression involving spectral information. In the second case study, the trend GA > PSO > FA ≈ LASSO > LARS (accuracies of 100, 95.12 and 90.24%) was observed. Strong robustness was observed in the regression case, with no decrease in SI for GA, and SI decreasing from 28.85 to 10.26% and from 36.11 to 21.05% for FA and PSO, respectively. In the classification case, only LARS exhibited a considerable decrease in accuracy upon introduction of noise features. The major sources of error were identified and originated mostly from the analytical methods themselves, which confirmed the strong applicability of the evaluated feature selection methods.
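The robustness check described in the abstract (appending artificially generated noise variables and seeing whether a feature selection method still picks the informative variables) can be illustrated with a minimal sketch. This is not the authors' code or data: it assumes synthetic Gaussian features, uses LASSO solved by plain cyclic coordinate descent as the selection method, and all function names, the penalty `alpha`, and the dataset sizes are hypothetical choices made for illustration. The selection index (SI) is not defined in the abstract, so the sketch only reports which selected indices are noise variables.

```python
import random


def standardise(column):
    """Centre a column and scale it to unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / sd for v in column]


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; this is what makes LASSO sparse."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, y, alpha, n_sweeps=100):
    """Minimise (1/(2n))*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    Assumes every column is standardised and y is centred (no intercept).
    """
    n = len(y)
    weights = [0.0] * len(columns)
    residual = list(y)
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * weights[j]) for c, r in zip(col, residual)) / n
            new_weight = soft_threshold(rho, alpha)
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                weights[j] = new_weight
    return weights


def noise_robustness_demo(n_samples=100, n_features=10, n_noise=20, alpha=0.3, seed=0):
    """Select features with LASSO after appending pure-noise columns (synthetic data)."""
    rng = random.Random(seed)
    # Synthetic "measured" features; only the first three influence the response.
    features = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]
    y = [3.0 * features[0][i] - 2.0 * features[1][i] + 1.5 * features[2][i]
         + 0.1 * rng.gauss(0, 1) for i in range(n_samples)]
    # Artificially generated noise variables, unrelated to the response.
    noise = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_noise)]
    columns = [standardise(col) for col in features + noise]
    y_mean = sum(y) / n_samples
    y_centred = [v - y_mean for v in y]
    weights = lasso_coordinate_descent(columns, y_centred, alpha)
    selected = [j for j, w in enumerate(weights) if w != 0.0]
    noise_selected = [j for j in selected if j >= n_features]
    return selected, noise_selected


if __name__ == "__main__":
    selected, noise_selected = noise_robustness_demo()
    print("selected feature indices:", selected)
    print("noise variables among them:", noise_selected)
```

A robust method should keep selecting the informative variables while admitting few of the appended noise columns. Running the same selection with and without the noise block, and comparing the prediction error or accuracy of the resulting models, would mirror the kind of comparison the abstract reports.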

Article information

Article type: Paper
Submitted: 11 Apr 2016
Accepted: 26 Aug 2016
First published: 26 Aug 2016

RSC Adv., 2016, 6, 82801-82809

P. Žuvela and J. Jay Liu, RSC Adv., 2016, 6, 82801 DOI: 10.1039/C6RA09336A
