
Issue 86, 2016

On feature selection for supervised learning problems involving high-dimensional analytical information


Abstract

Several computational methods were applied to feature selection for supervised learning problems that arise in analytical chemistry, namely the Genetic Algorithm (GA), Firefly Algorithm (FA), Particle Swarm Optimization (PSO), Least Absolute Shrinkage and Selection Operator (LASSO), Least Angle Regression (LARS), interval Partial Least Squares (iPLS), sparse PLS (sPLS), and Uninformative Variable Elimination-PLS (UVE-PLS). The methods were compared in two case studies covering both supervised learning settings: (i) regression, i.e. multivariate calibration of soil carbonate content using Fourier transform mid-infrared (FT-MIR) spectral information, and (ii) classification, i.e. diagnosis of prostate cancer patients using gene expression information. Besides the quantitative performance measures often used in feature selection studies (error and accuracy), a qualitative measure, the selection index (SI), was introduced to evaluate the methods in terms of the quality of the selected features. Robustness was evaluated by introducing artificially generated noise variables into both datasets. Results of the first case study showed that, in order of decreasing predictive ability and robustness, GA > FA ≈ PSO > LASSO > LARS (errors of 1.775, 4.504, 4.055, 10.085, and 10.510 mg g−1, respectively); these methods are recommended for regression involving spectral information. In the second case study, the trend GA > PSO > FA ≈ LASSO > LARS (accuracies of 100, 95.12, and 90.24%) was observed. Strong robustness was observed in the regression case: SI did not decrease for GA, whereas it decreased from 28.85 to 10.26% for FA and from 36.11 to 21.05% for PSO. In the classification case, only LARS exhibited a considerable decrease in accuracy upon introduction of the noise features. The major sources of error were identified and mostly originated from the analytical methods themselves, which confirms the strong applicability of the evaluated feature selection methods.
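The robustness test described above (appending artificially generated noise variables and checking whether a method still picks the informative ones) can be illustrated with one of the compared methods, LASSO. The sketch below is not the authors' code or data: it assumes a hypothetical toy dataset with three informative features and 20 Gaussian noise features, mean-centred columns with no intercept, a cyclic coordinate-descent solver for the objective (1/(2n))·||y − Xb||² + λ·||b||₁, and a penalty λ = 0.1. The helper names (`make_data`, `lasso_cd`, `select_features`) are illustrative only.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) used in LASSO coordinate updates."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for min_b (1/(2n))||y - Xb||^2 + lam*||b||_1.
    Assumes the columns of X and the vector y are already mean-centred."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - Xb, with b = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of x_j with the partial residual that excludes x_j's own term
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def centre_columns(X):
    """Return a copy of X with each column's mean subtracted."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def make_data(n=60, n_noise=20, seed=0):
    """Hypothetical toy data: 3 informative features followed by n_noise
    artificially generated noise features appended as extra columns."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        informative = [rng.gauss(0.0, 1.0) for _ in range(3)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        target = 3.0 * informative[0] - 2.0 * informative[1] + 1.5 * informative[2]
        y.append(target + rng.gauss(0.0, 0.1))
        X.append(informative + noise)
    y_mean = sum(y) / n
    return centre_columns(X), [v - y_mean for v in y]


def select_features(X, y, lam=0.1, tol=1e-8):
    """Indices of the features whose LASSO coefficient is non-zero."""
    b = lasso_cd(X, y, lam)
    return [j for j, bj in enumerate(b) if abs(bj) > tol]


if __name__ == "__main__":
    X, y = make_data()
    selected = select_features(X, y)
    noise_kept = [j for j in selected if j >= 3]
    print("selected:", selected)
    print("noise features kept:", len(noise_kept), "of 20")
```

With these settings the informative columns (indices 0 to 2) should receive non-zero coefficients while nearly all appended noise columns are shrunk exactly to zero. Increasing `n_noise` or lowering `lam` is a simple way to probe how the selection degrades, in the spirit of the robustness evaluation in the abstract.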



Publication details

The article was received on 11 Apr 2016, accepted on 26 Aug 2016 and first published on 26 Aug 2016


Article type: Paper
DOI: 10.1039/C6RA09336A
Citation: RSC Adv., 2016, 6, 82801-82809


    P. Žuvela and J. Jay Liu, RSC Adv., 2016, 6, 82801
