Partial least squares-discriminant analysis (PLS-DA) for classification of high-dimensional (HD) data: a review of contemporary practice strategies and knowledge gaps

Loong Chuen Lee; Choong-Yeun Liong; Abdul Aziz Jemain

doi:10.1039/C8AN00599K

Loong Chuen Lee,*^ab Choong-Yeun Liong*^b and Abdul Aziz Jemain

Author affiliations

* Corresponding authors

^a Forensic Science Programme, FSK, Universiti Kebangsaan Malaysia, Jalan Raja Muda Abdul Aziz, 50300 Kuala Lumpur, Malaysia
E-mail: lc_lee@ukm.edu.my

^b Statistics Programme, FST, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
E-mail: lg@ukm.edu.my

Abstract

Partial least squares-discriminant analysis (PLS-DA) is a versatile algorithm that can be used for predictive and descriptive modelling as well as for discriminative variable selection. However, versatility is both a blessing and a curse: the user needs to optimize a wealth of parameters before reaching reliable and valid outcomes. Over the past two decades, PLS-DA has demonstrated great success in modelling high-dimensional datasets for diverse purposes, e.g. product authentication in food analysis, disease classification in medical diagnosis, and evidence analysis in forensic science. Despite that, in practice, many users have yet to grasp the essence of constructing a valid and reliable PLS-DA model. As technology progresses, datasets in every discipline are evolving into more complex forms, i.e. multi-class, imbalanced and colossal. Indeed, the community is entering a new era called big data. In this context, the aim of the article is two-fold: (a) to review, outline and describe contemporary PLS-DA modelling practice strategies, and (b) to critically discuss the respective knowledge gaps that have emerged in response to the present big data era. This work could complement other available reviews or tutorials on PLS-DA, providing a timely and user-friendly guide for researchers, especially those working in applied research.

This article is part of the themed collections: 150th Anniversary Collection: Highly Cited Articles and Recent Review Articles

Article information

DOI: https://doi.org/10.1039/C8AN00599K
Article type: Critical Review
Submitted: 30 Mar 2018
Accepted: 31 May 2018
First published: 01 Jun 2018

Analyst, 2018, 143, 3526-3539
