Issue 19, 2016

The model adaptive space shrinkage (MASS) approach: a new method for simultaneous variable selection and outlier detection based on model population analysis

Abstract

Variable selection and outlier detection are important processes in chemical modeling. Usually, they affect each other. Their performing orders also strongly affect the modeling results. Currently, many studies perform these processes separately and in different orders. In this study, we examined the interaction between outliers and variables and compared the modeling procedures performed with different orders of variable selection and outlier detection. Because the order of outlier detection and variable selection can affect the interpretation of the model, it is difficult to decide which order is preferable when the predictabilities (prediction error) of the different orders are relatively close. To address this problem, a simultaneous variable selection and outlier detection approach called Model Adaptive Space Shrinkage (MASS) was developed. This proposed approach is based on model population analysis (MPA). Through weighted binary matrix sampling (WBMS) from model space, a large number of partial least square (PLS) regression models were built, and the elite parts of the models were selected to statistically reassign the weight of each variable and sample. Then, the whole process was repeated until the weights of the variables and samples converged. Finally, MASS adaptively found a high performance model which consisted of the optimized variable subset and sample subset. The combination of these two subsets could be considered as the cleaned dataset used for chemical modeling. In the proposed approach, the problem of the order of variable selection and outlier detection is avoided. One near infrared spectroscopy (NIR) dataset and one quantitative structure–activity relationship (QSAR) dataset were used to test this approach. The result demonstrated that MASS is a useful method for data cleaning before building a predictive model.

Graphical abstract: The model adaptive space shrinkage (MASS) approach: a new method for simultaneous variable selection and outlier detection based on model population analysis

Supplementary files

Article information

Article type
Paper
Submitted
01 Apr 2016
Accepted
04 Jul 2016
First published
05 Jul 2016

Analyst, 2016,141, 5586-5597

Author version available

The model adaptive space shrinkage (MASS) approach: a new method for simultaneous variable selection and outlier detection based on model population analysis

M. Wen, B. Deng, D. Cao, Y. Yun, R. Yang, H. Lu and Y. Liang, Analyst, 2016, 141, 5586 DOI: 10.1039/C6AN00764C

To request permission to reproduce material from this article, please go to the Copyright Clearance Center request page.

If you are an author contributing to an RSC publication, you do not need to request permission provided correct acknowledgement is given.

If you are the author of this article, you do not need to request permission to reproduce figures and diagrams provided correct acknowledgement is given. If you want to reproduce the whole article in a third-party publication (excluding your thesis/dissertation for which permission is not required) please go to the Copyright Clearance Center request page.

Read more about how to correctly acknowledge RSC content.

Social activity

Spotlight

Advertisements