
Issue 19, 2016

The model adaptive space shrinkage (MASS) approach: a new method for simultaneous variable selection and outlier detection based on model population analysis


Abstract

Variable selection and outlier detection are important steps in chemical modeling. The two usually affect each other, and the order in which they are performed also strongly affects the modeling results. Currently, many studies perform these steps separately and in different orders. In this study, we examined the interaction between outliers and variables and compared modeling procedures that apply variable selection and outlier detection in different orders. Because the order of outlier detection and variable selection can affect the interpretation of the model, it is difficult to decide which order is preferable when the predictive performance (prediction error) of the different orders is relatively close. To address this problem, we developed a simultaneous variable selection and outlier detection approach called Model Adaptive Space Shrinkage (MASS), based on model population analysis (MPA). Through weighted binary matrix sampling (WBMS) from the model space, a large number of partial least squares (PLS) regression models were built, and the best-performing (elite) fraction of these models was used to statistically reassign the weight of each variable and each sample. This process was repeated until the weights of the variables and samples converged. Finally, MASS adaptively found a high-performance model consisting of an optimized variable subset and an optimized sample subset; together, these two subsets can be regarded as the cleaned dataset for chemical modeling. Because variables and samples are handled simultaneously, the question of which step to perform first does not arise. One near-infrared (NIR) spectroscopy dataset and one quantitative structure–activity relationship (QSAR) dataset were used to test the approach. The results demonstrated that MASS is a useful method for data cleaning before building a predictive model.
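The iterative loop described in the abstract (WBMS sampling, many PLS sub-models, elite selection, weight reassignment, repeat until convergence) can be sketched in Python/NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. Details the abstract does not give are assumptions: all weights start at 0.5; WBMS places round(w·K) ones at random positions in each column of the K-row binary matrix; each sub-model is scored by 5-fold cross-validated RMSE; the elite fraction is 10%; a PLS1 model with at most 3 components is used; and the final subsets keep items with weight ≥ 0.5. All function names (`pls_fit`, `cv_rmse`, `wbms`, `mass_sketch`) are hypothetical.

```python
import numpy as np


def pls_fit(X, y, n_comp):
    """Fit a PLS1 model with NIPALS; return (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_comp = min(n_comp, X.shape[1], X.shape[0] - 1)
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # nothing left to explain
            break
        w = w / norm                     # weight vector
        t = Xc @ w                       # scores
        tt = t @ t
        p = Xc.T @ t / tt                # X loadings
        q = (yc @ t) / tt                # y loading
        Xc = Xc - np.outer(t, p)         # deflate X and y
        yc = yc - q * t
        W.append(w); P.append(p); Q.append(q)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)   # B = W (P^T W)^-1 q
    return coef, x_mean, y_mean


def cv_rmse(X, y, n_comp, n_folds=5):
    """RMSE of n_folds-fold cross-validation with interleaved folds (assumed score)."""
    folds = np.arange(len(y)) % n_folds
    sse = 0.0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        coef, xm, ym = pls_fit(X[train], y[train], n_comp)
        sse += np.sum((y[test] - ((X[test] - xm) @ coef + ym)) ** 2)
    return np.sqrt(sse / len(y))


def wbms(weights, n_models, rng):
    """Assumed WBMS: column j gets round(w_j * n_models) ones in random rows."""
    M = np.zeros((n_models, len(weights)), dtype=bool)
    for j, w in enumerate(weights):
        M[rng.permutation(n_models)[: int(round(w * n_models))], j] = True
    return M


def mass_sketch(X, y, n_models=500, elite_frac=0.1, n_comp=3,
                max_iter=30, tol=1e-3, min_samples=10, seed=0):
    """Hypothetical MASS-style loop; returns boolean masks (variables, samples)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w_var, w_smp = np.full(p, 0.5), np.full(n, 0.5)   # assumed starting weights
    n_elite = max(1, int(elite_frac * n_models))
    for _ in range(max_iter):
        # Sample variables and samples jointly: row k defines sub-model k.
        V, S = wbms(w_var, n_models, rng), wbms(w_smp, n_models, rng)
        errors = np.full(n_models, np.inf)
        for k in range(n_models):
            vi, si = np.flatnonzero(V[k]), np.flatnonzero(S[k])
            if vi.size and si.size >= min_samples:
                errors[k] = cv_rmse(X[np.ix_(si, vi)], y[si], n_comp)
        elite = np.argsort(errors)[:n_elite]            # lowest-error sub-models
        # New weight = inclusion frequency among the elite sub-models.
        new_var, new_smp = V[elite].mean(axis=0), S[elite].mean(axis=0)
        change = max(np.abs(new_var - w_var).max(), np.abs(new_smp - w_smp).max())
        w_var, w_smp = new_var, new_smp
        if change < tol:                                # weights have converged
            break
    return w_var >= 0.5, w_smp >= 0.5                   # assumed final threshold
```

For example, `keep_vars, keep_samples = mass_sketch(X, y)` returns boolean masks over the columns and rows of `X`; the cleaned dataset is then `X[np.ix_(keep_samples, keep_vars)]`. Once a weight reaches 0 or 1 it never changes again (the item is then always excluded or always included), which is one way to read the "space shrinkage" in the method's name.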




Publication details

The article was received on 01 Apr 2016, accepted on 04 Jul 2016 and first published on 05 Jul 2016


Article type: Paper
DOI: 10.1039/C6AN00764C
Citation: Analyst, 2016, 141, 5586-5597

M. Wen, B. Deng, D. Cao, Y. Yun, R. Yang, H. Lu and Y. Liang, Analyst, 2016, 141, 5586
