
Issue 39, 2016

Representative subset selection and outlier detection via isolation forest


Abstract

To build a robust and predictive model, all outliers should be eliminated and representative samples should be selected. This study proposes Isolation forest Outlier detection and Subset selection (IOS), which detects outliers and selects representative subsets simultaneously. IOS differs from the classical subset selection method, which is cluster-based and has a uniform design. A comparative study of the IOS, Kennard–Stone (KS), sample set partitioning based on joint xy distances (SPXY) and random sampling (RS) methods was conducted. The algorithms were benchmarked on four datasets: two normal NIR datasets that are free of outliers (soil and diesel fuel) and two datasets with outliers (a milk NIR dataset and a solubility QSAR dataset, LogS). The results show that IOS can detect outliers and select representative subsets of samples simultaneously, which reduces prediction errors significantly compared with the KS, SPXY and RS methods. IOS can also eliminate outliers and select representative samples without y values. Hence, the proposed method may be an advantageous alternative to the other three strategies. IOS is implemented in MATLAB and is available at https://github.com/zmzhang/IOS.
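To make the outlier-detection half of the abstract concrete, the following is a minimal sketch of the standard isolation forest anomaly score, written in plain Python. It is not the authors' MATLAB implementation, and it does not attempt the subset selection step, which the abstract does not describe. The function names, the toy data, and the parameter defaults (`n_trees=100`, `sample_size=256`) are assumptions chosen for illustration. Like IOS as described above, the scoring uses only the x values.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Split rows recursively on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    f = rng.randrange(len(rows[0]))
    lo = min(r[f] for r in rows)
    hi = max(r[f] for r in rows)
    if lo == hi:
        # Simplification: stop if the chosen feature is constant in this node.
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[f] < split]
    right = [r for r in rows if r[f] >= split]
    return {
        "f": f,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus a correction for unsplit points."""
    if "f" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["f"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def anomaly_scores(data, n_trees=100, sample_size=256, seed=0):
    """Return one score per sample; values closer to 1 suggest an outlier."""
    if len(data) < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    scores = []
    for x in data:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / c_factor(psi)))
    return scores


# Toy data (hypothetical): 60 points near the origin plus one distant point.
_rng = random.Random(1)
data = [(_rng.gauss(0, 1), _rng.gauss(0, 1)) for _ in range(60)]
data.append((10.0, 10.0))
scores = anomaly_scores(data)
most_anomalous = max(range(len(data)), key=lambda i: scores[i])
```

Samples that are isolated after only a few random splits receive short average path lengths and therefore high scores. In this toy setup the distant point at index 60 is expected to rank highest; where to place a cut-off on the scores is a separate choice that the abstract does not specify.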




Publication details

The article was received on 03 Jun 2016, accepted on 05 Sep 2016 and first published on 06 Sep 2016


Article type: Paper
DOI: 10.1039/C6AY01574C
Citation: Anal. Methods, 2016, 8, 7225-7231

    Representative subset selection and outlier detection via isolation forest

    W. Chen, Y. Yun, M. Wen, H. Lu, Z. Zhang and Y. Liang, Anal. Methods, 2016, 8, 7225
    DOI: 10.1039/C6AY01574C
