Issue 39, 2016

Representative subset selection and outlier detection via isolation forest

Abstract

To build a robust and predictive model, outliers should be eliminated and representative samples should be selected. In this study, Isolation forest Outlier detection and Subset selection (IOS) is proposed, which can detect outliers and select representative subsets simultaneously. IOS differs from classical subset selection methods, which are cluster-based or rely on uniform design. A comparative study among the IOS, Kennard–Stone (KS), sample set partitioning based on joint xy distances (SPXY) and random sampling (RS) methods was conducted. The performance of these algorithms was benchmarked on four datasets: two NIR datasets that are free of outliers (soil and diesel fuel) and two datasets with outliers (a milk NIR dataset and a solubility QSAR dataset, LogS). Results show that IOS can detect outliers and select representative subsets of samples simultaneously, which reduces prediction errors significantly compared with the KS, SPXY and RS methods. IOS can also eliminate outliers and select representative samples without using y values. Hence, the proposed method may be an advantageous alternative to the other three strategies. IOS is implemented in MATLAB and is available at https://github.com/zmzhang/IOS.
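
To illustrate the idea behind isolation-forest outlier detection, the following is a minimal sketch in Python of the general isolation forest technique. It is not the authors' MATLAB implementation of IOS and does not cover the subset selection step. The function names (c_factor, build_tree, path_length, anomaly_scores), the toy data and the parameter defaults are assumptions made for illustration only. Samples that are separated from the rest after only a few random splits receive a score close to 1 and are candidates for outliers; like IOS, the scoring uses only the x values.

```python
import math
import random


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feature = rng.randrange(len(rows[0]))
    values = [r[feature] for r in rows]
    lo, hi = min(values), max(values)
    if lo == hi:  # nothing to split on for this feature
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x reaches a leaf, plus a correction for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, feature, split, left, right = tree
    child = left if x[feature] < split else right
    return path_length(x, child, depth + 1)


def anomaly_scores(data, n_trees=100, sample_size=256, seed=0):
    """Return one score in (0, 1] per sample; higher means easier to isolate."""
    if len(data) < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))          # subsample size per tree
    max_depth = math.ceil(math.log2(psi))      # height limit of each tree
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    scores = []
    for x in data:
        mean_depth = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores


# Toy data (hypothetical): 30 points on a tight grid plus one distant sample.
data = [(i * 0.1, j * 0.1) for i in range(6) for j in range(5)]
data.append((10.0, 10.0))

scores = anomaly_scores(data)
suspect = scores.index(max(scores))  # index of the most easily isolated sample
print(suspect, round(scores[suspect], 3))
```

In this toy example the distant sample is appended last, so it is the one expected to receive the highest score. On real spectra, the threshold above which a sample is treated as an outlier is a modelling choice that this sketch leaves open.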

Article information

Article type: Paper
Submitted: 03 Jun 2016
Accepted: 05 Sep 2016
First published: 06 Sep 2016

Anal. Methods, 2016, 8, 7225-7231

W. Chen, Y. Yun, M. Wen, H. Lu, Z. Zhang and Y. Liang, Anal. Methods, 2016, 8, 7225 DOI: 10.1039/C6AY01574C
