Application of a local outlier detection algorithm based on high-dimensional subspaces in near-infrared spectroscopy
Abstract
The high dimensionality and non-linearity of near-infrared (NIR) spectral data make outliers difficult to detect. During NIR spectrum collection, outliers commonly arise from factors such as uneven sample distribution, environmental changes, measurement instrument deviations, and improper operation. These outliers bias the model and make its predictions unreliable, so they must be eliminated during NIR modeling to improve model accuracy. This paper proposes an outlier detection algorithm based on high-dimensional subspaces. The algorithm first introduces a new method for determining local subspaces that combines local sparsity with adaptive neighborhood selection. It then uses the concept of jump degree to determine the anomaly threshold adaptively, thereby identifying outliers. To investigate the effectiveness of the algorithm, we compared it with the commonly used PCA-Mahalanobis distance, spectral residual (SR), and leverage methods in terms of projection performance, testing its accuracy in distinguishing outliers. In addition, to verify its accuracy on high-dimensional data, we compared it with LoOP and SOD. The experimental results show that the subspace-based outlier detection method effectively improves outlier identification and calibration performance for NIR analysis.
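For orientation, the PCA-Mahalanobis distance baseline named above can be sketched as follows. This is a minimal illustration of the comparison method, not of the proposed subspace algorithm. The function name `pca_mahalanobis_scores`, the choice of three components, and the synthetic spectra are assumptions made for the example and do not come from the paper.

```python
import numpy as np

def pca_mahalanobis_scores(spectra, n_components=3):
    """Mahalanobis distance of each spectrum in PCA score space.

    Hypothetical helper for illustration; n_components=3 is an
    arbitrary choice, not a value taken from the paper.
    """
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)  # mean-center each wavelength channel
    # SVD of the centered data: rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pcs = Xc @ Vt[:n_components].T  # PCA scores, shape (n_samples, k)
    # PCA scores are uncorrelated, so their covariance is diagonal and
    # the Mahalanobis distance reduces to a variance-scaled Euclidean norm.
    var = pcs.var(axis=0, ddof=1)
    return np.sqrt((pcs ** 2 / var).sum(axis=1))

# Synthetic stand-in for NIR spectra: 50 samples, 200 wavelength channels.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(50, 200))
spectra[7] += 5.0  # inject one artificial outlier (baseline offset)

scores = pca_mahalanobis_scores(spectra)
suspect = int(np.argmax(scores))  # index of the most distant sample
```

Samples with the largest distances are the candidates for removal. A global score of this kind uses a single set of principal directions for all samples, which is the contrast with the local subspaces that the proposed method determines per neighborhood.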