Pure ion chromatogram extraction via optimal k-means clustering
Abstract
Untargeted analysis of complex samples by liquid chromatography coupled with mass spectrometry (LC-MS) has shown great promise. However, extracting useful information from complicated LC-MS data remains difficult. Pure ion chromatograms (PIC) were recently introduced as an effective way to reduce the number of ions unrelated to meaningful compounds. In this study, a novel method that extracts PIC based on optimal k-means clustering (KPIC) is proposed. KPIC exploits the tendency of the centroids of pure ions to cluster, which lets it extract PIC adaptively from raw LC-MS datasets. KPIC was tested on three datasets: a simulated dataset, the MM48 dataset and an Arabidopsis thaliana dataset. Compared with PITracer and XCMS, KPIC achieved better feature extraction accuracy and provided higher-quality chromatographic peaks, particularly for low-concentration compounds. KPIC also produces fewer split signals because it avoids subjective estimation of ion mass difference tolerances. KPIC is implemented in R and is available as an open-source package at https://github.com/hcji/KPIC.
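The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of what "optimal k-means clustering" of ion centroids can look like. It assumes "optimal" means the exact, globally optimal one-dimensional k-means on m/z values, solved by dynamic programming. Each resulting cluster's centroids are then collected into a chromatogram (intensity per scan). The number of clusters `k` is supplied by hand here because this section does not describe how KPIC chooses it adaptively. The function names `optimal_kmeans_1d` and `extract_chromatograms` and the `(scan, mz, intensity)` input layout are hypothetical illustrations, not the KPIC package's API.

```python
from typing import Dict, List, Tuple


def optimal_kmeans_1d(values: List[float], k: int) -> Tuple[List[int], float]:
    """Exact (globally optimal) 1-D k-means via dynamic programming.

    Returns one label per input value (labels ordered by increasing
    cluster mean) and the total within-cluster sum of squares.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    order = sorted(range(n), key=lambda i: values[i])
    # Shift by the minimum: SSE is translation-invariant, and small numbers
    # avoid cancellation when m/z values (~hundreds) differ by ~1e-4.
    x = [values[i] - values[order[0]] for i in order]

    # Prefix sums give the within-cluster cost of any sorted slice in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i: int, j: int) -> float:
        # Sum of squared deviations from the mean of x[i..j] inclusive.
        s = s1[j + 1] - s1[i]
        return (s2[j + 1] - s2[i]) - s * s / (j - i + 1)

    inf = float("inf")
    dp = [[inf] * n for _ in range(k)]  # dp[m][j]: best SSE of x[0..j] in m+1 clusters
    start = [[0] * n for _ in range(k)]  # first index of the last cluster
    for j in range(n):
        dp[0][j] = cost(0, j)
    for m in range(1, k):
        for j in range(m, n):
            for i in range(m, j + 1):
                c = dp[m - 1][i - 1] + cost(i, j)
                if c < dp[m][j]:
                    dp[m][j], start[m][j] = c, i

    # Backtrack the cluster boundaries, then map labels to the input order.
    sorted_labels = [0] * n
    j = n - 1
    for m in range(k - 1, -1, -1):
        i = start[m][j] if m > 0 else 0
        for t in range(i, j + 1):
            sorted_labels[t] = m
        j = i - 1
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = sorted_labels[rank]
    return labels, dp[k - 1][n - 1]


def extract_chromatograms(
    centroids: List[Tuple[int, float, float]], k: int
) -> Dict[int, List[Tuple[int, float]]]:
    """Cluster (scan, mz, intensity) centroids by m/z; return a trace per cluster."""
    labels, _ = optimal_kmeans_1d([mz for _, mz, _ in centroids], k)
    traces: Dict[int, Dict[int, float]] = {}
    for (scan, _, intensity), lab in zip(centroids, labels):
        trace = traces.setdefault(lab, {})
        # Keep the most intense point if one scan contributes several to a cluster.
        trace[scan] = max(trace.get(scan, 0.0), intensity)
    return {lab: sorted(trace.items()) for lab, trace in traces.items()}


if __name__ == "__main__":
    # Two hypothetical ions with close m/z values, observed over three scans.
    centroids = [
        (1, 300.1001, 50.0), (1, 300.1102, 20.0),
        (2, 300.0999, 120.0), (2, 300.1098, 60.0),
        (3, 300.1002, 80.0), (3, 300.1101, 30.0),
    ]
    chroms = extract_chromatograms(centroids, k=2)
    print(chroms[0])  # [(1, 50.0), (2, 120.0), (3, 80.0)]
    print(chroms[1])  # [(1, 20.0), (2, 60.0), (3, 30.0)]
```

Dynamic programming is used instead of Lloyd's iterations because, in one dimension, an optimal partition always consists of contiguous runs of the sorted values. The global optimum can therefore be found exactly in O(k n²) time, whereas Lloyd's algorithm may stop at a local optimum.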