Data augmentation method based on the Gaussian kernel density for glioma diagnosis with Raman spectroscopy
Abstract
Glioma is a highly infiltrative malignant intracranial tumor, which makes its boundary difficult to identify. Raman spectroscopy can potentially detect this boundary accurately in situ and in vivo during surgery. However, when building a classification model from in vitro experiments, fresh normal tissue is difficult to obtain. Normal tissue samples are therefore far fewer than glioma samples, which biases classification toward the majority class. In this study, a data augmentation algorithm, GKIM, based on the Gaussian kernel density is proposed for augmenting normal tissue spectra. Instead of a fixed coefficient, a weight coefficient formula based on the Gaussian density is used to synthesize new spectra, which increases sample diversity and improves the robustness of modeling. Additionally, the fuzzy nearest neighbor distance replaces the usual fixed neighbor number K for selecting the original spectra used in synthesis. The method automatically determines the nearest spectra and adaptively synthesizes new spectra according to the characteristics of the input spectra. This effectively overcomes a problem of common data augmentation methods, in which newly generated samples are concentrated too heavily in specific regions of the feature space. In this study, 769 Raman spectra of glioma and 136 Raman spectra of normal brain tissue, corresponding to 205 and 37 cases, respectively, were collected. The normal tissue spectra were augmented to 600. The accuracy, sensitivity, and specificity were each 91.67%. The proposed method achieved better predictive performance than traditional algorithms for class imbalance.
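The abstract does not give the GKIM formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a SMOTE-style interpolation in which (a) neighbors are chosen by an adaptive distance threshold rather than a fixed K, used here as a simple stand-in for the fuzzy nearest neighbor distance, and (b) the interpolation weight is derived from a Gaussian kernel density estimate rather than a fixed coefficient. The function name `augment_minority`, the `radius_scale` parameter, the median-distance bandwidth, and the weight formula are all hypothetical choices made for illustration.

```python
import numpy as np

def augment_minority(X, n_new, radius_scale=1.0, seed=0):
    """Synthesize n_new rows from minority-class spectra X (n_samples x n_features).

    Illustrative stand-in only: the neighbor rule and the weight formula
    below are assumptions, not the formulas from the paper.
    """
    rng = np.random.default_rng(seed)
    n = len(X)

    # Pairwise squared Euclidean distances between spectra.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    dist = np.sqrt(sq)

    # Assumed bandwidth heuristic: median of the off-diagonal distances.
    bandwidth = np.median(dist[~np.eye(n, dtype=bool)])

    # Unnormalized Gaussian kernel density estimate at each spectrum.
    density = np.exp(-sq / (2.0 * bandwidth ** 2)).mean(axis=1)

    # Distances with self-distance masked out, and each sample's nearest neighbor.
    d = dist.copy()
    np.fill_diagonal(d, np.inf)
    nearest = d.min(axis=1)

    # Adaptive radius (assumption): a multiple of the mean nearest-neighbor distance.
    radius = radius_scale * nearest.mean()

    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)
        # All neighbors inside the radius; the max() guarantees that at least
        # the single nearest neighbor is always a candidate.
        candidates = np.flatnonzero(d[i] <= max(radius, nearest[i]))
        j = rng.choice(candidates)
        # Density-based weight in (0, 1): the new point is pulled toward
        # whichever of the two spectra lies in the denser region.
        w = density[j] / (density[i] + density[j])
        synthetic.append(X[i] + w * (X[j] - X[i]))
    return np.vstack(synthetic)

# Usage with random placeholder data (not real Raman spectra):
# 136 minority spectra are extended to 600, matching the counts in the abstract.
normal = np.random.default_rng(1).normal(size=(136, 50))
augmented = np.vstack([normal, augment_minority(normal, 600 - len(normal))])
print(augmented.shape)  # (600, 50)
```

Because each synthetic spectrum is a convex combination of two existing spectra, it always lies on the segment between them. In this sketch the number of candidate neighbors varies from sample to sample, which is the behavior a fixed K cannot provide.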
- This article is part of the themed collection: Analytical Methods HOT Articles 2023