Terahertz time-domain spectroscopy combined with support vector machines and partial least squares-discriminant analysis applied for the diagnosis of cervical carcinoma
Abstract
The feasibility of diagnosing cervical carcinoma using terahertz time-domain spectroscopy (THz-TDS) coupled with support vector machines (SVM) and partial least squares-discriminant analysis (PLS-DA) was studied. The terahertz spectra of 52 cervical specimens were collected. The following preprocessing methods were investigated for the PLS-DA and SVM models: multiplicative scatter correction (MSC), Savitzky–Golay (SG) smoothing and first derivative, principal component orthogonal signal correction (PC-OSC), and emphatic orthogonal signal correction (EOSC). The effects of these preprocessing methods on classification accuracy were compared. The PLS-DA and SVM models were validated using the bootstrapped Latin-partition method. The SVM and PLS-DA models optimized with the combination of SG first derivative and PC-OSC preprocessing gave the best predictive results, with classification rates of 94.0% ± 0.4% and 94.0% ± 0.5%, respectively. These results indicate that terahertz spectroscopy combined with classifiers has potential as a new diagnostic method for cancer tissue.
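To make the validation step concrete, the following is a minimal sketch of bootstrapped Latin-partition validation as it is commonly described: the samples are split into class-balanced partitions so that each sample is predicted exactly once, and the whole split is repeated with fresh random partitions. Everything in it is an assumption for illustration and not the authors' implementation: the function names, the number of bootstraps and partitions, the equal split of 52 samples into two classes, the synthetic "spectra", and the nearest-centroid classifier that stands in for SVM and PLS-DA.

```python
import random
from collections import defaultdict
from statistics import mean, stdev


def latin_partitions(labels, n_partitions, rng):
    """Split sample indices into class-balanced folds; each index appears once."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_partitions)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal shuffled members round-robin so folds keep the class proportions.
        for idx in members:
            folds[position % n_partitions].append(idx)
            position += 1
    return folds


def bootstrapped_latin_partition(spectra, labels, fit, predict,
                                 n_bootstraps=10, n_partitions=4, seed=0):
    """Return the pooled classification rate (%) for each bootstrap."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_bootstraps):
        correct = 0
        for test_idx in latin_partitions(labels, n_partitions, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            model = fit([spectra[i] for i in train_idx],
                        [labels[i] for i in train_idx])
            for i in test_idx:
                correct += predict(model, spectra[i]) == labels[i]
        rates.append(100.0 * correct / len(labels))
    return rates


def fit_centroids(X, y):
    """Stand-in classifier (not SVM or PLS-DA): per-class mean spectrum."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict_centroid(model, x):
    """Assign x to the class whose centroid has the smallest squared distance."""
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(x, model[label])))


# Synthetic data only: 52 made-up "spectra" of 20 points, 26 per class (assumed).
data_rng = random.Random(1)
labels = [0] * 26 + [1] * 26
spectra = [[data_rng.gauss(float(label), 1.0) for _ in range(20)]
           for label in labels]

rates = bootstrapped_latin_partition(spectra, labels,
                                     fit_centroids, predict_centroid)
print(f"classification rate: {mean(rates):.1f}% +/- {stdev(rates):.1f}%")
```

Because every sample is held out exactly once per bootstrap, each bootstrap yields one pooled classification rate, and the spread across bootstraps gives an uncertainty estimate. The abstract does not state how its ± values were computed, so the standard deviation printed here is only one plausible choice.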