Assessing the discrimination potential of linear and non-linear supervised chemometric methods on a filamentous fungi FTIR spectral database
This study compares linear and non-linear chemometric methods applied to the same database of infrared spectra for the discrimination and identification of filamentous fungi. The database comprised 277 strains (14 genera, 36 species), identified and validated by DNA sequencing, and analyzed by high-throughput Fourier Transform Infrared (FTIR) spectroscopy in the 4000–400 cm⁻¹ wavenumber range. A cascade of 20 supervised models based on taxonomic ranks was constructed to predict classes down to the species rank. This cascade was used to test 11 supervised classification algorithms (5 linear and 6 non-linear). To assess these algorithms, classification-rate indicators and McNemar's tests were defined and applied identically to each of them. Among the non-linear algorithms, the KNN (K Nearest Neighbors) method proved to be the best classifier (78%). The linear algorithms PLS-DA (Partial Least Squares-Discriminant Analysis) and SVM (Support Vector Machine) outperformed the non-linear methods, with the best classification potential (∼93%). SVM and PLS-DA performed comparably, and a possible complementarity between these two algorithms was highlighted.
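As an illustration of how two classifiers evaluated on the same spectra can be compared with McNemar's test, a minimal sketch follows. It assumes the exact (binomial) form of the test; the abstract does not state which variant was used, and the function name `mcnemar_exact` and the toy labels are hypothetical, not taken from the study.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact (binomial) McNemar test on paired predictions.

    Assumption: the exact two-sided form is used here for illustration;
    the study may have used a different variant (e.g. chi-square).

    Returns (b, c, p_value), where b counts samples that only classifier A
    classified correctly and c counts samples that only classifier B
    classified correctly.
    """
    b = c = 0
    for truth, a, other in zip(y_true, pred_a, pred_b):
        a_ok = a == truth
        b_ok = other == truth
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1

    n = b + c  # number of discordant pairs
    if n == 0:
        # The two classifiers never disagree on correctness.
        return b, c, 1.0

    # Two-sided p-value under H0: discordant outcomes are Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy, made-up labels (not data from the study).
    y_true = ["sp1", "sp1", "sp2", "sp2", "sp3", "sp3"]
    pred_a = ["sp1", "sp1", "sp2", "sp2", "sp3", "sp1"]
    pred_b = ["sp1", "sp2", "sp2", "sp3", "sp3", "sp1"]
    print(mcnemar_exact(y_true, pred_a, pred_b))
```

Only the discordant pairs (samples that exactly one of the two classifiers gets right) enter the test, which is why two classifiers with similar overall classification rates can still be judged complementary when they err on different strains.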