Stochastic generalization models learn to comprehensively detect volatile organic compounds associated with foodborne pathogens via Raman spectroscopy†
Abstract
Ensuring food safety requires continuous innovation, especially in the detection of foodborne pathogens and chemical contaminants. In this study, we present a system that combines Raman spectroscopy with machine learning (ML) algorithms for the precise detection and analysis of volatile organic compounds (VOCs) linked to foodborne pathogens in complex liquid mixtures. A remote fiber-optic Raman probe was developed to collect spectral data from 42 distinct VOC mixtures, representing contamination scenarios that range from undiluted to highly diluted. A dataset of 1445 Raman spectra was analyzed using classification and regression ML models, including multi-layer perceptron (MLP), random forest, and extreme gradient boosting decision trees (XGBDT). The optimized ML models achieved over 90% classification accuracy for pure VOCs and demonstrated robust performance in identifying mixtures containing up to six VOCs at concentrations as low as 0.25% (400-fold dilution). Additionally, regression analysis predicted VOC concentrations at levels as low as 1% (100-fold dilution), with the best model achieving an R² value exceeding 0.82. This approach demonstrates the potential for rapid, real-time food safety monitoring and addresses limitations of traditional methods such as culture-based or qPCR techniques. Its ability to reliably classify complex VOC mixtures makes it a valuable tool for on-site food safety assessments and quality control applications across various industries.
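The two quantities quoted in the abstract, concentration as a percentage of the undiluted sample and the coefficient of determination of the regression, can be illustrated with a minimal sketch. It assumes that an n-fold dilution corresponds to 100/n percent of the undiluted sample, which matches the stated pairs (400-fold and 0.25%, 100-fold and 1%), and it uses the standard definition of R². The function names `dilution_to_percent` and `r_squared` are hypothetical and are not taken from the study.

```python
def dilution_to_percent(fold: float) -> float:
    """Concentration, in percent of the undiluted sample, after an n-fold dilution.

    Assumption: the undiluted sample counts as 100%, so 400-fold gives 0.25%.
    """
    if fold < 1:
        raise ValueError("fold dilution must be >= 1")
    return 100.0 / fold


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot
```

For example, `dilution_to_percent(400)` gives the 0.25% level reported for classification and `dilution_to_percent(100)` gives the 1% level reported for regression. A model whose predictions match the true concentrations exactly would score `r_squared(...) == 1.0`, while one that always predicts the mean would score 0.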