PyFasma: an open-source, modular Python package for preprocessing and multivariate analysis of Raman spectroscopy data
Abstract
Raman spectroscopy is a versatile, label-free technique for probing molecular composition in biological samples. However, detecting subtle biochemical traits in high-throughput spectral datasets requires careful preprocessing, dimensionality reduction, and statistically sound analytical strategies. We present PyFasma, an open-source Python package for Raman spectroscopy that integrates essential preprocessing tools (e.g., spike removal, smoothing, baseline correction, normalization), multivariate techniques (PCA, PLS-DA), and spectral deconvolution within a modular, Jupyter Notebook-friendly framework. In addition to describing the software, we demonstrate PyFasma's capabilities through a practical biomedical case study comparing Raman spectra from healthy and osteoporotic cortical bone samples. The results revealed statistically significant differences in mineral-to-matrix ratio and crystallinity between the two groups, with PCA and PLS-DA successfully distinguishing pathological from normal bone spectra. PyFasma encourages best practices in model validation, including repeated stratified cross-validation, a powerful but often overlooked technique that improves the generalizability of multivariate analyses. It also offers an easy-to-use, extensible solution for Raman data analysis, enabling reproducible and robust interpretation of complex spectra from biological samples.
- This article is part of the themed collection: SPEC 2024: International Conference on Clinical Spectroscopy