Raman spectroscopy and machine learning for forensic document examination
Abstract
Forensic document examination relies on the differentiation and classification of document papers, particularly in cases involving forgery and fraud. In this study, document papers are classified by combining Raman spectroscopy with three machine learning models: random forest (RF), support vector machine (SVM), and feed-forward neural network (FNN). The RF model was used to calculate feature importance and to identify the spectral regions that contribute most to classification, which improves the transparency and interpretability of the results. Spectral preprocessing with the first derivative significantly improved classification performance. The 200–1650 cm⁻¹ range was identified as a highly informative region for differentiation; restricting the input to this range reduced the number of input variables from 756 to 360 while improving model accuracy. The FNN model outperformed the RF and SVM models, achieving an F1 score of 0.968. These results underscore the potential of combining Raman spectroscopy with machine learning for forensic document examination, offering an interpretable, computationally efficient, and robust approach to paper classification.
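The preprocessing and scoring steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the first derivative was computed or how the F1 score was averaged across paper classes, so the central-difference derivative and the macro averaging below are assumptions, and the function names (first_derivative, select_region, macro_f1) are hypothetical. The sketch uses only the Python standard library and synthetic data.

```python
from typing import List, Sequence, Tuple


def first_derivative(shifts: Sequence[float], intensities: Sequence[float]) -> List[float]:
    """First derivative of intensity with respect to Raman shift.

    Assumption: central differences in the interior and one-sided
    differences at the two ends. The study may have used another scheme.
    """
    n = len(shifts)
    if n != len(intensities) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        deriv.append((intensities[hi] - intensities[lo]) / (shifts[hi] - shifts[lo]))
    return deriv


def select_region(
    shifts: Sequence[float],
    values: Sequence[float],
    low: float = 200.0,
    high: float = 1650.0,
) -> Tuple[List[float], List[float]]:
    """Keep only the points whose Raman shift lies in [low, high] cm^-1."""
    kept = [(s, v) for s, v in zip(shifts, values) if low <= s <= high]
    return [s for s, _ in kept], [v for _, v in kept]


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (averaging mode is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Synthetic spectrum (not data from the study): 40 points from 100 to 2050 cm^-1
# with a linear intensity profile, so the derivative is constant.
demo_shifts = [100.0 + 50.0 * i for i in range(40)]
demo_intensities = [2.0 * s + 1.0 for s in demo_shifts]
demo_deriv = first_derivative(demo_shifts, demo_intensities)
region_shifts, region_deriv = select_region(demo_shifts, demo_deriv)
```

In a real pipeline, the derivative values inside the selected region would form the input vector for the classifier, which is how restricting the range lowers the number of input variables. On noisy spectra a smoothing derivative filter is usually preferred over raw finite differences, since differencing amplifies noise.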