Open Access Article
Daphne R. Patten, Raven L. Buckman Johnson, Trevor T. Forsman† and Young Jin Lee*
Department of Chemistry, Iowa State University, Ames, Iowa 50011, USA. E-mail: yjlee@iastate.edu
First published on 30th March 2026
Fingerprints are a widely recognized form of forensic evidence, valued for their ability to link individuals with specific locations. Traditional fingerprint analysis relies on optical imaging to identify a match in a fingerprint database; however, where no match is found, the evidential value of a latent print is limited. Here, we present the first study to integrate matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) with supervised machine learning to infer physical activity from fingerprint chemistry, expanding the utility of fingerprints beyond identification alone. Physical activity labels were derived from a validated questionnaire and converted into binary classes. Supervised machine learning algorithms were trained on the lipid features and evaluated against the survey-derived labels. The top-performing models were an ensemble algorithm based on multiple decision trees and a neural network, which classified physical activity with accuracies of 75 ± 8% and 73 ± 7%, respectively. These results demonstrate that fingerprint lipid chemistry encodes biologically meaningful information related to physical activity and establish a new approach for extracting lifestyle and behavioral indicators from trace evidence, with potential applications in forensic investigations and noninvasive fingerprint-based assessments in medicine.
Mass spectrometry (MS), one of the most discriminating analytical techniques in forensic science, has played a central role in this shift.3 In particular, matrix-assisted laser desorption/ionization MS (MALDI-MS) enables high-throughput, extraction-free analysis by rastering a laser across the fingerprint surface to desorb and ionize the chemical compounds present.4–8 As a soft-ionization technique, MALDI-MS can detect a broad range of endogenous and exogenous chemicals in fingerprints. However, fingerprint residue is a complex mixture that undergoes physical and chemical changes after deposition, which complicates data interpretation; thus, many researchers have turned to advanced computational strategies to aid data analysis.2,9
As fingerprint chemistry has come into focus, studies have increasingly examined endogenous compounds to infer subtle biological variations. Machine learning (ML) is well-suited to this task because it identifies multivariate patterns in complex datasets and builds predictive models from empirical data.10,11 A key subset of ML is supervised ML, in which models are trained on labeled data to improve the prediction accuracy.12 Several studies have demonstrated the potential of supervised ML to extract individual characteristics from fingerprint chemistry. For example, Ferguson et al. analyzed peptides and proteins in fingerprints from 80 people, using MALDI-MS and partial least squares discriminant analysis to determine sex with an accuracy of 85%.13 This work was later expanded to include 199 participants while still achieving 86% accuracy using a supervised ensemble model.14 Using the same fingerprint dataset, a parallel study by Bury et al. investigated whether a person's age could be distinguished; however, an ensemble model achieved only 66% accuracy on the same sample set.15 While insightful, peptide/protein analytes represent a small fraction of fingerprint residue.9,16,17 In contrast, lipids are more abundant and generally more stable in fingerprints, offering an analytically and biologically robust substrate for ML-based chemical profiling.9,16,17
One forensic characteristic that remains underexplored is physical activity (PA), which may offer insights into individuals' occupations or lifestyles. Numerous studies on lipid metabolism have demonstrated that even individuals who meet only the minimal recommended activity thresholds exhibit measurable alterations in serum lipid profiles. Therefore, we hypothesize that comparable lipid alterations may also be detectable in fingerprint lipids.18 Although criminal profilers may attempt to infer occupation type from crime scene evidence, such assessments are inherently subjective and vary in reliability.19,20 Empirically derived indicators of PA could strengthen behavioral profiling and assist investigations, particularly when other leads are limited.
Preliminary evidence by O'Neill et al. suggests that fingerprint lipids, particularly triacylglycerols (TGs), may reflect PA levels.21 However, their conclusions were based on a small cohort (n = 8) and a simple Boolean survey, both of which are susceptible to response bias. The present study expands upon that work by incorporating a larger participant group (n = 81) and using a validated PA assessment developed by Besson et al.22 This survey captures detailed information on the duration and frequency of participants' commuting, occupational activities, and other behaviors across 38 activity categories. Quantitative responses reduce the subjectivity common in qualitative surveys, enabling more reliable PA estimates without the need for invasive physiological measurements.22
In addition, this study examines wax esters (WEs), diacylglycerols (DGs), and TGs – the three major lipid classes present in sebaceous secretions and latent fingerprints – to evaluate their potential as biomarkers of physical activity. TGs and DGs play central roles in energy storage and metabolic regulation, with TGs being among the most extensively studied lipid classes in serum. In contrast, WEs are predominantly found in the epidermis; however, their biosynthesis depends on fatty acid availability, thereby indirectly linking all three lipid classes to lipid metabolism flux.23,24 Building on this foundation, the present study investigates PA as a forensic variable using a larger dataset, validated activity metrics, and chemically relevant lipid features. Supervised ML models are applied to MALDI-MS fingerprint data to classify lipid profiles into a binary PA category.
Physical activity energy expenditure for a given activity, i, (PAEEi) was calculated using eqn (1) developed by Besson et al.:
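$$\mathrm{PAEE}_i = \left(\mathrm{MET}_i \times \mathrm{DRA}_i \times \mathrm{weight} \times 4.263\right) - \left(\mathrm{RMR}_{\mathrm{Oxford}} \times \frac{\mathrm{DRA}_i}{24}\right) \tag{1}$$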
In this equation, DRA refers to the duration (in hours per day) of the activity, and MET is the metabolic equivalent of the task or activity. MET values were obtained from a previously published compendium of MET intensities, with 1 MET defined as the energy consumption with no physical activity (e.g., sleeping), equivalent to 3.5 mL of oxygen consumption per kg (body weight) per min.25,26 Weight refers to the individual's self-reported body weight in kilograms. The conversion factor 4.263 represents the energy expended per kilogram of body weight per hour at 1 MET, in kilojoules; it follows from the assumption that one liter of oxygen consumption produces 20.3 kJ of energy (3.5 mL kg−1 min−1 × 60 min h−1 × 20.3 kJ L−1 ÷ 1000 mL L−1 = 4.263 kJ kg−1 h−1).22 RMROxford denotes the individual's resting metabolic rate, calculated by using the Oxford equations published by Henry,27 also adopted by Besson et al., and is adjusted by the daily duration ratio (DRAi/24). MET values for each activity and an example of PAEE calculation can be found in the supplementary spreadsheet labeled Tables S1 and S2, respectively.
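The terms defined above can be combined in a short sketch. It assumes that eqn (1) has the gross-minus-resting form implied by those definitions (MET × DRA × weight × 4.263, less the resting metabolic rate scaled by DRA/24). The function and argument names are illustrative, the example inputs are made up, and the resting metabolic rate is passed in directly rather than computed from the Oxford equations.

```python
# Illustrative sketch only: names are hypothetical, and the form of eqn (1)
# is assumed from the definitions given in the text.

ML_O2_PER_KG_MIN = 3.5   # oxygen uptake at 1 MET, in mL per kg per min
KJ_PER_L_O2 = 20.3       # assumed energy yield per liter of oxygen, in kJ

# kJ per kg per hour at 1 MET: 3.5 mL * 60 min * 20.3 kJ/L / 1000 mL/L
KJ_PER_KG_H_PER_MET = ML_O2_PER_KG_MIN * 60 * KJ_PER_L_O2 / 1000


def paee_activity(met, dra_h_per_day, weight_kg, rmr_kj_per_day):
    """Net energy expenditure (kJ per day) for a single activity."""
    gross = met * dra_h_per_day * weight_kg * KJ_PER_KG_H_PER_MET
    resting = rmr_kj_per_day * dra_h_per_day / 24
    return gross - resting


# Made-up inputs: 1 h per day at 4 METs, 70 kg, RMR of 6500 kJ per day.
example = paee_activity(met=4.0, dra_h_per_day=1.0, weight_kg=70.0,
                        rmr_kj_per_day=6500.0)
```

Subtracting the resting term means that an activity at 1 MET contributes little or nothing, so only expenditure above rest is counted.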
To estimate an individual's overall PA, eqn (2) was used to calculate the total physical activity energy expenditure per body weight (PAEEw):
$$\mathrm{PAEE}_w = \frac{\sum_i \mathrm{PAEE}_i + \mathrm{PAEE}_u}{\mathrm{weight}} \tag{2}$$
The sum of all PAEE values for each reported activity is combined with the PAEE value for unaccounted time (PAEEu), which has a MET value of 1.2. An MET value of 1.2 corresponds to the energy expenditure of being awake but mostly sedentary, such as during screen time or other low-intensity activities.22 Including the PAEEu component helps mitigate the underestimation of PAEE due to reporting bias related to time spent on electronic devices. The resulting total PAEE is then normalized by the individual's self-reported body weight (in kilograms).22
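As a rough illustration of the aggregation described above, the sketch below sums the per-activity values, adds the unaccounted-time term at 1.2 MET, and divides by body weight. Several points are assumptions rather than details from the text: the per-activity formula is the gross-minus-resting form implied by the definitions of eqn (1), unaccounted time is taken as 24 h minus sleep and reported activities, sleep itself contributes nothing, and all names and inputs are hypothetical.

```python
# Illustrative sketch only: names, inputs, and the treatment of sleep and
# unaccounted time are assumptions, not details taken from the study.

KJ_PER_KG_H_PER_MET = 4.263  # conversion factor quoted in the text
MET_UNACCOUNTED = 1.2        # MET assigned to unaccounted time


def net_paee(met, hours, weight_kg, rmr_kj_per_day):
    """Net kJ per day for one activity (assumed form of eqn (1))."""
    gross = met * hours * weight_kg * KJ_PER_KG_H_PER_MET
    return gross - rmr_kj_per_day * hours / 24


def paee_w(activities, sleep_h, weight_kg, rmr_kj_per_day):
    """Total PAEE per kg of body weight, in kJ per day per kg.

    activities: list of (MET, hours per day) pairs from the questionnaire.
    """
    reported_h = sum(hours for _, hours in activities)
    # Assumption: whatever is left of the day is unaccounted time.
    unaccounted_h = max(0.0, 24.0 - sleep_h - reported_h)
    total = sum(net_paee(m, h, weight_kg, rmr_kj_per_day)
                for m, h in activities)
    total += net_paee(MET_UNACCOUNTED, unaccounted_h, weight_kg,
                      rmr_kj_per_day)
    return total / weight_kg


# Made-up day: 8 h of desk work at 1.5 METs and 1 h of activity at 4 METs.
score = paee_w([(1.5, 8.0), (4.0, 1.0)], sleep_h=8.0, weight_kg=70.0,
               rmr_kj_per_day=6500.0)
```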
This work includes some deviations from the methodology used by Besson et al. Most notably, while Besson et al. assumed a fixed sleep duration of eight hours per night for all participants, the questionnaire used in the present study was modified to record each individual's self-reported sleep duration. Removing this assumption allows for a more accurate estimation of each subject's PAEEu. Additionally, Besson et al. applied PAEEu retroactively only to individuals who reported bicycling as a mode of transportation, assuming that such individuals engage in higher overall PA. In contrast, given the high number of college students in this study's participant pool, PAEEu was applied more broadly to account for low-intensity activities such as walking between classes and extended smartphone use.
Fig. 1 depicts the spread of PAEEw scores from the 81 participants used in this study. The box represents the interquartile range (IQR) with the median at 42 kJ (d−1 kg−1), and the 1st and 3rd quartiles at 32 and 62 kJ (d−1 kg−1), respectively. Whiskers extend to 1.5 times the IQR, ranging from 20 to 108 kJ (d−1 kg−1), which suggests a wide range of PAEEw values among the participants. Some outliers beyond the whiskers correspond to participants with unusually high levels of physical activity. The median value of 42 kJ (d−1 kg−1) matched the approximate PAEEw of an average adult with a sedentary occupation and thirty minutes of light walking.28–30 Therefore, 42 kJ (d−1 kg−1) was defined as the threshold for binary PA classification; those with a PAEEw < 42 kJ (d−1 kg−1) were labeled as inactive (n = 39), and those with PAEEw > 42 kJ (d−1 kg−1) were labeled as active (n = 42).
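The box-plot statistics and the median-based labeling can be sketched as follows. The quartile method, the handling of scores that fall exactly on the threshold, and the function names are assumptions made for illustration; the text does not specify them.

```python
from statistics import median, quantiles


def label_activity(paee_w_scores, threshold=None):
    """Return (threshold, labels); each label is 'active' or 'inactive'."""
    if threshold is None:
        threshold = median(paee_w_scores)
    # Assumption: a score exactly at the threshold is labeled active.
    labels = ["active" if s >= threshold else "inactive"
              for s in paee_w_scores]
    return threshold, labels


def box_summary(scores):
    """Quartiles and 1.5 x IQR whisker ends, as drawn in a box plot."""
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    in_range = [s for s in scores if low <= s <= high]
    # Whiskers end at the most extreme data points inside the fences;
    # anything beyond them is plotted as an outlier.
    return {"q1": q1, "median": q2, "q3": q3,
            "whisker_low": min(in_range), "whisker_high": max(in_range)}
```

Passing `threshold=42` reproduces a fixed cutoff, while leaving it unset recomputes the median from whatever cohort is supplied.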
000 (at m/z 200) for a mass range of m/z 380–1100. An imaging raster step of 75–100 µm was used, and approximately 1000 spectra were collected and averaged. In accordance with the Iowa State University Internal Review Board guidelines, only a small portion of each fingerprint was analyzed to limit the amount of identifiable information that MALDI-MS imaging could capture.31
Mass spectra were internally calibrated to ensure a mass error of less than 3 ppm, allowing for the confident use of the Python script. Mass calibration was performed using two endogenous, abundant fingerprint lipids, TG 48:2 and WE 36:2, in a two-point calibration, following prior lipid assignments in a high-resolution mass spectrometry study.34 In total, 35 TGs, 88 DGs, and 88 WEs were used, alongside the biological sex and age of the participants, for a total of 213 possible features for ML. To ensure data quality, a signal-to-noise ratio cutoff of three (S/N > 3) was implemented. A base-10 logarithmic transformation was then applied to lipid ratios. In preliminary ML, the log transformation yielded superior cross-validated performance for PA classification compared with alternative transformations in the CL app (data not shown). This aligns with prior work showing that log transformation is an effective pretreatment for MS-based metabolomics because it stabilizes variance, mitigates heteroscedasticity, and reduces the dominance of very high-intensity features, benefits that can translate into improved downstream statistical and ML performance.35,36 Averages were calculated from replicate fingerprints of the same donors. Because only a limited number of individuals reported restricted diets, their samples were removed, as restricted diets have the potential to interfere with lipids in fingerprints.21 Lastly, fingerprints from ten individuals were removed from the study because of low-quality mass spectra, attributed to high exogenous contamination and/or low lipid signals across all lipid analytes, bringing the final number of participants from 106 to 81.
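The preprocessing steps above can be illustrated with a minimal sketch: a ppm mass-error check, an S/N filter followed by a base-10 log transform, and averaging over replicate prints. This is not the authors' Python script. The function names, the data layout, and the choice to drop peaks that fail the cutoff (rather than impute them) are assumptions.

```python
import math
from collections import defaultdict


def ppm_error(measured_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def log_feature(ratio, signal_to_noise, sn_cutoff=3.0):
    """log10 of a lipid ratio, or None when the peak fails S/N > cutoff."""
    if signal_to_noise <= sn_cutoff or ratio <= 0:
        return None
    return math.log10(ratio)


def average_replicates(rows):
    """Average each feature over replicate prints from the same donor.

    rows: iterable of (donor_id, {feature_name: value_or_None}).
    Assumption: missing (None) values are skipped rather than imputed.
    """
    collected = defaultdict(lambda: defaultdict(list))
    for donor, features in rows:
        for name, value in features.items():
            if value is not None:
                collected[donor][name].append(value)
    return {donor: {name: sum(vals) / len(vals)
                    for name, vals in feats.items()}
            for donor, feats in collected.items()}
```

Applying the log transform before averaging means replicates are combined on the log scale, which is one of several reasonable orderings.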
Fig. 3 Example of a fingerprint MALDI mass spectrum. SQ: squalene, WE: wax ester, DG: diacylglycerol, TG: triacylglycerol.
Exercise is widely recognized as essential to human health, and many studies have examined its physiological effects, including exercise-induced fluctuations in lipid metabolism. Given the known influence of PA on metabolism and lipid regulation, several lipid classes were selected as candidate biomarkers of PA in fingerprints: TGs, DGs, and WEs. TG levels in serum have been shown to vary with an individual's exercise habits.21,40,41 DGs, produced during the hydrolysis of stored TGs, may similarly serve as indicators of PA.23 WEs – synthesized exclusively on human skin and functioning as protective hydrophobic barriers – were also evaluated based on the hypothesis that their secretion levels may shift with exercise.24,42
Regular exercise has additionally been shown to modulate androgen levels, altering the production of various metabolites.43 These androgen fluctuations can influence sebum lipid composition, although the underlying mechanisms remain incompletely understood.44,45 Other endogenous lipids of interest include fatty acids detected in the low-mass region (below m/z 400). This region, however, is frequently contaminated by exogenous compounds such as fingerprint enhancement powders, limiting its reliability.7 The squalene (SQ) base peak at m/z 433.3805 was not used in this study due to its high susceptibility to oxidation, both on the skin and after deposition, which limits its analytical utility.39,46,47
Previous work by O'Neill et al. demonstrated statistically significant differences in TG levels between active and inactive men.21 Based on this, three feature lists were generated: TG only (37 features, including sex and age); TGs and WEs (125 features); and TGs, DGs, and WEs (213 features). Table S4 summarizes the performance of ten ML models in the CL app using these feature lists with 10-fold cross-validation. All accuracies fell below 0.7 – well under acceptable scientific standards and unsuitable for forensic casework. One likely explanation for the poor performance is overfitting or model degradation due to irrelevant or noisy features. To improve model performance and exclude noninformative variables, Shapley feature selection was applied to retain only the top 15 features regardless of the lipid list used for training.50,51 Four algorithms achieving accuracy higher than 0.6 in the CL app were used for further optimization: SVM, kNN, EN, and NN. A limit of 15 features ensured a favorable feature-to-sample ratio, consistent with practice in high-dimensional fields such as genomics and metabolomics, which commonly face small sample sizes and the curse of dimensionality.52 In cases where fewer than 15 features had nonzero mean absolute Shapley values, only those features were retained.
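The selection rule described above (rank by mean absolute Shapley value, keep at most 15, and drop features whose mean is zero) can be sketched as follows. The sketch assumes the per-sample Shapley values have already been computed elsewhere; the function name and data layout are hypothetical.

```python
def select_top_features(shap_values, feature_names, k=15):
    """Rank features by mean absolute Shapley value and keep at most k.

    shap_values: list of rows, one per sample, with one value per feature.
    Features whose mean absolute value is zero are never retained, so
    fewer than k names may be returned.
    """
    n_samples = len(shap_values)
    importance = []
    for j, name in enumerate(feature_names):
        mean_abs = sum(abs(row[j]) for row in shap_values) / n_samples
        importance.append((mean_abs, name))
    ranked = sorted(importance, key=lambda pair: pair[0], reverse=True)
    return [name for mean_abs, name in ranked if mean_abs > 0][:k]
```

Taking the absolute value before averaging keeps features that push predictions in opposite directions for different samples from cancelling out.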
Once optimized, the hyperparameters and selected features for each algorithm were incorporated into MATLAB for independent testing (Fig. 2, bottom right). Each model was retrained using a 70/30 train-test split to evaluate performance on unseen data, simulating real-world forensic deployment. Fig. 4 presents boxplots comparing the four ML algorithms across the three feature lists with Shapley-reduced features. The best performance was achieved by the EN model trained on all three lipid classes (TGs, WEs, and DGs) plus participants' age, yielding an average accuracy of 75 ± 8% and ROC AUC of 0.84 ± 0.08, indicating strong discrimination between active and inactive groups. To our knowledge, this is the first study to classify PA from fingerprint lipid chemistry using supervised ML, limiting direct comparisons. Nevertheless, the magnitude of our metrics is consistent with prior MALDI-MS studies on different endpoints, such as Bury et al. at 66% for age predictions15 and Heaton et al. at 85% for predicting sex,14 both using peptide/protein analytes. Hence, this ML performance suggests the presence of biologically significant trends between fingerprint lipids and PA. This study has several limitations, discussed further in the Limitations section; a fundamental one is the binary reduction of a continuous phenotype such as PA. Heaton et al. noted similar limitations when reducing age from a continuous variable to discrete age categories in their study.14 Table S6 lists the statistical values corresponding to the same models shown in Fig. 4.
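One way to obtain a mean ± standard deviation from a 70/30 evaluation is to repeat the split several times, as sketched below. The stratification by class, the number of repeats, and the names are assumptions; the study's MATLAB procedure may differ.

```python
import random
from statistics import mean, stdev


def stratified_split(labels, test_fraction=0.3, rng=None):
    """Return (train_idx, test_idx), holding out test_fraction per class."""
    rng = rng or random.Random()
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        n_test = round(test_fraction * len(members))
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return train_idx, test_idx


def repeated_holdout(labels, fit_and_score, n_repeats=10, seed=0):
    """Mean and standard deviation of a score over repeated splits.

    fit_and_score(train_idx, test_idx) is a user-supplied callable that
    trains a model on the training rows and returns its test accuracy.
    """
    rng = random.Random(seed)
    scores = [fit_and_score(*stratified_split(labels, rng=rng))
              for _ in range(n_repeats)]
    return mean(scores), stdev(scores)
```

With 39 inactive and 42 active participants, this split holds out 12 and 13 samples respectively, which shows how small the test set is and why the reported spread is wide.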
The EN model's strong performance is partially attributed to the inclusion of participant age as a predictor, as previous studies have shown that biological age influences lipid production.2,16 Age may therefore correlate with the lipid profiles studied, although elucidating this biological relationship is beyond the scope of the present work. Despite its analytical usefulness, age is not ideal in forensic contexts because biological age cannot yet be reliably inferred from fingerprints, as demonstrated by Bury et al.53 Accordingly, a second EN model was created that excluded age. As shown in Fig. S3, removal of age reduced the EN accuracy to 67 ± 8%, falling below the 0.7 threshold.
Among the remaining algorithms, the NN trained on TGs + WEs performed best without age, achieving an average accuracy of 73 ± 7% and ROC AUC of 0.79 ± 0.06 – comparable to EN with age. Thus, the NN algorithm currently represents the more appropriate option for forensic applications aimed at determining PA. Future improvements in biological age estimation from fingerprints may enhance EN performance and broaden its suitability.
Understanding the behavior of each ML algorithm provides additional context for their performance. The EN algorithm achieved the highest metrics. The optimized EN model used an adaptive boosting approach, in which multiple decision trees (DTs) are built sequentially, each emphasizing correction of the previous tree's misclassifications.54 This iterative learning strategy allows EN models to handle complex multivariate data such as fingerprint lipids. Although DTs performed similarly in the CL app, they were excluded due to their susceptibility to overfitting and reduced generalizability. ENs mitigate these limitations by aggregating multiple weak learners.12,54,55
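The sequential reweighting behind adaptive boosting can be illustrated with a generic AdaBoost sketch that uses depth-one trees (stumps) as weak learners. This is a textbook-style illustration written for this purpose, not the optimized MATLAB ensemble used in the study; the names, the number of rounds, and the stump learner are assumptions.

```python
import math


def fit_stump(X, y, w):
    """Best one-feature threshold rule under weights w (y in {-1, +1})."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] > t else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best


def stump_predict(stump, row):
    _, j, t, sign = stump
    return sign if row[j] > t else -sign


def adaboost_fit(X, y, n_rounds=10):
    """Train boosted stumps; returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Raise the weight of misclassified samples, then renormalize,
        # so the next stump concentrates on the previous stump's errors.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1
```

No single stump can separate a class that occupies the middle of a feature's range, but the weighted vote of a few stumps can, which is the sense in which an ensemble aggregates weak learners.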
The NN algorithm produced the second-highest metrics and was the best-performing model when age was excluded. NNs are well-suited to high-dimensional datasets because they can capture nonlinear relationships and interdependencies among features. However, they generally require larger datasets to avoid overfitting and demand substantial computational resources.12,55 Expanding the participant cohort would likely improve NN performance.
SVMs also handle redundant or interdependent features effectively, relying on finding the optimal separating hyperplanes to classify the data.12,55,56 However, datasets with only subtle biological variation, such as those presented here, often result in poorly defined class boundaries, reducing SVM accuracy. Because all participants belong to the same species and exhibit overlapping PA levels, the SVM likely could not establish adequate margins.
Finally, kNN is an intuitive and transparent algorithm that classifies new data based on the most common class among the nearby training data.11,12,55 Its reliance on the full feature set makes it highly sensitive to irrelevant or noisy variables, and interdependent features can further degrade performance, consistent with the results observed in this study.
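A minimal kNN sketch makes the sensitivity to uninformative features concrete. It assumes Euclidean distance and an unweighted majority vote; the study's optimized distance metric and value of k are not stated here, and the function name is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Majority class among the k nearest training rows (Euclidean)."""
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]
```

Because every feature enters the distance with equal weight, appending a single large-scale but uninformative column to otherwise well-separated data can change which training rows count as nearest and flip the predicted class.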
Fig. 5 presents the confusion matrices for the best-performing EN and NN models. Correct predictions are shown in blue and misclassifications in orange. Both models show similar classification patterns. In forensic contexts, the false positive rate – false positives divided by (false positives + true negatives) – is of particular concern.48 Misclassifying inactive individuals as active could misdirect investigations and waste valuable time and resources. The EN model exhibited false positive rates of 22 ± 14% for the active class and 27 ± 17% for the inactive class; the NN yielded 32 ± 6% and 22 ± 14% for the same classes. While these values reflect encouraging trends, they remain insufficient for operational use. To confirm that observed patterns reflect genuine biological variation rather than ML artifacts, a permutation test was performed (Fig. S4).57 Activity labels were randomly reassigned five times and processed using the same ML pipeline. As expected, the ML accuracies dropped to 0.52 ± 0.10 (EN) and 0.51 ± 0.10 (NN), confirming that our model accuracies of 0.75 and 0.73 with EN and NN, respectively, are meaningful outcomes and that the classification patterns in Fig. 5 are not due to an ML artifact or random chance.
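Both checks described above are simple to express in code. The sketch below computes a per-class false positive rate from true and predicted labels, and wraps a label-shuffling loop around a user-supplied pipeline. The `evaluate` callable stands in for the full training and testing procedure and is an assumption, as are the function names.

```python
import random
from statistics import mean


def false_positive_rate(y_true, y_pred, positive):
    """FP / (FP + TN), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return fp / (fp + tn)


def permutation_baseline(labels, evaluate, n_permutations=5, seed=0):
    """Mean score after shuffling the labels.

    Shuffling breaks any real link between features and labels, so the
    result estimates chance-level performance for the same pipeline.
    evaluate(shuffled_labels) is a user-supplied callable that reruns
    training and testing and returns an accuracy.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        scores.append(evaluate(shuffled))
    return mean(scores)
```

For a roughly balanced binary problem, the shuffled-label accuracy should sit near 0.5, which is the pattern the permutation test reports.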
While the present study is not yet suitable for operational deployment, it reveals a biological trend in fingerprint lipids in response to physical activity. To advance this empirically informed approach toward practical forensic application, external validation using independent cohorts is essential to evaluate robustness, reproducibility, and model generalizability. Future studies should therefore include larger and demographically diverse populations, incorporate expanded physical activity targets (e.g., multiclass classification or regression frameworks), and assess performance across varying environmental, physiological, and lifestyle conditions. In this study, we used groomed fingerprints to enhance data quality, but the approach must also be tested with natural fingerprints in future studies. Such efforts will be critical to determine whether the observed lipid patterns can be reliably translated into a broadly applicable forensic tool.
These findings underscore the growing potential of ML to advance trace evidence analysis by extracting biologically relevant information from complex chemical data. Future work will prioritize expanding the sample size, refining classification schemes, and examining additional factors such as diet. Ultimately, the goal is to develop new tools for forensic investigations, including methods for extracting relevant information from the chemical composition of unknown latent fingerprints. With continued investment in both data collection and algorithm development, this study highlights a promising future for the integration of ML in forensic science.
Footnote
† Present address: Evotec, Princeton, New Jersey 08540, USA.
This journal is © The Royal Society of Chemistry 2026