Machine learning prediction and calibration of cellulose-based solid-phase extraction performance for pharmaceuticals across aqueous matrices
Abstract
Cellulose-based solid-phase extraction has been increasingly proposed for concentrating trace pharmaceuticals from complex waters; however, cross-laboratory transfer remains uncertain because studies vary in matrix chemistry, sorbent functionalization, extraction format, elution strategy, and quality control. Evidence from 2015 to 2025 was gathered, and 637 experiments from 36 reports and 28 DOIs were modelled using 29 descriptors of method and matrix. ElasticNet (EN), XGBoost (XGB), and random forest regressor (RFR) were evaluated using study-grouped nested cross-validation with conformal prediction to estimate out-of-study performance and 90% confidence intervals for recovery, matrix recovery ratio (MRR), enrichment factor (EF), limit of detection (LOD), and limit of quantification (LOQ). ElasticNet dominated the sensitivity endpoints, achieving a mean R² of 0.99999 for EF, 0.99985 for LOD, and 0.99914 for LOQ, with mean 90% interval widths of 0.300, 44.386, and 829.752, respectively. For recovery and MRR, random forest had the strongest correlation but remained weakly predictive, with top settings yielding a mean R² of about −0.52 and MAE of about 15.53 for recovery and a mean R² of about −1.03 and MAE of about 21.39 for MRR, with 90% confidence intervals of 0.651, most pronounced for wastewater and river matrices. Decision maps were used to translate these contrasts into operating guidance and reporting priorities for the matrix descriptors needed to support defensible local validation and method transfer.
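As an illustration of the evaluation idea named in the abstract (holding out whole studies and attaching conformal intervals to predictions), the following is a minimal sketch and not the authors' pipeline. It makes several assumptions: a closed-form ridge regressor stands in for ElasticNet, XGBoost, and random forest; the inner hyperparameter-tuning loop of nested cross-validation is omitted; split conformal prediction with absolute residuals is used, since the abstract does not specify the conformal variant; and the data are synthetic. All function names (`fit_ridge`, `predict_ridge`, `conformal_quantile`, `grouped_conformal_intervals`) are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    # Closed-form ridge fit with an intercept column (stand-in model, an assumption).
    Xb = np.column_stack([np.ones(len(X)), X])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict_ridge(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def conformal_quantile(scores, alpha):
    # Finite-sample split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score.
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(scores)[k - 1]

def grouped_conformal_intervals(X, y, groups, alpha=0.10, seed=0):
    # Each study (group) is held out in turn; the remaining studies are split,
    # again by study, into a calibration part and a training part.
    rng = np.random.default_rng(seed)
    lower = np.empty(len(y))
    upper = np.empty(len(y))
    for g in np.unique(groups):
        test = groups == g
        rest = np.unique(groups[~test])
        rng.shuffle(rest)
        cal = np.isin(groups, rest[: max(1, len(rest) // 3)])
        train = ~test & ~cal
        beta = fit_ridge(X[train], y[train])
        # Absolute residuals on the calibration studies set the interval half-width.
        q = conformal_quantile(np.abs(y[cal] - predict_ridge(beta, X[cal])), alpha)
        pred = predict_ridge(beta, X[test])
        lower[test], upper[test] = pred - q, pred + q
    return lower, upper

# Synthetic demonstration data (not from the paper): 8 "studies" of 20 runs each.
rng = np.random.default_rng(42)
groups = np.repeat(np.arange(8), 20)
X = rng.normal(size=(160, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=160)

lower, upper = grouped_conformal_intervals(X, y, groups)
coverage = np.mean((y >= lower) & (y <= upper))  # empirical out-of-study coverage
```

Splitting by study rather than by experiment is the design choice that matters here: experiments from the same report share matrix and method details, so a random split would let the model see near-duplicates of the test rows and overstate transferability.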
