Random forest models accurately classify synthetic opioids using high-dimensionality mass spectrometry datasets
Abstract
Detection of novel threat agents presents several challenges, a principal one being the development of untargeted methods to screen a growing number of threat chemicals whose exact structures are unknown. Machine learning (ML) tools can guide the development of analytical methods for broad-spectrum detection of unbounded threat chemical families in complex mixtures. Toward this goal, we used nominal mass and high-resolution mass spectrometry data for hundreds of synthetic opioids and non-opioid compounds. We tested two ML techniques, logistic regression and random forest (RF), to develop models toward a practical, implementable method for opioid detection. Of the two, RF models gave the highest validation accuracy (95+%) for both nominal mass and high-resolution classification of opioids versus non-opioids, with low false positive and false negative rates. The RF models were then used to successfully classify 10 compounds (five opioids and five non-opioids) that were not part of the training and validation analysis. This application of ML is a critical step toward the development of field-deployable nominal mass spectrometers with ML-driven analyses for classification of emergent threats.
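To make the reported metrics concrete, the following is a minimal sketch of how accuracy, false positive rate, and false negative rate can be computed for a two-class opioid versus non-opioid problem. It assumes "opioid" is treated as the positive class. The function name `binary_metrics`, the `BinaryMetrics` container, and the example labels are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    false_positive_rate: float
    false_negative_rate: float


def binary_metrics(y_true, y_pred, positive="opioid"):
    """Compute accuracy, FPR, and FNR for a two-class problem.

    Assumption: `positive` names the class of interest (here, opioids).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # opioid correctly flagged
        elif truth == positive:
            fn += 1  # opioid missed
        elif pred == positive:
            fp += 1  # non-opioid wrongly flagged
        else:
            tn += 1  # non-opioid correctly passed

    total = tp + fp + tn + fn
    return BinaryMetrics(
        accuracy=(tp + tn) / total if total else 0.0,
        # FPR: share of true non-opioids that were flagged as opioids
        false_positive_rate=fp / (fp + tn) if (fp + tn) else 0.0,
        # FNR: share of true opioids that were missed
        false_negative_rate=fn / (fn + tp) if (fn + tp) else 0.0,
    )


# Hypothetical labels, made up for illustration only (not study data).
y_true = ["opioid"] * 4 + ["non-opioid"] * 6
y_pred = (
    ["opioid"] * 3 + ["non-opioid"]      # one opioid missed
    + ["opioid"] + ["non-opioid"] * 5    # one non-opioid wrongly flagged
)

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a screening setting the two error rates carry different costs: a false negative is a missed threat compound, while a false positive triggers an unnecessary follow-up, so it is useful to report both alongside overall accuracy.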
