Machine learning analysis to classify nanoparticles from noisy spICP-TOFMS data
Abstract
Single-particle inductively coupled plasma time-of-flight mass spectrometry (spICP-TOFMS) is a promising method for the quantification and classification of anthropogenic and natural nanoparticle (NP) types based on the measured multi-elemental compositions of individual particles. However, spICP-TOFMS data show systematic bias in the detected elemental compositions of particles as a function of particle size, composition, and analytical sensitivity. To overcome this inherent bias when classifying NP types, we report a multi-stage semi-supervised machine learning (SSML) strategy. In our approach, systematic particle misclassifications are first identified; these “noise classes” are then incorporated into the SSML model to develop a second, more robust classification model. As a case study, we use cerium(IV) oxide, ferrocerium mischmetal, and bastnaesite mineral NPs as representatives of engineered (ENP), incidental (INP), and natural (NNP) nanoparticle types, and classify particles in mixed samples with our final SSML model. This two-stage SSML model has a receiver operating characteristic area under the curve (ROC AUC) value of 0.979 and false-positive rates of 0.030, 0.001, and 0 for ENPs, INPs, and NNPs, respectively. These low false-positive rates allow accurate particle-type classification of mixed samples with variable number concentrations; here, we demonstrate particle-type quantification across more than two orders of magnitude. Overall, our two-stage SSML model for NP classification identifies and overcomes bias in spICP-TOFMS training data, providing a simple and robust approach for incorporating machine learning models into spICP-TOFMS particle classification strategies.
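For readers less familiar with the per-class false-positive rates quoted above, the following is a minimal sketch of how such rates are conventionally computed in a one-vs-rest fashion, i.e. FPR = FP / (FP + TN) for each particle type. It is not the authors' code: the function name `false_positive_rates` and the ten toy labels are hypothetical and chosen only for illustration; they are not data from this study.

```python
def false_positive_rates(true_labels, predicted_labels):
    """Return the one-vs-rest false-positive rate for each class.

    For a class c, a false positive is a particle whose true type is
    not c but which the classifier assigned to c. The rate is that
    count divided by the number of particles whose true type is not c.
    """
    classes = sorted(set(true_labels) | set(predicted_labels))
    rates = {}
    for cls in classes:
        false_pos = sum(
            1
            for truth, pred in zip(true_labels, predicted_labels)
            if truth != cls and pred == cls
        )
        negatives = sum(1 for truth in true_labels if truth != cls)
        # Undefined if every particle belongs to this class.
        rates[cls] = false_pos / negatives if negatives else float("nan")
    return rates


# Hypothetical toy labels (illustration only, not measured data).
true_labels = ["ENP"] * 4 + ["INP"] * 4 + ["NNP"] * 2
predicted_labels = [
    "ENP", "ENP", "ENP", "INP",   # one ENP mislabelled as INP
    "INP", "INP", "INP", "ENP",   # one INP mislabelled as ENP
    "NNP", "NNP",                 # NNPs all labelled correctly
]

rates = false_positive_rates(true_labels, predicted_labels)
for particle_type, rate in rates.items():
    print(f"{particle_type}: FPR = {rate:.3f}")
```

A low false-positive rate for a given type matters most when that type is rare in a mixed sample, because even a small fraction of misassigned particles from an abundant type can otherwise swamp the true count of the rare one.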
- This article is part of the themed collections: Fast Transient Signals – Getting the most out of Multidimensional Data and JAAS HOT Articles 2023