Two-Stage Semi-Supervised Machine Learning for Classification of Ti-Rich Nanoparticles and Microparticles Measured by spICP-TOFMS
Abstract
Single-particle inductively coupled plasma time-of-flight mass spectrometry (spICP-TOFMS) can be used to measure metal-containing nanoparticles (NPs) and sub-micron particles (µPs) at environmentally relevant concentrations. Multielement fingerprints measured by spICP-TOFMS can also be used to differentiate natural and anthropogenic particle types. The approach therefore offers a promising route to classify, quantify, and track anthropogenic NPs and µPs in natural systems. However, biases in spICP-TOFMS data caused by analytical sensitivities, Poisson detection statistics, and elemental variability at the single-particle level complicate particle-type classification. To overcome these biases, we have developed a multi-stage semi-supervised machine learning (SSML) strategy that identifies systematic noise in spICP-TOFMS data and then trains on it to produce more robust particle-type classifications. Here, we apply our two-stage SSML model to classify individual Ti-containing NPs and µPs via spICP-TOFMS analysis. To build the model, we use spICP-TOFMS to measure neat suspensions of anthropogenic TiO2 particles (E171) and of three natural Ti-containing particle types: rutile, ilmenite, and biotite. In the first stage, the element mass amounts recorded per particle are used to classify particle type by SSML, and systematic particle misclassifications are identified and recorded as uncertainty classes. In the second stage, a new SSML model is trained with these uncertainty classes added as particle-type categories. With two-stage SSML, we demonstrate low false-positive rates (≤ 5%) and moderate particle recoveries (50–90%) for all anthropogenic and natural particle types. Two-stage SSML is a streamlined, hands-off method that identifies and overcomes bias in spICP-TOFMS training data and provides robust particle-type classification.
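The two-stage workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not name the SSML algorithm, so a nearest-centroid classifier with simple self-training stands in for it here. The function names, the two-element feature layout ([Ti, Fe] mass per particle, arbitrary units), the uncertainty-label format, and the numbers in the usage note are all hypothetical and purely synthetic.

```python
from collections import defaultdict
import math


def centroids(X, y):
    """Return the mean feature vector for each class label."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(cents, X):
    """Assign each particle to the class with the nearest centroid."""
    return [min(cents, key=lambda label: math.dist(cents[label], row))
            for row in X]


def self_train(X_lab, y_lab, X_unlab, rounds=5):
    """Stand-in semi-supervised step: pseudo-label unlabeled particles,
    refit the centroids on labeled + pseudo-labeled data, and repeat."""
    cents = centroids(X_lab, y_lab)
    for _ in range(rounds):
        pseudo = predict(cents, X_unlab)
        cents = centroids(X_lab + X_unlab, y_lab + pseudo)
    return cents


def two_stage(X_lab, y_lab, X_unlab):
    """Two-stage training with uncertainty classes (illustrative only)."""
    # Stage 1: fit, then re-classify the labeled training particles.
    stage1 = self_train(X_lab, y_lab, X_unlab)
    pred1 = predict(stage1, X_lab)
    # Training particles that stage 1 misclassifies are moved into an
    # uncertainty class named after the two confused particle types
    # (hypothetical naming scheme).
    y_stage2 = [
        true if true == pred
        else "uncertain " + "/".join(sorted((true, pred)))
        for true, pred in zip(y_lab, pred1)
    ]
    # Stage 2: refit with the uncertainty classes as extra categories.
    return self_train(X_lab, y_stage2, X_unlab)
```

As a usage illustration with synthetic values, suppose the labeled set contains four E171 particles (`[5.0, 0.0]`, `[6.0, 0.0]`, `[4.0, 0.0]`, `[5.5, 0.0]`) and four ilmenite particles (`[5.0, 5.5]`, `[6.0, 6.5]`, `[4.0, 4.5]`, `[1.0, 0.0]`), where the last ilmenite particle is small enough that its Fe signal is not detected. Stage 1 assigns that particle to E171, so stage 2 learns an `uncertain E171/ilmenite` class. A new low-mass, Ti-only particle is then reported as uncertain rather than counted as E171, which is the trade of recovery for a lower false-positive rate that the abstract describes.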
- This article is part of the themed collection: Fast Transient Signals – Getting the most out of Multidimensional Data