Deep Set Model for the Automated NMR Fingerprinting of Unknown Mixtures
Abstract
Elucidating unknown mixtures is a critical challenge in chemistry and chemical engineering. Nuclear magnetic resonance (NMR) spectroscopy is a powerful analytical technique that is well suited for this purpose. However, component-wise elucidation with NMR is tedious for complex mixtures, requires expert knowledge, and often yields ambiguous results. In contrast, identifying and quantifying the structural groups in a mixture from NMR spectra is much more straightforward. In prior work, we introduced 'NMR fingerprinting' for the automated elucidation of structural groups in unknown mixtures, based on standard NMR experiments and support vector classification (SVC), a machine learning (ML) method. Here, we present a substantially improved NMR fingerprinting method that employs a deep set model (DSM), which addresses major shortcomings of the SVC and integrates additional information from 2D NMR experiments. The DSM was trained on experimental NMR spectra of pure components from open-source databases, augmented with synthetic spectral data. It combines permutation-invariant and permutation-equivariant network structures so that its predictions do not depend on the order in which the NMR signals are input. On experimental pure-component test data, the DSM performs very well and significantly outperforms our previous approaches. Furthermore, we demonstrate the applicability of the DSM to unknown mixtures by predicting the structural groups from NMR spectra of test mixtures measured with a benchtop NMR spectrometer. The predictions agree very well with the true mixture compositions, highlighting the method's potential for efficient automated mixture analysis and providing a reliable basis for downstream tasks such as thermodynamic modeling with group-contribution methods.
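The abstract states only that the DSM combines invariant and equivariant structures; it does not give the architecture. As a generic illustration of that order-independence property, and not the authors' network, the sketch below assumes each NMR signal is encoded as a small feature vector (a hypothetical chemical shift and relative intensity). It applies a Deep Sets-style equivariant layer with mean pooling, followed by an invariant readout that yields one score per hypothetical structural group. The weights are random and the names (`equivariant_layer`, `invariant_readout`, etc.) are illustrative; the only point is that reordering the signals reorders the per-signal features and leaves the group scores unchanged.

```python
import math
import random

# Minimal Deep Sets-style sketch (hypothetical; not the architecture from the paper).
# Each NMR signal is a feature vector; the set of signals has no natural order.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vec_add(*vs):
    """Element-wise sum of equally long vectors."""
    return [sum(c) for c in zip(*vs)]

def relu(v):
    return [max(0.0, c) for c in v]

def mean_pool(X):
    """Permutation-invariant aggregation over the set (signal) dimension."""
    return [sum(col) / len(X) for col in zip(*X)]

def equivariant_layer(X, W_self, W_pool, b):
    """h_i = relu(W_self x_i + W_pool mean_j(x_j) + b): reordering X reorders the output rows."""
    context = matvec(W_pool, mean_pool(X))
    return [relu(vec_add(matvec(W_self, x), context, b)) for x in X]

def invariant_readout(H, W_out, b_out):
    """Pool over signals, then map to one score per (hypothetical) structural group."""
    return vec_add(matvec(W_out, mean_pool(H)), b_out)

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_in, d_hidden, n_groups = 2, 4, 3
W_self = random_matrix(d_hidden, d_in, rng)
W_pool = random_matrix(d_hidden, d_in, rng)
b = [rng.uniform(-1.0, 1.0) for _ in range(d_hidden)]
W_out = random_matrix(n_groups, d_hidden, rng)
b_out = [rng.uniform(-1.0, 1.0) for _ in range(n_groups)]

# Hypothetical signals: (chemical shift in ppm, relative intensity).
signals = [[1.2, 0.3], [3.6, 1.0], [7.1, 0.5]]
perm = [2, 0, 1]
signals_perm = [signals[i] for i in perm]

H = equivariant_layer(signals, W_self, W_pool, b)
H_perm = equivariant_layer(signals_perm, W_self, W_pool, b)
scores = invariant_readout(H, W_out, b_out)
scores_perm = invariant_readout(H_perm, W_out, b_out)

# Compare with a tolerance, since summing in a different order can change the last bits.
same = all(math.isclose(s, t, rel_tol=1e-9, abs_tol=1e-12) for s, t in zip(scores, scores_perm))
print("group scores unchanged under reordering:", same)
```

Mean pooling is used here; sum pooling, as in the original Deep Sets formulation, is equally permutation-invariant.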