Automatic identification of compounds in molecular mixtures from liquid-phase infrared spectra
Abstract
Interpreting spectroscopy data is a critical bottleneck in automating chemical research and industrial characterization. In infrared (IR) spectroscopy in particular, identifying compounds in complex, liquid-phase chemical mixtures still relies largely on expert knowledge, because variable peak assignment, broadening, and shifts hinder data-driven methods. Here, we show that an algorithmic approach can identify components in both simulated and experimental mixture spectra with high accuracy despite nonlinearities in liquid-phase IR data. The method is comprehensively benchmarked on a dataset of over 44 000 simulated liquid-phase IR spectra of mixtures and achieves up to 90% accuracy in identifying molecular components across binary and ternary liquid mixtures. Our strategy is robust to perturbation of spectra. Its accuracy is capped by near-identical liquid-phase IR spectra, which limit the resolution of chemical identification and impose a theoretical limit on the accuracy of structure identification. Finally, we apply the method to automatically interpret IR spectra in experimental settings, correctly identifying the components of nearly all samples in a blind study. This work provides tools and data to advance automated chemical laboratories through algorithmic interpretation of liquid-phase IR spectra of mixtures.
