Enhancing automated drug substance impurity structure elucidation from tandem mass spectra through transfer learning and domain knowledge

Abstract

Liquid chromatography-tandem mass spectrometry (LC-MS/MS) is an essential analytical technique in the pharmaceutical industry, used particularly for elucidating the structures of unknown impurities that arise during the synthesis of active pharmaceutical ingredients. However, interpreting mass spectra is challenging and time-consuming and requires significant expertise. Although computational tools that aim to automate this process have recently been developed, their limited accuracy in determining chemical structures restricts their use in practice. In this paper, we introduce SEISMiQ, a new method for elucidating unknown impurities from their MS/MS spectra. We significantly improve elucidation accuracy by integrating domain experts' knowledge, specifically the impurity's sum formula and a known substructure, into the model's training and inference process. Further performance improvements can be achieved through transfer learning using simulated MS/MS spectra of impurities from an in-house database. Finally, the need to collect any experimental data for fine-tuning can be circumvented by simulating the entire drug substance synthesis process in silico via reaction templates.
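To make the role of the sum formula more concrete, the following is a minimal sketch of how an expert-provided sum formula could act as a hard constraint on candidate structures. It is not the SEISMiQ implementation: the abstract states that the formula and substructure are integrated into the model's training and inference, whereas this sketch swaps in a simple post-hoc filter. The function names, the candidate labels and the flat formula syntax (no parentheses, charges or isotopes) are all assumptions made for illustration.

```python
import re
from collections import Counter

# One element symbol (e.g. "C", "Cl") followed by an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_sum_formula(formula: str) -> Counter:
    """Parse a flat sum formula such as 'C9H8O4' into element counts.

    Hypothetical helper: parentheses, charges and isotope labels are
    deliberately not supported and raise ValueError.
    """
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            # A character was skipped, so the input is not a flat formula.
            raise ValueError(f"unsupported formula syntax: {formula!r}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        pos = match.end()
    if not formula or pos != len(formula):
        raise ValueError(f"unsupported formula syntax: {formula!r}")
    return counts


def filter_by_formula(candidates, target_formula):
    """Keep the labels of candidates whose formula matches the target.

    `candidates` is an iterable of (label, formula) pairs; in a real
    pipeline the formula would be computed from the predicted structure.
    """
    target = parse_sum_formula(target_formula)
    return [label for label, formula in candidates
            if parse_sum_formula(formula) == target]


# Hypothetical model outputs, labelled for illustration only.
candidates = [
    ("candidate_a", "C9H8O4"),
    ("candidate_b", "C9H10O4"),
    ("candidate_c", "O4H8C9"),  # same composition, different element order
]
kept = filter_by_formula(candidates, "C9H8O4")
```

Comparing element counts rather than formula strings makes the check independent of element ordering. A comparable check for the known substructure would need a cheminformatics toolkit and is left out of this sketch.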


Article information

Article type: Paper
Submitted: 21 Mar 2025
Accepted: 17 Jul 2025
First published: 24 Jul 2025
This article is Open Access
Creative Commons BY license

Digital Discovery, 2025, Advance Article


E. Dorigatti, J. Groß, J. Kühlborn, R. Möckel, F. Maier and J. Keupp, Digital Discovery, 2025, Advance Article, DOI: 10.1039/D5DD00115C

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
