End-to-end molecular structure elucidation from multimodal NMR spectra images using vision transformers

Abstract

Nuclear magnetic resonance (NMR) spectroscopy is a cornerstone technique for molecular structure elucidation, but interpreting complex NMR spectra remains challenging and often relies on expert-driven, heuristic workflows. Recent deep learning approaches have enabled spectrum-to-structure prediction, yet many depend on peak-level text annotations that discard informative intensity patterns and are not readily extended to higher-dimensional NMR experiments. Here, we present NMRViT, a spectral Vision Transformer framework for NMR-driven molecular structure elucidation that operates directly on raw spectral signals of both one-dimensional (1H, 13C) and two-dimensional (HSQC) spectra. Trained on a large-scale simulated dataset, NMRViT achieves strong performance across single-modality and multimodal NMR spectral input configurations, and provides, to our knowledge, one of the first end-to-end benchmarks for HSQC-based structure prediction. We systematically evaluate both zero-shot transfer and fine-tuning from large-scale simulated data on two experimental benchmarks covering 1D and 2D NMR spectra, highlighting the simulation–experiment domain gap and showing that fine-tuning with only a small number of experimental samples can substantially reduce this gap. To further improve candidate ranking, we introduce a chemical-shift-based post-processing strategy that re-ranks candidate structures using local spectral evidence, yielding consistent performance gains in both zero-shot and fine-tuning settings. Together, these results establish raw-spectrum vision transformers, combined with lightweight experimental adaptation and chemically informed re-ranking, as a practical framework for automated molecular structure elucidation from multimodal NMR data.
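
The abstract does not say how the chemical-shift-based re-ranking is computed. The sketch below shows one simple way to re-rank candidates using local shift evidence: add a peak-matching score to each candidate's model score. It is illustrative only and is not the authors' implementation. Everything in it is hypothetical: the `Candidate` class, the `shift_match_score` and `rerank` functions, the 2.0 ppm tolerance, the additive weighting, and the assumption that each candidate's predicted shifts come from an external shift predictor.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical container: a candidate structure label, its score from the
    # spectrum-to-structure model (e.g. a log-probability), and predicted
    # chemical shifts in ppm assumed to come from an external shift predictor.
    name: str
    model_score: float
    predicted_shifts: list[float]


def shift_match_score(predicted: list[float], observed: list[float], tol: float) -> float:
    """Fraction of predicted shifts matched to a distinct observed peak within tol ppm.

    Uses greedy nearest-peak matching (each observed peak is used at most once),
    which is simple but not guaranteed to find the optimal assignment.
    """
    if not predicted:
        return 0.0
    remaining = sorted(observed)
    matched = 0
    for p in sorted(predicted):
        best = None
        for i, o in enumerate(remaining):
            if abs(o - p) <= tol and (best is None or abs(o - p) < abs(remaining[best] - p)):
                best = i
        if best is not None:
            remaining.pop(best)  # consume the peak so it cannot be matched twice
            matched += 1
    return matched / len(predicted)


def rerank(candidates: list[Candidate], observed: list[float],
           tol: float = 2.0, weight: float = 1.0) -> list[Candidate]:
    """Sort candidates by model score plus a weighted shift-agreement bonus."""
    def combined(c: Candidate) -> float:
        return c.model_score + weight * shift_match_score(c.predicted_shifts, observed, tol)
    return sorted(candidates, key=combined, reverse=True)


observed_peaks = [20.0, 60.0, 170.0]
cand_a = Candidate("A", model_score=-0.2, predicted_shifts=[25.0, 100.0])
cand_b = Candidate("B", model_score=-0.5, predicted_shifts=[21.0, 59.5, 171.0])
print([c.name for c in rerank([cand_a, cand_b], observed_peaks)])  # ['B', 'A']
```

In this toy example, candidate A has the higher model score. None of its predicted shifts lies within tolerance of an observed peak, though, so B moves to the top once the shift evidence is added. A real system would likely use a better assignment method (for example, optimal bipartite matching) and a tuned weight.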

Article information

Article type: Edge Article
Submitted: 23 Mar 2026
Accepted: 19 May 2026
First published: 26 May 2026
This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry
Creative Commons BY license

Chem. Sci., 2026, Advance Article

C. Han, X. Pan and Y. Zhang, Chem. Sci., 2026, Advance Article, DOI: 10.1039/D6SC02352E

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
